Can someone explain to me why the loss function in this code does not get it right?
0 votes
/ 26 March 2019

This code creates neural networks that are supposed to correctly classify the digits in the MNIST dataset.

The method used to train the networks is not backpropagation but (or is at least an attempt at) a method called neuroevolution, based on the principle of Darwinian evolution: create a population of neural networks, evaluate them, and use the best of them to create the next generation of networks, and so on.

In this code I create a population of 10 neural networks, which are evaluated with a cross-entropy loss function. I keep the 5 best for the next generation and replace the other 5 with "child" networks bred from the 5 that were kept.

My problem is that the kept networks do not keep the same loss value from one generation to the next.

For example, when the 5 best networks are evaluated, a certain loss value is displayed for each of them; they are then kept for the next generation and the whole population is re-evaluated. But at that point I cannot find the same loss values as before. Since the population is represented as a list object and stored in the same order, if the 5 newly created "child" networks were better they should replace the values of the 5 previously kept networks; however, it turns out the "child" networks often have worse values, and yet the loss values computed for the kept networks clearly vary from one generation to the next.

In one sentence: the loss value from one generation to the next for the SAME network objects is not the same, even though they take exactly the same data and the same parameters.

I would appreciate it if someone has time to look at it and figure out what is wrong. Maybe there is a problem somewhere in the code when computing the loss, but I cannot spot it.

PS: I have also noticed (and there is some code showing this) that the computed loss value for a particular network changes very slightly (in the decimals) from one computation to the next, which I already do not understand, but that still cannot explain the large differences in the loss values from one generation to the next.

So here are the three modules of my code:

First module:

"""
Utility used by the Network class to actually train.
Based on:
    https://github.com/fchollet/keras/blob/master/examples/mnist_mlp.py
"""
#from keras.datasets import cifar10

from sklearn.datasets import fetch_mldata
from sklearn.metrics import classification_report, confusion_matrix
import numpy as np





def get_mnist():
    """Get MNIST dataset through scikit-learn and pre-process data to make it usable by our classifiers"""

    mnist = fetch_mldata('MNIST original')

    #X as images - array (70000, 784) corresponding respectively to the number of examples and the number of pixels per image
    #y as labels  - array (70000,) corresponding to the number of examples, each value is a digit from 0 to 9
    X, y = mnist["data"], mnist["target"]

    #Normalize pixels of images
    X = X / 255


    digits = 10
    examples = y.shape[0]

    # Reshape y as array (1, 70000)
    y = y.reshape(1, examples)

    """Create a label array of shape (10, 70000) and replace each digit value from 0 to 9 by value of 1.
        Rest of the array composed of zeros.
    Allows us to design it the same way as our networks' output array will be, with the maximum value corresponding to what the digit is."""
    Y_new = np.eye(digits)[y.astype('int32')]
    Y_new = Y_new.T.reshape(digits, examples)

    m = 60000
    m_test = X.shape[0] - m

    #Get images for train set and test set, transposing in array of shape (784, 60000) and (784, 10000)
    X_train, X_test = X[:m].T, X[m:].T

    #Get labels for train set and test set
    Y_train, Y_test = Y_new[:,:m], Y_new[:,m:]

    #Shuffle train set to randomize it as it is organized from digits 0 to 9
    shuffle_index = np.random.permutation(m)
    X_train, Y_train = X_train[:, shuffle_index], Y_train[:, shuffle_index]


    return (X_train, X_test, Y_train, Y_test)


def sigmoid(z):
    #sigmoid activation function
    s = 1 / (1 + np.exp(-z))
    return s


def compute_multiclass_loss(Y, Y_hat):
    """Fitness function: Categorical cross-entropy cost function, used in the case of multi-class outputs."""
    L_sum = np.sum(np.multiply(Y, np.log(10**(-15)+Y_hat)))
    m = Y.shape[1]
    L = -(1/m) * L_sum

    return L

def compute_accuracy(Y, Y_hat):
    """Fitness function: a different way to compute cost function, with accuracy.
    Tested, seems to work but the formula has to be verified as it may contain potential approximations"""

    correct = 0
    incorrect = 0
    argmax_Y = np.argmax(Y, axis=0)
    argmax_Y_hat = np.argmax(Y_hat, axis=0)
    for i in range(Y.shape[1]):
        if argmax_Y[i] == argmax_Y_hat[i]:
            correct += 1
        else:
            incorrect += 1

    accuracy = correct / (correct + incorrect)

    return accuracy




def neural_network_evaluator(input_layer_to_hidden_layer, hidden_layer_to_output_layer, b1, b2):
    """Function used to: 1/ forwardpropagate the input in a particular neural network
    2/ Generate outputs
    3/ Determine the cost of fitness function for this network"""

    X_train, X_test, Y_train, Y_test = get_mnist()

    #Feedforward for training neural network on training set
    Z1 = np.matmul(input_layer_to_hidden_layer,X_train) + b1
    A1 = sigmoid(Z1)
    Z2 = np.matmul(hidden_layer_to_output_layer,A1) + b2
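    #Softmax over the output layer: turns Z2 into per-digit probabilities (each column sums to 1)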
    A2 = np.exp(Z2) / np.sum(np.exp(Z2), axis=0)

    cost = compute_multiclass_loss(Y_train, A2)



    return cost 

Second module:

"""
Class that holds a genetic algorithm for evolving a network.
Credit:
    A lot of this code was originally inspired by:
    http://lethain.com/genetic-algorithms-cool-name-damn-simple/
"""
from functools import reduce
from operator import add
import numpy as np
import random
import logging
from train_Neuroevolution_ameliored import neural_network_evaluator
from train_Neuroevolution_ameliored import get_mnist

class Optimizer():
    """Class that implements genetic algorithm for MLP optimization.
        Evolving process.
        Cross-over and mutation processes.

        Also used as neural network creator class triggered AFTER evolution process, and used for multiple purposes.
        Neural network object creator.
        Populations creator for both pre and post evolution.
        Average fitness of populations.
        Cost value compiler.
        ..."""

    def __init__(self, retain=0.5, random_select=0.0, mutation_rate=0.5):
        """Create an optimizer.
        Args:
            retain (float): Percentage of population to retain after
                each generation
            random_select (float): Probability of a rejected network
                remaining in the population
            mutation_rate (float): Probability a network will be
                randomly mutated
            ...

        Initialize our network parameters, for network population after first evolution.
        """

        self.mutation_rate = mutation_rate
        self.random_select = random_select
        self.retain = retain
        self.accuracy = 0.
        self.network = []
        self.b1 = 0
        self.b1_lines = 64
        self.b2 = 0
        self.b2_lines = 10
        self.input_layer_to_hidden_layer = 0
        self.input_layer_to_hidden_layer_shape_lines = 64
        self.input_layer_to_hidden_layer_shape_columns = 784
        self.hidden_layer_to_output_layer = 0
        self.hidden_layer_to_output_layer_shape_lines = 10
        self.hidden_layer_to_output_layer_shape_columns = 64





    def create_neural_network(self, dataset):
        """ Randomly set parameters for a neural network object with a fixed structure.
            Structure is: - one input layer with 784 inputs corresponding to each pixel of the MNIST dataset
                            - one hidden layer with 64 neurons (arbitrary value)
                            - one output layer with 10 neurons determining each of the 10 digits probability"""

        #We can ignore Cifar10 as the code is not built for it at the moment
        """if dataset == 'cifar10':
            nb_classes, batch_size, input_shape, x_train, \
                x_test, y_train, y_test = get_cifar10()"""


        if dataset == 'mnist':
            X_train = get_mnist()[0]

        n_x = X_train.shape[0]
        n_h = 64

        self.input_layer_to_hidden_layer = np.random.randn(n_h, n_x) #Weights from input to hidden layer
        self.input_layer_to_hidden_layer_shape_lines = self.input_layer_to_hidden_layer.shape[0] #Used for weights mutation
        self.input_layer_to_hidden_layer_shape_columns = self.input_layer_to_hidden_layer.shape[1] #Used for weights mutation
        self.b1 = np.zeros((n_h, 1)) #Biases for hidden layer
        self.b1_lines = self.b1.shape[0] #Used for biases mutation

        self.hidden_layer_to_output_layer = np.random.randn(10, n_h) #Weights from hidden to output layer
        self.hidden_layer_to_output_layer_shape_lines = self.hidden_layer_to_output_layer.shape[0] #Used for weights mutation
        self.hidden_layer_to_output_layer_shape_columns = self.hidden_layer_to_output_layer.shape[1] #Used for weights mutation
        self.b2 = np.zeros((10, 1)) #Biases for output layer
        self.b2_lines = self.b2.shape[0] #Used for biases mutation

        self.network = [[self.input_layer_to_hidden_layer], [self.hidden_layer_to_output_layer], [self.b1], [self.b2]] #Network structure holding the weight and bias arrays


    def fitness(self, network):
        """Return accuracy, which is our fitness function value after the first evolution."""

        return network.accuracy


    def create_set(self, network):
        """Set network properties.
        Args:
            network (list): The network parameters
            Used in the mutation process after the first evolution
        """

        self.network = network



    def grade(self, pop):
        """Find average fitness for a population.
        Args:
            pop (list): The population of networks
        Returns:
            (float): The average accuracy of the population
        """
        summed = reduce(add, (self.fitness(network) for network in pop))
        return summed / float((len(pop)))


    def breed(self, mother, father):
        """Make one child as part as their parents.
        Args:
            mother (Optimizer): parent network as an Optimizer() object
            father (Optimizer): parent network as an Optimizer() object
        Returns:
            (Optimizer): One new network object
        """



        child = [0,0,0,0]


        # Loop through the parameters and pick params for the kid.

        child[0] = random.choice([mother.input_layer_to_hidden_layer, father.input_layer_to_hidden_layer])

        child[1] = random.choice([mother.hidden_layer_to_output_layer, father.hidden_layer_to_output_layer])

        child[2] = random.choice([mother.b1, father.b1])

        child[3] = random.choice([mother.b2, father.b2])


        #Create a network object and assign child[list] values to it
        network = Optimizer()
        network.create_set(child)
        network.input_layer_to_hidden_layer = child[0]
        network.hidden_layer_to_output_layer = child[1]
        network.b1 = child[2]
        network.b2 = child[3]


        #Mutate
        if self.mutation_rate > random.random():
            network.input_layer_to_hidden_layer = self.mutate(self.input_layer_to_hidden_layer_shape_lines, self.input_layer_to_hidden_layer_shape_columns, network.input_layer_to_hidden_layer)
            network.hidden_layer_to_output_layer = self.mutate(self.hidden_layer_to_output_layer_shape_lines, self.hidden_layer_to_output_layer_shape_columns, network.hidden_layer_to_output_layer)
            network.b1 = self.mutate_biases(self.b1_lines, network.b1)
            network.b2 = self.mutate_biases(self.b2_lines, network.b2)





        return network



    def mutate(self, shape_array_lines, shape_array_columns, weights_array):
        """Two ways of operating mutation on weights.
            Mutate every single weight by multiplying each weight by a random number.
            Mutate an arbitrary random number of weights (e.g., from 1 to 100) by multiplying each mutated weight by a random number.
            The second technique does not seem to work for an undetermined reason"""


        #First technique
        mutation_weights = np.random.random((shape_array_lines, shape_array_columns))

        return mutation_weights*weights_array




    def mutate_biases(self, shape_array_columns, biases_array):
        """Mutate biases using the same technique as the second thechnique used for weights"""



        random_number_mutated_biases = np.random.randint(low = 1, high = 100)

        list_random_indices_lines = np.random.randint(low = 0, high = shape_array_columns, size = (random_number_mutated_biases))

        d = 0

        for _ in range(random_number_mutated_biases):

            i = np.random.uniform(low=-1, high=+1.1)  #random offset (with arbitrary bounds)
                                                      #added to the mutated bias below

            #Select a particular bias by its index and modify it by adding i
            biases_array[list_random_indices_lines[d]][0] = biases_array[list_random_indices_lines[d]][0] + i

            d +=1

        return biases_array





    def evolve(self, pop):
        """Evolve a population of networks.
        Args:
            pop (list): A list of network objects
        Returns:
            (list): The evolved population of networks
        """


        # Get scores for each network.
        graded = [(network.fitness(network), network) for network in pop]




        for network in pop:
            print("accuracy before =", network.fitness(network))

        # Sort on the scores.
        graded = [x[1] for x in sorted(graded, key=lambda x: x[0], reverse=False)]



        # Get the number we want to keep for the next gen.
        retain_length = int(len(graded)*self.retain)

        # The parents are every network we want to keep.
        parents = graded[:retain_length]


        # For those we aren't keeping, randomly keep some anyway.
        for individual in graded[retain_length:]:
            if self.random_select > random.random():
                parents.append(individual)

        # Now find out how many spots we have left to fill.
        parents_length = len(parents)
        desired_length = len(pop) - parents_length
        children = []

        # Add children, which are bred from two remaining networks.
        while len(children) < desired_length:

            # Get a random mom and dad.
            male = random.randint(0, parents_length-1)
            female = random.randint(0, parents_length-1)

            # Assuming they aren't the same network...
            if male != female:
                male = parents[male]
                female = parents[female]

                # Breed them.
                baby = self.breed(male, female)

                # Add the children one at a time.
                if len(children) < desired_length:
                    children.append(baby)



        parents.extend(children)



        total_nbr_values = 0
        for i in pop:
            for j in parents:
                if i == j:
                    total_nbr_values +=1
                    print("same value")

        print("total =", total_nbr_values)


        for network in parents:
            print("accuracy after =", network.fitness(network))




        return parents





    def create_population(self, count, dataset):
        """Create a population of random networks.
        Args:
            count (int): Number of networks to generate, aka the
                size of the population
            dataset (string): dataset used for the experiment
        Returns:
            (list): Population of network objects
        """
        pop = []

        for _ in range(0, count):

            # Create a random network.
            network = Optimizer()
            network.create_neural_network(dataset)
            # Add the network to our population.
            pop.append(network)



        return pop

    def evaluate_neural_network(self):
        """ Get result of the chosen fitness function as an Optimizer() object.
            Accuracy is just a name and does not necessarily mean the actual accuracy."""

        self.accuracy = neural_network_evaluator(self.input_layer_to_hidden_layer,
                            self.hidden_layer_to_output_layer, self.b1, self.b2)

        print(self.accuracy) #Display network cost value.


    def print_network(self):
        """Print out a network and its cost value in the 'log.txt' file."""
        logging.info(self.network)
        logging.info("Network accuracy: %.2f%%" % (self.accuracy)) 

Third module:

"""Entry point to evolving the neural network. Start here."""
import logging
from optimizer_Neuroevolution_ameliored import Optimizer
from tqdm import tqdm

# Setup logging.
logging.basicConfig(
    format='%(asctime)s - %(levelname)s - %(message)s',
    datefmt='%m/%d/%Y %I:%M:%S %p',
    level=logging.DEBUG,
    filename='log.txt'
)


def train_networks(networks):
    """Train each network.
    Args:
        networks (list): Current population of networks
    """

    pbar = tqdm(total=len(networks))
    for network in networks:
        network.evaluate_neural_network()
        pbar.update(1)
    pbar.close()



def get_average_accuracy(networks):
    """Get the average cost value for a group of networks.
    Args:
        networks (list): List of networks
    Returns:
        float: The average cost value of a population of networks.
    """

    total_accuracy = 0
    for network in networks:
        total_accuracy += network.accuracy

    return total_accuracy / len(networks)



def generate(generations, population, dataset):
    """Generate a network with the genetic algorithm.
    Args:
        generations (int): Number of times to evolve the population
        population (int): Number of networks in each generation
        dataset (str): Dataset to use for training/evaluating
    """

    #Create an initial population of random networks
    optimizer = Optimizer()
    networks = optimizer.create_population(population, dataset)

    #Train them
    train_networks(networks)

    #Print out generation number
    logging.info("***Doing generation %d of %d***" %
                     (1, generations))

    print("generation", 1)

    # Get the average cost value for this generation.
    average_accuracy = get_average_accuracy(networks)

    # Print out the average cost value for this generation.
    logging.info("Generation average: %.2f%%" % (average_accuracy))
    logging.info('-'*80)

    # Evolve the first generation.
    networks = optimizer.evolve(networks)
    train_networks(networks)

    #Print out generation number
    logging.info("***Doing generation %d of %d***" %
                     (2, generations))


    # Get the average cost value for this generation.
    average_accuracy = get_average_accuracy(networks)

    # Print out the average cost value for this generation.
    logging.info("Generation average: %.2f%%" % (average_accuracy))
    logging.info('-'*80)


    # Evolve, except on the last iteration.
    for i in range(generations-2):

        print("generation", i+2)

        print("Before evolving process")

        print("values of weights from input layer to hidden layer are:")

        print(networks[0].input_layer_to_hidden_layer[0][0])
        print(networks[0].input_layer_to_hidden_layer[0][2])
        print(networks[0].input_layer_to_hidden_layer[5][2])
        print(networks[0].input_layer_to_hidden_layer[5][8])

        print("values of weights from hidden layer to output layer are:")

        print(networks[0].hidden_layer_to_output_layer[0][0])
        print(networks[0].hidden_layer_to_output_layer[3][6])
        print(networks[0].hidden_layer_to_output_layer[1][56])
        print(networks[0].hidden_layer_to_output_layer[7][23])

        networks = optimizer.evolve(networks)
        #networks = optimizer.create_population(population, networks, x)

        print("After evolving process")

        print("values of weights from input layer to hidden layer are:")

        print(networks[0].input_layer_to_hidden_layer[0][0])
        print(networks[0].input_layer_to_hidden_layer[0][2])
        print(networks[0].input_layer_to_hidden_layer[5][2])
        print(networks[0].input_layer_to_hidden_layer[5][8])

        print("values of weights from hidden layer to output layer are:")

        print(networks[0].hidden_layer_to_output_layer[0][0])
        print(networks[0].hidden_layer_to_output_layer[3][6])
        print(networks[0].hidden_layer_to_output_layer[1][56])
        print(networks[0].hidden_layer_to_output_layer[7][23])





        train_networks(networks)

        print("Novelty accuracy is:")

        for network in networks:
            network.evaluate_neural_network()

        print("Novelty Novelty accuracy is:")

        for network in networks:
            network.evaluate_neural_network()

        #Print out generation number
        logging.info("***Doing generation %d of %d***" %
                     (i + 3, generations))

        # Get the average cost value for generations starting from third generation
        average_accuracy_pop = get_average_accuracy(networks)

        # Print out the average cost value for generations starting from third generation
        logging.info("Generation average: %.2f%%" % (average_accuracy))
        logging.info('-'*80)

    # Sort our final population of networks aka the last generation.
    networks = sorted(networks, key=lambda x: x.accuracy, reverse=False)

    # Print out the top 5 networks of the last generation.
    print_networks(networks[:5])


def print_networks(networks):
    """Print a list of networks.
    Args:
        networks (list): The population of networks
    """

    logging.info('-'*80)
    for network in networks:
        network.print_network()


def main():
    """Evolve a network."""
    generations = 30  # Number of times to evolve the population.
    population = 10  # Number of networks in each generation.
    dataset = 'mnist' # Dataset

    # Print out the number of generations and number of individuals chosen
    logging.info("***Evolving %d generations with population %d***" %
                 (generations, population))

    generate(generations, population, dataset)

if __name__ == '__main__':
    main() 

So I expected the loss values for the same network objects to be the same from one generation to the next, but when the loss value is recomputed with the neural_network_evaluator() function it displays completely different values.

In fact, the loss values of the ten networks in the population should only decrease or stay the same from one generation to the next, yet they increase, which I do not understand.

Thanks for the help.

1 Answer

0 votes
/ 26 March 2019

Every time you call neural_network_evaluator() from your first module, it uses a new training set generated by get_mnist(), which is shuffled at random.

#Shuffle train set to randomize it as it is organized from digits 0 to 9
shuffle_index = np.random.permutation(m)
X_train, Y_train = X_train[:, shuffle_index], Y_train[:, shuffle_index]

If you want the evaluation to return the same result every time you call it, you need to evaluate against the same training set rather than drawing a new one.
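Here is a minimal sketch of one way to do that in your first module, keeping the other helper functions unchanged (the module-level X_TRAIN/Y_TRAIN names and the rewritten function are my own suggestion, purely illustrative, not part of your code):

# Load (and shuffle) MNIST exactly once, when the module is imported.
X_TRAIN, X_TEST, Y_TRAIN, Y_TEST = get_mnist()

def neural_network_evaluator(input_layer_to_hidden_layer, hidden_layer_to_output_layer, b1, b2):
    """Same forward pass as before, but always evaluated against the same fixed training set."""
    Z1 = np.matmul(input_layer_to_hidden_layer, X_TRAIN) + b1
    A1 = sigmoid(Z1)
    Z2 = np.matmul(hidden_layer_to_output_layer, A1) + b2
    A2 = np.exp(Z2) / np.sum(np.exp(Z2), axis=0)  # softmax over the 10 digit classes

    return compute_multiclass_loss(Y_TRAIN, A2)

As a side effect, loading the data once also stops you from re-fetching and re-processing MNIST for every network in every generation, which should make the evaluation step considerably faster.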

(Disclaimer: I had not heard of using evolutionary algorithms with neural networks, so I cannot comment at all on how well it works or on how to choose training sets for it.)
