Unable to shuffle the rows of a matrix - PullRequest
1 vote
/ April 16, 2019

I'm coding a simple neural network from scratch. The network is implemented in the method def simple_1_layer_classification_NN, which takes an input matrix, output labels and other parameters. Before the loop over epochs I wanted to shuffle the input matrix by its rows only (i.e. by observations), simply as one way to avoid overfitting. I tried random.shuffle(dataset_input_matrix). Two strange things happened. I took a snapshot of the matrix before and after the shuffle step (using the code below with breakpoints, to see the matrix value before and after, expecting it to be shuffled). So input_matrix should give the value of the matrix before the shuffle, and input_matrix1 should give the value after, i.e. of the shuffled matrix.


input_matrix = dataset_input_matrix
# shuffle our matrix observation samples, to decrease the chance of overfitting
random.shuffle(dataset_input_matrix)
input_matrix1 = dataset_input_matrix

When I printed both values, I got the same matrix with no change:

ipdb> input_matrix
array([[3. , 1.5],
       [3. , 1.5],
       [2. , 1. ],
       [3. , 1.5],
       [3. , 1.5],
       [3. , 1. ]])

ipdb> input_matrix1
array([[3. , 1.5],
       [3. , 1.5],
       [2. , 1. ],
       [3. , 1.5],
       [3. , 1.5],
       [3. , 1. ]])

ipdb> 
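
(A side note on the snapshot itself: plain assignment does not copy a NumPy array, it only binds another name to the same object, so input_matrix and input_matrix1 will always print identical contents no matter what the shuffle did. A minimal sketch of a true before/after snapshot, assuming NumPy, takes an explicit copy first:)

import numpy as np
import random

dataset_input_matrix = np.array([[3., 1.5],
                                 [2., 1. ],
                                 [4., 1.5]])

# copy() captures the current contents; plain assignment would only alias the same array
input_matrix_before = dataset_input_matrix.copy()

random.shuffle(dataset_input_matrix)  # the shuffle step under test

input_matrix_after = dataset_input_matrix
print(input_matrix_before)
print(input_matrix_after)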

I'm not sure whether I'm doing something wrong here. The second strange thing is that when I ran the neural network (after the shuffle), its accuracy dropped dramatically. Before, I was getting accuracy in the 60% to 95% range (with very few runs around 50%).

After adding the shuffle step for the input matrix, I can barely get accuracy above 50%, no matter how many times I run the model. Which is strange, given that the shuffle doesn't even seem to have worked, judging by the breakpoints. And why should the network's accuracy drop so much anyway, unless I'm doing the shuffling completely wrong?

So, two questions:

1- How do I shuffle only the rows of the matrix (since I only need to randomise the observations (rows), not the features (columns) of the dataset)?

2- Secondly, why did adding the shuffle reduce the accuracy so much that the network can't get anything above 50%? After all, shuffling the data is recommended as a preprocessing step precisely to avoid overfitting.
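
(For context, a common way to shuffle only the rows while keeping the observations and their labels paired is to shuffle an array of row indices and apply it to both; shuffling the input matrix alone would break the input-label pairing. A minimal sketch, assuming NumPy arrays X and y of matching length:)

import numpy as np

def shuffle_rows(X, y, rng=np.random):
    # permute the row indices once, then index both arrays with the same permutation
    perm = rng.permutation(len(X))
    return X[perm], y[perm]

# usage: X_train, y_train = shuffle_rows(X_train, y_train)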

Please refer to the full code below, and apologies for the large amount of code.

Many thanks in advance for any help.

# --- neural network structure diagram --- 

    #    O  output prediction
    #   / \   w1, w2, b
    #  O   O  datapoint 1, datapoint 2

    def simple_1_layer_classification_NN(self, dataset_input_matrix, output_data_labels, input_dimension, epochs, activation_func='sigmoid', learning_rate=0.2, cost_func='squared_error'):
        weights = []
        bias = int()
        cost = float()
        costs = []
        dCost_dWeights = []
        chosen_activation_func_derivation = None
        chosen_cost_func = None
        chosen_cost_func_derivation = None
        correct_pred = int()
        incorrect_pred = int()

        # store the chosen activation function to use later on in the activation calculation section and in the 'predict' method
        # Also the same goes for the derivation section.        
        if activation_func == 'sigmoid':
            self.chosen_activation_func = NN_classification.sigmoid
            chosen_activation_func_derivation = NN_classification.sigmoid_derivation
        elif activation_func == 'relu':
            self.chosen_activation_func = NN_classification.relu
            chosen_activation_func_derivation = NN_classification.relu_derivation
        else:
            print("Exception error - no activation function utilised, in training method", file=sys.stderr)
            return   

        # store the chosen cost function to use later on in the cost calculation section.
        # Also the same goes for the cost derivation section.    
        if cost_func == 'squared_error':
            chosen_cost_func = NN_classification.squared_error
            chosen_cost_func_derivation = NN_classification.squared_error_derivation
        else:
           print("Exception error - no cost function utilised, in training method", file=sys.stderr)
           return

        # Set initial network parameters (weights & bias):
        # Will initialise the weights to a uniform distribution and ensure the numbers are small close to 0.
        # We need to loop through all the weights to set them to a random value initially.
        for i in range(input_dimension):
            # create random numbers for our initial weights (connections) to begin with. 'rand' method creates small random numbers. 
            w = np.random.rand()
            weights.append(w)

        # create a random number for our initial bias to begin with.
        bias = np.random.rand()

        '''
        I tried adding the shuffle step, where the matrix is shuffled only in terms of its observations (i.e. rows)
        but this dropped the accuracy dramatically, to the point where the 50% range was the best the model could achieve.
        '''
        input_matrix = dataset_input_matrix
        # shuffle our matrix observation samples, to decrease the chance of overfitting
        random.shuffle(dataset_input_matrix)
        input_matrix1 = dataset_input_matrix

        # We perform the training based on the number of epochs specified
        for i in range(epochs):

            #reset average accuracy with every epoch
            self.train_average_accuracy = 0

            for ri in range(len(dataset_input_matrix)): 

                # reset the weighted sum at the start of every observation to avoid accumulating the previous observations' weighted sums on top.
                weighted_sum = 0

                input_observation_vector = dataset_input_matrix[ri]
                # Loop through all the independent variables (x) in the observation
                for x in range(len(input_observation_vector)):
                    # Weighted_sum: we take each independent variable in the entire observation, add weight to it then add it to the subtotal of weighted sum
                    weighted_sum += input_observation_vector[x] * weights[x]

                # Add Bias: add bias to weighted sum
                weighted_sum += bias

                # Activation: process weighted_sum through activation function
                activation_func_output = self.chosen_activation_func(weighted_sum)    

                # Prediction: Because this is a single layer neural network, so the activation output will be the same as the prediction
                pred = activation_func_output

                # Cost: the cost function to calculate the prediction error margin
                cost = chosen_cost_func(pred, output_data_labels[ri])
                # Also calculate the derivative of the cost function with respect to prediction
                dCost_dPred = chosen_cost_func_derivation(pred, output_data_labels[ri])

                # Derivative: bringing derivative from prediction output with respect to the activation function used for the weighted sum.
                dPred_dWeightSum = chosen_activation_func_derivation(weighted_sum)

                # Bias is just a number on its own added to the weighted sum, so its derivative is just 1
                dWeightSum_dB = 1

                # The derivative of the weighted sum with respect to each weight is the input data point / independent variable it's multiplied by.
                # Therefore I simply assigned the input data array to another variable I called 'dWeightedSum_dWeights'
                # to represent the array of the derivatives for all the weights involved. I could've used the 'input_observation_vector'
                # array variable itself, but for the sake of readability, I created a separate variable to represent the derivative of each of the weights.
                dWeightedSum_dWeights = input_observation_vector

                # Derivative chaining rule: chaining all the derivative functions together (chaining rule)
                # Loop through all the weights to workout the derivative of the cost with respect to each weight:
                for dWeightedSum_dWeight in dWeightedSum_dWeights:
                    dCost_dWeight = dCost_dPred * dPred_dWeightSum * dWeightedSum_dWeight
                    dCost_dWeights.append(dCost_dWeight)

                dCost_dB = dCost_dPred * dPred_dWeightSum * dWeightSum_dB

                # Backpropagation: update the weights and bias according to the derivatives calculated above.
                # In other words, we update the parameters of the neural network towards better values and therefore
                # optimise the neural network prediction to be as close to the real output as possible.
                # We loop through each weight and update it with its derivative with respect to the cost error function value. 
                for ind in range(len(weights)):
                    weights[ind] = weights[ind] - learning_rate * dCost_dWeights[ind]

                bias = bias - learning_rate * dCost_dB

                # Compare prediction to target
                error_margin = np.sqrt(np.square(pred - output_data_labels[ri]))
                accuracy = (1 - error_margin) * 100
                self.train_average_accuracy += round(accuracy)

                # Evaluate whether the guess was correct or not for the binary classification (0 or 1) outcome: an error margin below 0.5 counts as a correct guess, and 0.5 or above counts as incorrect. A margin of exactly 0.5 is treated as incorrect, because it's not really a good guess for either 0 or 1; we need to set a good standard for the neural net model.
                if (error_margin < 0.5) and (error_margin >= 0):
                    correct_pred += 1 
                elif (error_margin >= 0.5) and (error_margin <= 1):
                    incorrect_pred += 1
                else:
                    print("Exception error - 'margin error' for 'predict' method is out of range. Must be between 0 and 1, in training method", file=sys.stderr)
                    return

                costs.append(cost)

            # Calculate average accuracy from the predictions of all observations in the training dataset
            self.train_average_accuracy = round(self.train_average_accuracy / len(dataset_input_matrix), 1)



        # store the final optimised weights to the weights instance variable so it can be used in the predict method.
        self.weights = weights

        # store the final optimised bias to the weights instance variable so it can be used in the predict method.
        self.bias = bias


        # Print out results 
        print('Average Accuracy: {}'.format(self.train_average_accuracy))
        print('Correct predictions: {}, Incorrect Predictions: {}'.format(correct_pred, incorrect_pred))


from numpy import array
#define array of dataset
# each observation vector has 3 datapoints or 3 columns: length, width, and outcome label (0, 1 to represent blue flower and red flower respectively).  
data = array([[3,   1.5, 1],
        [2,   1,   0],
        [4,   1.5, 1],
        [3,   1,   0],
        [3.5, 0.5, 1],
        [2,   0.5, 0],
        [5.5, 1,   1],
        [1,   1,   0]])

# separate data: split input, output, train and test data.
X_train, y_train, X_test, y_test = data[:6, :-1], data[:6, -1], data[6:, :-1], data[6:, -1]

nn_model = NN_classification()

nn_model.simple_1_layer_classification_NN(X_train, y_train, 2, 10000, learning_rate=0.2)
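
(As an aside, the inner loop that accumulates the weighted sum is equivalent to a dot product; a minimal sketch of the same computation with NumPy, using made-up example values:)

import numpy as np

observation = np.array([3.0, 1.5])   # one observation (row) of the input matrix
weights = [0.4, 0.7]                 # example weights, same length as the observation
bias = 0.1

# same result as looping over observation[x] * weights[x] and adding bias at the end
weighted_sum = np.dot(observation, weights) + bias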
