Правильное обратное распространение в простом персептроне - PullRequest
7 голосов
/ 10 мая 2019

Учитывая простую задачу ИЛИ:

or_input = np.array([[0,0], [0,1], [1,0], [1,1]])
or_output = np.array([[0,1,1,1]]).T

Если мы обучаем простой однослойный персептрон (без обратного распространения), мы можем сделать что-то вроде этого:

import numpy as np
np.random.seed(0)

def sigmoid(x): # Returns values that sums to one.
    return 1 / (1 + np.exp(-x))

def cost(predicted, truth):
    return (truth - predicted)**2

or_input = np.array([[0,0], [0,1], [1,0], [1,1]])
or_output = np.array([[0,1,1,1]]).T

# Define the shape of the weight vector.
num_data, input_dim = or_input.shape
# Define the shape of the output vector. 
output_dim = len(or_output.T)

num_epochs = 50 # No. of times to iterate.
learning_rate = 0.03 # How large a step to take per iteration.

# Lets standardize and call our inputs X and outputs Y
X = or_input
Y = or_output
W = np.random.random((input_dim, output_dim))

for _ in range(num_epochs):
    layer0 = X
    # Forward propagation.
    # Inside the perceptron, Step 2. 
    layer1 = sigmoid(np.dot(X, W))

    # How much did we miss in the predictions?
    cost_error = cost(layer1, Y)

    # update weights
    W +=  - learning_rate * np.dot(layer0.T, cost_error)

# Expected output.
print(Y.tolist())
# On the training data
print([[int(prediction > 0.5)] for prediction in layer1])

[вне]:

[[0], [1], [1], [1]]
[[0], [1], [1], [1]]

При обратном распространении для вычисления d(cost)/d(X), правильны ли следующие шаги?

  • вычислить ошибку layer1, умножив ошибку стоимости и производные стоимости

  • , затем вычислите дельту layer1, умножив ошибку layer 1 и производные сигмоиды

  • затем произведите скалярное произведение между входами и дельтой layer1, чтобы получить дифференциал, т. Е. d(cost)/d(X)

Затем d(cost)/d(X) умножается на отрицательное значение скорости обучения для выполнения градиентного спуска.

num_epochs = 0 # No. of times to iterate.
learning_rate = 0.03 # How large a step to take per iteration.

# Lets standardize and call our inputs X and outputs Y
X = or_input
Y = or_output
W = np.random.random((input_dim, output_dim))

for _ in range(num_epochs):
    layer0 = X
    # Forward propagation.
    # Inside the perceptron, Step 2. 
    layer1 = sigmoid(np.dot(X, W))

    # How much did we miss in the predictions?
    cost_error = cost(layer1, Y)

    # Back propagation.
    # multiply how much we missed from the gradient/slope of the cost for our prediction.
    layer1_error = cost_error * cost_derivative(cost_error)

    # multiply how much we missed by the gradient/slope of the sigmoid at the values in layer1
    layer1_delta = layer1_error * sigmoid_derivative(layer1)

    # update weights
    W +=  - learning_rate * np.dot(layer0.T, layer1_delta)

В этом случае должна выглядеть реализация ниже с cost_derivative и sigmoid_derivative?

import numpy as np
np.random.seed(0)

def sigmoid(x): # Returns values that sums to one.
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(sx):
    # See https://math.stackexchange.com/a/1225116
    return sx * (1 - sx)

def cost(predicted, truth):
    return (truth - predicted)**2

def cost_derivative(y):
    # If the cost is:
    # cost = y - y_hat
    # What's the derivative of d(cost)/d(y)
    # d(cost)/d(y) = 1
    return 2*y


or_input = np.array([[0,0], [0,1], [1,0], [1,1]])
or_output = np.array([[0,1,1,1]]).T

# Define the shape of the weight vector.
num_data, input_dim = or_input.shape
# Define the shape of the output vector. 
output_dim = len(or_output.T)

num_epochs = 5 # No. of times to iterate.
learning_rate = 0.03 # How large a step to take per iteration.

# Lets standardize and call our inputs X and outputs Y
X = or_input
Y = or_output
W = np.random.random((input_dim, output_dim))

for _ in range(num_epochs):
    layer0 = X
    # Forward propagation.
    # Inside the perceptron, Step 2. 
    layer1 = sigmoid(np.dot(X, W))

    # How much did we miss in the predictions?
    cost_error = cost(layer1, Y)

    # Back propagation.
    # multiply how much we missed from the gradient/slope of the cost for our prediction.
    layer1_error = cost_error * cost_derivative(cost_error)

    # multiply how much we missed by the gradient/slope of the sigmoid at the values in layer1
    layer1_delta = layer1_error * sigmoid_derivative(layer1)

    # update weights
    W +=  - learning_rate * np.dot(layer0.T, layer1_delta)

# Expected output.
print(Y.tolist())
# On the training data
print([[int(prediction > 0.5)] for prediction in layer1])

[выход]:

[[0], [1], [1], [1]]
[[0], [1], [1], [1]]

Кстати, с учетом случайных входных начальных чисел, даже без W и градиентного спуска или персептрона, прогноз может быть еще верным:

import numpy as np
np.random.seed(0)

# Lets standardize and call our inputs X and outputs Y
X = or_input
Y = or_output
W = np.random.random((input_dim, output_dim))

# On the training data
predictions = sigmoid(np.dot(X, W))
[[int(prediction > 0.5)] for prediction in predictions]
Добро пожаловать на сайт PullRequest, где вы можете задавать вопросы и получать ответы от других членов сообщества.
...