Given the simple OR problem:
or_input = np.array([[0,0], [0,1], [1,0], [1,1]])
or_output = np.array([[0,1,1,1]]).T
If we train a simple single-layer perceptron (without backpropagation), we can do something like this:
import numpy as np
np.random.seed(0)

def sigmoid(x):  # Squashes values into the range (0, 1).
    return 1 / (1 + np.exp(-x))

def cost(predicted, truth):
    return (truth - predicted)**2

or_input = np.array([[0,0], [0,1], [1,0], [1,1]])
or_output = np.array([[0,1,1,1]]).T

# Define the shape of the weight vector.
num_data, input_dim = or_input.shape
# Define the shape of the output vector.
output_dim = len(or_output.T)

num_epochs = 50 # No. of times to iterate.
learning_rate = 0.03 # How large a step to take per iteration.

# Let's standardize and call our inputs X and outputs Y
X = or_input
Y = or_output
W = np.random.random((input_dim, output_dim))

for _ in range(num_epochs):
    layer0 = X
    # Forward propagation.
    # Inside the perceptron, Step 2.
    layer1 = sigmoid(np.dot(X, W))
    # How much did we miss in the predictions?
    cost_error = cost(layer1, Y)
    # Update weights.
    W += - learning_rate * np.dot(layer0.T, cost_error)

# Expected output.
print(Y.tolist())
# On the training data.
print([[int(prediction > 0.5)] for prediction in layer1])
[out]:
[[0], [1], [1], [1]]
[[0], [1], [1], [1]]
When doing backpropagation to compute d(cost)/d(X), are the following steps correct?

- compute the layer1 error by multiplying the cost error by the derivative of the cost,
- then compute the layer1 delta by multiplying the layer1 error by the derivative of the sigmoid,
- then take the dot product between the inputs and the layer1 delta to get the differential, i.e. d(cost)/d(X),
- then multiply d(cost)/d(X) by the negative of the learning rate to perform gradient descent.
num_epochs = 5 # No. of times to iterate.
learning_rate = 0.03 # How large a step to take per iteration.

# Let's standardize and call our inputs X and outputs Y
X = or_input
Y = or_output
W = np.random.random((input_dim, output_dim))

for _ in range(num_epochs):
    layer0 = X
    # Forward propagation.
    # Inside the perceptron, Step 2.
    layer1 = sigmoid(np.dot(X, W))
    # How much did we miss in the predictions?
    cost_error = cost(layer1, Y)
    # Back propagation.
    # Multiply how much we missed by the gradient/slope of the cost for our prediction.
    layer1_error = cost_error * cost_derivative(cost_error)
    # Multiply how much we missed by the gradient/slope of the sigmoid at the values in layer1.
    layer1_delta = layer1_error * sigmoid_derivative(layer1)
    # Update weights.
    W += - learning_rate * np.dot(layer0.T, layer1_delta)
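For comparison, here is a minimal sketch of what the plain chain rule gives for this same setup (an MSE cost with a sigmoid output), using the sigmoid_derivative defined further below. Note that the np.dot(layer0.T, layer1_delta) product yields d(cost)/d(W), the quantity the weight update actually consumes, rather than d(cost)/d(X):

# Chain rule for cost = (Y - layer1)**2 with layer1 = sigmoid(np.dot(X, W)):
#   d(cost)/d(layer1) = -2 * (Y - layer1)
#   d(layer1)/d(z)    = layer1 * (1 - layer1), where z = np.dot(X, W)
#   d(z)/d(W)         = layer0, folded in via the transposed dot product.
layer1_delta = -2 * (Y - layer1) * sigmoid_derivative(layer1)
grad_W = np.dot(layer0.T, layer1_delta)  # d(cost)/d(W)
W += - learning_rate * grad_W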
Given those steps, should the implementation then look like the one below, with cost_derivative and sigmoid_derivative defined as follows?
import numpy as np
np.random.seed(0)

def sigmoid(x):  # Squashes values into the range (0, 1).
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(sx):
    # sx is the sigmoid's output; see https://math.stackexchange.com/a/1225116
    return sx * (1 - sx)

def cost(predicted, truth):
    return (truth - predicted)**2

def cost_derivative(y):
    # If the cost is:
    #   cost = y**2
    # then the derivative is:
    #   d(cost)/d(y) = 2*y
    return 2*y

or_input = np.array([[0,0], [0,1], [1,0], [1,1]])
or_output = np.array([[0,1,1,1]]).T

# Define the shape of the weight vector.
num_data, input_dim = or_input.shape
# Define the shape of the output vector.
output_dim = len(or_output.T)

num_epochs = 5 # No. of times to iterate.
learning_rate = 0.03 # How large a step to take per iteration.

# Let's standardize and call our inputs X and outputs Y
X = or_input
Y = or_output
W = np.random.random((input_dim, output_dim))

for _ in range(num_epochs):
    layer0 = X
    # Forward propagation.
    # Inside the perceptron, Step 2.
    layer1 = sigmoid(np.dot(X, W))
    # How much did we miss in the predictions?
    cost_error = cost(layer1, Y)
    # Back propagation.
    # Multiply how much we missed by the gradient/slope of the cost for our prediction.
    layer1_error = cost_error * cost_derivative(cost_error)
    # Multiply how much we missed by the gradient/slope of the sigmoid at the values in layer1.
    layer1_delta = layer1_error * sigmoid_derivative(layer1)
    # Update weights.
    W += - learning_rate * np.dot(layer0.T, layer1_delta)

# Expected output.
print(Y.tolist())
# On the training data.
print([[int(prediction > 0.5)] for prediction in layer1])
[out]:
[[0], [1], [1], [1]]
[[0], [1], [1], [1]]
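One way to sanity-check cost_derivative and sigmoid_derivative is to compare the analytic gradient used in the loop against a finite-difference estimate. The sketch below reuses X, Y, W, sigmoid and cost from the listing above; numerical_grad is a hypothetical helper, not part of the original code. If the backpropagation steps are exact, the two printed matrices should agree to several decimal places:

def numerical_grad(W, eps=1e-5):
    # Finite-difference estimate of d(total cost)/d(W).
    grad = np.zeros_like(W)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            W_plus, W_minus = W.copy(), W.copy()
            W_plus[i, j] += eps
            W_minus[i, j] -= eps
            cost_plus = cost(sigmoid(np.dot(X, W_plus)), Y).sum()
            cost_minus = cost(sigmoid(np.dot(X, W_minus)), Y).sum()
            grad[i, j] = (cost_plus - cost_minus) / (2 * eps)
    return grad

# Analytic gradient, exactly as computed inside the training loop.
layer1 = sigmoid(np.dot(X, W))
cost_error = cost(layer1, Y)
layer1_delta = cost_error * cost_derivative(cost_error) * sigmoid_derivative(layer1)
print(np.dot(X.T, layer1_delta))  # analytic
print(numerical_grad(W))          # numerical estimate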
Incidentally, given the random input seeds, even without training W with gradient descent or a perceptron at all, the prediction can still be correct:
import numpy as np
np.random.seed(0)

# Reusing sigmoid, or_input, or_output, input_dim and output_dim from above.
# Let's standardize and call our inputs X and outputs Y
X = or_input
Y = or_output
W = np.random.random((input_dim, output_dim))

# On the training data.
predictions = sigmoid(np.dot(X, W))
[[int(prediction > 0.5)] for prediction in predictions]
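This is not a quirk of seed 0: np.random.random draws weights from [0, 1), so both weights are (almost surely) strictly positive. For the input [0,0] the activation is sigmoid(0) = 0.5, and 0.5 > 0.5 is False, which yields 0; every other input contains a 1, so the activation exceeds 0.5 and yields 1. A quick illustrative check over many seeds, reusing the definitions above:

# Any strictly positive weight vector solves OR with the "> 0.5" threshold.
for seed in range(100):
    np.random.seed(seed)
    W = np.random.random((input_dim, output_dim))
    predictions = sigmoid(np.dot(X, W))
    assert [[int(p > 0.5)] for p in predictions] == Y.tolist()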