My Q-learning function finds the optimal sequence from one location to another, but I would like to generate the valid sequences. Say these are my sequences:
['Hi,there,how,have,you,been,noreply,?',
'Hi,there,where,have,you,been,noreply,?',
'Hi,there,who,have,you,been,reply,?',
'Hi,there,yes,have,you,been,reply,?']
My function:
import numpy as np

# rewards, location_to_state, state_to_location, gamma and alpha are
# module-level variables defined elsewhere in my script.
def get_optimal_route(start_location, end_location):
    rewards_new = np.copy(rewards)
    ending_state = location_to_state[end_location]
    # Give the goal state a large reward so the learned route is pulled toward it.
    rewards_new[ending_state, ending_state] = 999
    Q = np.zeros([12, 12])
    for i in range(1000):
        current_state = np.random.randint(0, 12)  # np.random.randint excludes the upper bound
        # Collect every state reachable from current_state.
        playable_actions = []
        for j in range(12):
            if rewards_new[current_state, j] > 0:
                playable_actions.append(j)
        next_state = np.random.choice(playable_actions)
        # Temporal-difference update of the Q-value.
        TD = rewards_new[current_state, next_state] + gamma * Q[next_state, np.argmax(Q[next_state,])] - Q[current_state, next_state]
        Q[current_state, next_state] += alpha * TD
    # Follow the highest Q-value from each state until the end location is reached.
    route = [start_location]
    next_location = start_location
    while next_location != end_location:
        starting_state = location_to_state[start_location]
        next_state = np.argmax(Q[starting_state,])
        next_location = state_to_location[next_state]
        route.append(next_location)
        start_location = next_location
    return route
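For reference, the two lines that compute `TD` and update `Q` inside the training loop can be written as the usual temporal-difference update. The notation is mine and only an illustration: s stands for `current_state`, s' for `next_state`, R for `rewards_new`, and the action taken is simply the move to s'.

```latex
\begin{aligned}
TD(s, s') &= R(s, s') + \gamma \max_{a} Q(s', a) - Q(s, s') \\
Q(s, s') &\leftarrow Q(s, s') + \alpha \, TD(s, s')
\end{aligned}
```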
Say the two parameters I pass are 'Hi' and '?'; I would like to generate all 4 of the sequences.
This is my rewards matrix:
array([[0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[1., 0., 1., 0., 0., 0., 0., 0., 1., 1., 0., 1.],
[0., 1., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 1., 0., 1., 0., 0., 0., 1., 1., 0., 1.],
[0., 0., 0., 1., 0., 1., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 1., 0., 1., 0., 0., 0., 1., 0.],
[0., 0., 0., 0., 0., 1., 0., 1., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 1., 0.],
[0., 1., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 1., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 1., 0., 1., 0., 0., 0., 0.],
[0., 1., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.]])
based on these tokenized sequences:
array([[ 0, 1, 2, 3, 4, 5, 6, 7],
[ 0, 1, 8, 3, 4, 5, 6, 7],
[ 0, 1, 9, 3, 4, 5, 10, 7],
[ 0, 1, 11, 3, 4, 5, 10, 7]])
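The post does not show how `location_to_state`, `state_to_location`, and `rewards` were built. The sketch below is one way they could be derived from the four sequences, assuming tokens are numbered in order of first appearance and every pair of adjacent tokens is marked as a playable move in both directions. Under those assumptions it reproduces the tokenized array and the rewards matrix shown above; the variable names `sequences`, `tokenized`, and `n_states` are mine.

```python
import numpy as np

sequences = [
    'Hi,there,how,have,you,been,noreply,?',
    'Hi,there,where,have,you,been,noreply,?',
    'Hi,there,who,have,you,been,reply,?',
    'Hi,there,yes,have,you,been,reply,?',
]

# Assumption: each token gets an integer state in order of first appearance.
location_to_state = {}
for seq in sequences:
    for token in seq.split(','):
        location_to_state.setdefault(token, len(location_to_state))
state_to_location = {state: token for token, state in location_to_state.items()}

# Integer form of the sequences, one row per sequence.
tokenized = np.array(
    [[location_to_state[t] for t in seq.split(',')] for seq in sequences]
)

# Assumption: adjacent tokens are connected symmetrically with reward 1.
n_states = len(location_to_state)
rewards = np.zeros((n_states, n_states))
for row in tokenized:
    for a, b in zip(row[:-1], row[1:]):
        rewards[a, b] = 1.0
        rewards[b, a] = 1.0

print(tokenized)
print(rewards)
```

Because the matrix only records which tokens are adjacent, it does not remember which sequence an edge came from.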