Question

I'm running into a problem I haven't seen before. I work in Bayesian machine learning and therefore make heavy use of the distributions in PyTorch. One common thing to do is to define some distribution parameters in terms of their log, so that they cannot become negative during optimisation (for example, the standard deviation of a normal distribution).

In order to be distribution-independent, however, I don't want to manually recompute the transformation of this parameter. To demonstrate with an example:

The following code will NOT work. After the first backward pass, the part of the graph that computes the exponential of the parameter is automatically freed and is not re-added.

import torch
import torch.nn as nn
import torch.distributions as dd

log_std = nn.Parameter(torch.Tensor([1])) # Define the log of the parameter as an nn.Parameter, this is what we want to optimise
std = torch.exp(log_std) # Define the transformation we want to apply to the parameter to use it in the distribution
mean = nn.Parameter(torch.Tensor([1])) # A normal parameter
dist = dd.Normal(loc=mean, scale=std) # Define the distribution. From here I want to ONLY refer to this, not the other variables

optim = torch.optim.SGD([log_std, mean], lr=0.01) # Standard optimiser
target = dd.Normal(5,5) # Target distribution to match

for i in range(50):
    optim.zero_grad()

    samples = dist.rsample((1000,)) # Sample our model, note no reference to log_std

    cost = -(target.log_prob(samples) - dist.log_prob(samples)).sum() # KL divergence cost metric
    cost.backward()
    optim.step()
    print(i)
    print(log_std, mean, cost)
    print()
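
The failure can be reproduced without any distribution at all. The following is a minimal sketch, assuming a recent PyTorch version; the squared loss is a stand-in chosen for illustration and is not part of the original code. It suggests that the problem is the exp() node being built once outside the loop, so its saved buffers are gone by the second backward pass.

```python
import torch
import torch.nn as nn

log_std = nn.Parameter(torch.tensor([1.0]))
std = torch.exp(log_std)  # this graph node is created once, outside any loop

# First backward pass succeeds and frees the buffers saved by exp().
(std ** 2).sum().backward()

# The second pass builds a new pow/sum graph but reuses the same exp() node,
# whose saved result has already been freed.
second_pass_failed = False
try:
    (std ** 2).sum().backward()
except RuntimeError as err:
    second_pass_failed = True
    print("second backward failed:", err)
```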

The following code does run, but I have to refer to the log_std parameter explicitly inside the loop and recreate the distribution. If I wanted to change the type of distribution, that would not be possible without handling the specific case.

import torch
import torch.nn as nn
import torch.distributions as dd

log_std = nn.Parameter(torch.Tensor([1])) # Define the log of the parameter as an nn.Parameter, this is what we want to optimise
mean = nn.Parameter(torch.Tensor([1])) # A normal parameter

optim = torch.optim.SGD([log_std, mean], lr=0.001) # Standard optimiser
target = dd.Normal(5,5) # Target distribution to match

for i in range(50):
    optim.zero_grad()

    std = torch.exp(log_std)  # Define the transformation we want to apply to the parameter to use it in the distribution
    dist = dd.Normal(loc=mean, scale=std)  # Define the distribution.

    samples = dist.rsample((1000,)) # Sample our model, note no reference to log_std

    cost = -(target.log_prob(samples) - dist.log_prob(samples)).sum() # KL divergence cost metric
    cost.backward()
    optim.step()
    print(i)
    print(mean, std, cost)
    print()

The first example does, however, work in TensorFlow, since graphs there are static. Does anyone have any ideas on how I can fix this? If it were possible to retain only the part of the graph that defines the relation std = torch.exp(log_std), then this might work. I also tried playing with backward gradient hooks, but unfortunately, to compute the new gradient correctly you need access to both the parameter value and the learning rate.

Thanks in advance! Michael

EDIT

I was asked for an example of how I might change the distribution. Taking the code that currently does NOT work and changing the distributions to Gamma distributions:

import torch
import torch.nn as nn
import torch.distributions as dd

log_rate = nn.Parameter(torch.Tensor([1])) # Define the log of the parameter as an nn.Parameter, this is what we want to optimise
rate = torch.exp(log_rate) # Define the transformation we want to apply to the parameter to use it in the distribution
concentration = nn.Parameter(torch.Tensor([1])) # A normal parameter
dist = dd.Gamma(concentration=concentration, rate=rate) # Define the distribution. From here I want to ONLY refer to this, not the other variables

optim = torch.optim.SGD([log_rate, concentration], lr=0.01) # Standard optimiser
target = dd.Gamma(5,5) # Target distribution to match

for i in range(50):
    optim.zero_grad()

    samples = dist.rsample((1000,)) # Sample our model, note no reference to log_rate

    cost = -(target.log_prob(samples) - dist.log_prob(samples)).sum() # KL divergence cost metric
    cost.backward()
    optim.step()
    print(i)
    print(log_rate, concentration, cost)
    print()

However, looking at the code that currently works:

import torch
import torch.nn as nn
import torch.distributions as dd

log_rate = nn.Parameter(torch.Tensor([1])) # Define the log of the parameter as an nn.Parameter, this is what we want to optimise
concentration = nn.Parameter(torch.Tensor([1])) # A normal parameter

optim = torch.optim.SGD([log_rate, concentration], lr=0.001) # Standard optimiser
target = dd.Gamma(5,5) # Target distribution to match

for i in range(50):
    optim.zero_grad()

    rate = torch.exp(log_rate)  # Define the transformation we want to apply to the parameter to use it in the distribution
    dist = dd.Gamma(concentration=concentration, rate=rate)  # Define the distribution.

    samples = dist.rsample((1000,)) # Sample our model, note no reference to log_rate

    cost = -(target.log_prob(samples) - dist.log_prob(samples)).sum() # KL divergence cost metric
    cost.backward()
    optim.step()
    print(i)
    print(concentration, rate, cost)
    print()

And you can see that we have to change the code inside the loop for the algorithm to work. It's not a big problem in this small example, but this is just a demonstration for much larger algorithms, where it would be incredibly useful not to have to worry about

iacolippo · Answer 1 · July 10, 2019

A simple fix is to add retain_graph=True to cost.backward().

import torch
import torch.nn as nn
import torch.distributions as dd

log_std = nn.Parameter(torch.Tensor([1])) # Define the log of the parameter as an nn.Parameter, this is what we want to optimise
std = torch.exp(log_std) # Define the transformation we want to apply to the parameter to use it in the distribution
mean = nn.Parameter(torch.Tensor([1])) # A normal parameter
dist = dd.Normal(loc=mean, scale=std) # Define the distribution. From here I want to ONLY refer to this, not the other variables

optim = torch.optim.SGD([log_std, mean], lr=0.01) # Standard optimiser
target = dd.Normal(5,5) # Target distribution to match

for i in range(50):
    optim.zero_grad()

    samples = dist.rsample((1000,)) # Sample our model, note no reference to log_std

    cost = -(target.log_prob(samples) - dist.log_prob(samples)).sum() # KL divergence cost metric
    cost.backward(retain_graph=True)
    optim.step()
    print(i, log_std, mean, cost)
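
One thing worth checking with this approach: std is evaluated only once, before the loop, so the distribution may keep using the initial value of exp(log_std) even though log_std itself is updated by the optimiser. The sketch below is a shortened variant of the code above (fewer iterations, a smaller learning rate and a fixed seed are assumptions made here for illustration) that compares the scale held by dist with the current value of exp(log_std).

```python
import torch
import torch.nn as nn
import torch.distributions as dd

torch.manual_seed(0)

log_std = nn.Parameter(torch.tensor([1.0]))
mean = nn.Parameter(torch.tensor([1.0]))
std = torch.exp(log_std)               # evaluated once, never recomputed
dist = dd.Normal(loc=mean, scale=std)

optim = torch.optim.SGD([log_std, mean], lr=0.001)
target = dd.Normal(5.0, 5.0)

for i in range(3):
    optim.zero_grad()
    samples = dist.rsample((1000,))
    cost = -(target.log_prob(samples) - dist.log_prob(samples)).sum()
    cost.backward(retain_graph=True)   # keeps the exp() node alive
    optim.step()                       # updates log_std in place

# Compare the scale the distribution samples with against the parameter.
scale_in_dist = dist.scale.detach()
scale_from_param = torch.exp(log_std).detach()
print("scale used by dist:", scale_in_dist)
print("exp(log_std) now:  ", scale_from_param)
```

If the two printed values differ, gradients are reaching log_std but the samples are not reflecting its updates, in which case recomputing the transformation each iteration is the safer route.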

You can free parts of the graph with del <variable name>; for example, del cost will delete the part of the graph that computes cost.

The optimal solution would be to detach samples from the distribution parameters. Unfortunately, I haven't found a way to do this; .detach() does not seem to work when dealing with the outputs of .rsample().
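
A different route, not the retain_graph approach above, is to keep the loop distribution-agnostic by hiding the transformation behind a small factory that rebuilds the distribution on every iteration. This is only a sketch: make_dist_factory and fit are hypothetical helper names rather than PyTorch APIs, the cost is averaged instead of summed, and the Gamma case stores both parameters in log space, all of which are assumptions made for this illustration.

```python
import torch
import torch.nn as nn
import torch.distributions as dd


def make_dist_factory(dist_cls, log_params, raw_params=None):
    """Hypothetical helper (not part of PyTorch): returns a function that
    rebuilds dist_cls from the current parameter values on every call."""
    raw_params = raw_params or {}

    def build():
        # exp() is re-applied here, so each backward pass gets a fresh graph.
        kwargs = {name: torch.exp(p) for name, p in log_params.items()}
        kwargs.update(raw_params)
        return dist_cls(**kwargs)

    return build


def fit(build, params, target, steps=20, lr=0.01):
    """Distribution-agnostic loop: it only ever calls build()."""
    optim = torch.optim.SGD(params, lr=lr)
    for _ in range(steps):
        optim.zero_grad()
        dist = build()
        samples = dist.rsample((1000,))
        # Same cost as in the question, but averaged rather than summed
        # (an assumption, so the step size does not scale with sample count).
        cost = -(target.log_prob(samples) - dist.log_prob(samples)).mean()
        cost.backward()
        optim.step()
    return build()


torch.manual_seed(0)

# Normal: scale is stored as log_std, loc is used directly.
log_std = nn.Parameter(torch.tensor([1.0]))
mean = nn.Parameter(torch.tensor([1.0]))
normal_fit = fit(
    make_dist_factory(dd.Normal, {"scale": log_std}, {"loc": mean}),
    [log_std, mean],
    dd.Normal(5.0, 5.0),
)

# Gamma: both parameters must stay positive, so both are kept in log space.
log_rate = nn.Parameter(torch.tensor([1.0]))
log_concentration = nn.Parameter(torch.tensor([0.0]))
gamma_fit = fit(
    make_dist_factory(dd.Gamma, {"rate": log_rate, "concentration": log_concentration}),
    [log_rate, log_concentration],
    dd.Gamma(5.0, 5.0),
)

print(normal_fit.loc, normal_fit.scale)
print(gamma_fit.concentration, gamma_fit.rate)
```

Switching from Normal to Gamma changes only the arguments passed to make_dist_factory; the body of fit stays the same.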

Defining parameters via some transformation OR retaining sub-graphs, but not the whole graph
