Problems converting Keras code to PyTorch code (shaping)

I have Keras code that I need to convert to PyTorch. I am new to PyTorch and I am stuck: I cannot figure out how to feed the data in the same way as in Keras. I have spent many hours on this, so any tips or help are greatly appreciated.

Here is the Keras code I am working with. The input shape is (5000, 1).

    def build(input_shape, classes):
        model = Sequential()

        filter_num = ['None',32,64,128,256]
        kernel_size = ['None',8,8,8,8]
        conv_stride_size = ['None',1,1,1,1]
        pool_stride_size = ['None',4,4,4,4]
        pool_size = ['None',8,8,8,8]


        # Block1
        model.add(Conv1D(filters=filter_num[1], kernel_size=kernel_size[1], input_shape=input_shape,
                         strides=conv_stride_size[1], padding='same',
                         name='block1_conv1'))
        model.add(BatchNormalization(axis=-1))
        model.add(ELU(alpha=1.0, name='block1_adv_act1'))
        model.add(Conv1D(filters=filter_num[1], kernel_size=kernel_size[1],
                         strides=conv_stride_size[1], padding='same',
                         name='block1_conv2'))
        model.add(BatchNormalization(axis=-1))
        model.add(ELU(alpha=1.0, name='block1_adv_act2'))
        model.add(MaxPooling1D(pool_size=pool_size[1], strides=pool_stride_size[1],
                               padding='same', name='block1_pool'))
        model.add(Dropout(0.1, name='block1_dropout'))



        # Block 2
        model.add(Conv1D(filters=filter_num[2], kernel_size=kernel_size[2],
                         strides=conv_stride_size[2], padding='same',
                         name='block2_conv1'))
        model.add(BatchNormalization())
        model.add(Activation('relu', name='block2_act1'))

        model.add(Conv1D(filters=filter_num[2], kernel_size=kernel_size[2],
                         strides=conv_stride_size[2], padding='same',
                         name='block2_conv2'))
        model.add(BatchNormalization())
        model.add(Activation('relu', name='block2_act2'))
        model.add(MaxPooling1D(pool_size=pool_size[2], strides=pool_stride_size[3],
                               padding='same', name='block2_pool'))
        model.add(Dropout(0.1, name='block2_dropout'))



        # Block 3
        model.add(Conv1D(filters=filter_num[3], kernel_size=kernel_size[3],
                         strides=conv_stride_size[3], padding='same',
                         name='block3_conv1'))
        model.add(BatchNormalization())
        model.add(Activation('relu', name='block3_act1'))
        model.add(Conv1D(filters=filter_num[3], kernel_size=kernel_size[3],
                         strides=conv_stride_size[3], padding='same',
                         name='block3_conv2'))
        model.add(BatchNormalization())
        model.add(Activation('relu', name='block3_act2'))
        model.add(MaxPooling1D(pool_size=pool_size[3], strides=pool_stride_size[3],
                               padding='same', name='block3_pool'))
        model.add(Dropout(0.1, name='block3_dropout'))



        # Block 4
        model.add(Conv1D(filters=filter_num[4], kernel_size=kernel_size[4],
                         strides=conv_stride_size[4], padding='same',
                         name='block4_conv1'))
        model.add(BatchNormalization())
        model.add(Activation('relu', name='block4_act1'))
        model.add(Conv1D(filters=filter_num[4], kernel_size=kernel_size[4],
                         strides=conv_stride_size[4], padding='same',
                         name='block4_conv2'))
        model.add(BatchNormalization())
        model.add(Activation('relu', name='block4_act2'))
        model.add(MaxPooling1D(pool_size=pool_size[4], strides=pool_stride_size[4],
                               padding='same', name='block4_pool'))
        model.add(Dropout(0.1, name='block4_dropout'))




        # FC #1
        model.add(Flatten(name='flatten'))
        model.add(Dense(512, kernel_initializer=glorot_uniform(seed=0), name='fc1'))
        model.add(BatchNormalization())
        model.add(Activation('relu', name='fc1_act'))

        model.add(Dropout(0.7, name='fc1_dropout'))


        #FC #2
        model.add(Dense(512, kernel_initializer=glorot_uniform(seed=0), name='fc2'))
        model.add(BatchNormalization())
        model.add(Activation('relu', name='fc2_act'))

        model.add(Dropout(0.5, name='fc2_dropout'))


        # Classification
        model.add(Dense(classes, kernel_initializer=glorot_uniform(seed=0), name='fc3'))
        model.add(Activation('softmax', name="softmax"))
        return model

Here is the output of model.summary() from the Keras code:

Layer (type)                 Output Shape              Param #   
=================================================================
block1_conv1 (Conv1D)        (None, 5000, 32)          288       
_________________________________________________________________
batch_normalization_1 (Batch (None, 5000, 32)          128       
_________________________________________________________________
block1_adv_act1 (ELU)        (None, 5000, 32)          0         
_________________________________________________________________
block1_conv2 (Conv1D)        (None, 5000, 32)          8224      
_________________________________________________________________
batch_normalization_2 (Batch (None, 5000, 32)          128       
_________________________________________________________________
block1_adv_act2 (ELU)        (None, 5000, 32)          0         
_________________________________________________________________
block1_pool (MaxPooling1D)   (None, 1250, 32)          0         
_________________________________________________________________
block1_dropout (Dropout)     (None, 1250, 32)          0         
_________________________________________________________________
block2_conv1 (Conv1D)        (None, 1250, 64)          16448     
_________________________________________________________________
batch_normalization_3 (Batch (None, 1250, 64)          256       
_________________________________________________________________
block2_act1 (Activation)     (None, 1250, 64)          0         
_________________________________________________________________
block2_conv2 (Conv1D)        (None, 1250, 64)          32832     
_________________________________________________________________
batch_normalization_4 (Batch (None, 1250, 64)          256       
_________________________________________________________________
block2_act2 (Activation)     (None, 1250, 64)          0         
_________________________________________________________________
block2_pool (MaxPooling1D)   (None, 313, 64)           0         
_________________________________________________________________
block2_dropout (Dropout)     (None, 313, 64)           0         
_________________________________________________________________
block3_conv1 (Conv1D)        (None, 313, 128)          65664     
_________________________________________________________________
batch_normalization_5 (Batch (None, 313, 128)          512       
_________________________________________________________________
block3_act1 (Activation)     (None, 313, 128)          0         
_________________________________________________________________
block3_conv2 (Conv1D)        (None, 313, 128)          131200    
_________________________________________________________________
batch_normalization_6 (Batch (None, 313, 128)          512       
_________________________________________________________________
block3_act2 (Activation)     (None, 313, 128)          0         
_________________________________________________________________
block3_pool (MaxPooling1D)   (None, 79, 128)           0         
_________________________________________________________________
block3_dropout (Dropout)     (None, 79, 128)           0         
_________________________________________________________________
block4_conv1 (Conv1D)        (None, 79, 256)           262400    
_________________________________________________________________
batch_normalization_7 (Batch (None, 79, 256)           1024      
_________________________________________________________________
block4_act1 (Activation)     (None, 79, 256)           0         
_________________________________________________________________
block4_conv2 (Conv1D)        (None, 79, 256)           524544    
_________________________________________________________________
batch_normalization_8 (Batch (None, 79, 256)           1024      
_________________________________________________________________
block4_act2 (Activation)     (None, 79, 256)           0         
_________________________________________________________________
block4_pool (MaxPooling1D)   (None, 20, 256)           0         
_________________________________________________________________
block4_dropout (Dropout)     (None, 20, 256)           0         
_________________________________________________________________
flatten (Flatten)            (None, 5120)              0         
_________________________________________________________________
fc1 (Dense)                  (None, 512)               2621952   
_________________________________________________________________
batch_normalization_9 (Batch (None, 512)               2048      
_________________________________________________________________
fc1_act (Activation)         (None, 512)               0         
_________________________________________________________________
fc1_dropout (Dropout)        (None, 512)               0         
_________________________________________________________________
fc2 (Dense)                  (None, 512)               262656    
_________________________________________________________________
batch_normalization_10 (Batc (None, 512)               2048      
_________________________________________________________________
fc2_act (Activation)         (None, 512)               0         
_________________________________________________________________
fc2_dropout (Dropout)        (None, 512)               0         
_________________________________________________________________
fc3 (Dense)                  (None, 101)               51813     
_________________________________________________________________
softmax (Activation)         (None, 101)               0         
=================================================================
Total params: 3,985,957
Trainable params: 3,981,989
Non-trainable params: 3,968

Here is what I have done in PyTorch:

class model(torch.nn.Module):
    def __init__(self, input_channels, kernel_size, stride, pool_kernel, pool_stride, dropout_p, dropout_inplace=False):
        super(model, self).__init__()
        self.encoder = nn.Sequential(
            BasicBlock1(input_channels, kernel_size, stride, pool_kernel, pool_stride, dropout_p),
            BasicBlock(input_channels//4, kernel_size, stride, pool_kernel, pool_stride, dropout_p),
            BasicBlock(input_channels//16, kernel_size, stride, pool_kernel, pool_stride, dropout_p),
            BasicBlock(input_channels//16//4, kernel_size, stride, pool_kernel, pool_stride, dropout_p)
        )


        self.decoder = nn.Sequential(
            nn.Linear(5120, 512),
            nn.BatchNorm1d(512),
            nn.ReLU(),
            nn.Dropout(p=dropout_p, inplace=dropout_inplace),
            nn.Linear(512, 512),
            nn.BatchNorm1d(512),
            nn.ReLU(),
            nn.Dropout(p=dropout_p, inplace=dropout_inplace),
            nn.Linear(512, 101),
            nn.Softmax(dim=101)
        )
    def forward(self, x):
        x = self.encoder(x)

        x = x.view(x.size(0), -1)  # flatten

        x = self.decoder(x)
        return x


def BasicBlock(input_channels, kernel_size, stride, pool_kernel, pool_stride, dropout_p, dropout_inplace=False):
    return nn.Sequential(
        nn.Conv1d(in_channels=input_channels, out_channels=input_channels, kernel_size=kernel_size, stride=stride,
                  padding=get_pad_size(input_channels, input_channels, kernel_size)),
        nn.BatchNorm1d(32),
        nn.ReLU(),
        nn.Conv1d(in_channels=input_channels, out_channels=input_channels, kernel_size=kernel_size, stride=stride,
                  padding=get_pad_size(input_channels, input_channels, kernel_size)),
        nn.BatchNorm1d(32),
        nn.ReLU(),
        nn.MaxPool1d(kernel_size=pool_kernel, stride=pool_stride,
                     padding=get_pad_size(input_channels, input_channels/4, kernel_size)),
        nn.Dropout(p=dropout_p, inplace=dropout_inplace)
    )


def BasicBlock1(input_channels, kernel_size, stride, pool_kernel, pool_stride, dropout_p, dropout_inplace=False):
    return nn.Sequential(
        nn.Conv1d(in_channels=1, out_channels=input_channels, kernel_size=kernel_size, stride=stride,
                  padding=get_pad_size(input_channels, input_channels, kernel_size)),
        nn.BatchNorm1d(32),
        nn.ReLU(),
        nn.Conv1d(in_channels=input_channels, out_channels=input_channels, kernel_size=kernel_size, stride=stride,
                  padding=get_pad_size(input_channels, input_channels, kernel_size)),
        nn.BatchNorm1d(32),
        nn.ReLU(),
        nn.MaxPool1d(kernel_size=pool_kernel, stride=pool_stride,
                     padding=get_pad_size(input_channels, input_channels/4, kernel_size)),
        nn.Dropout(p=dropout_p, inplace=dropout_inplace)
    )


def get_pad_size(input_shape, output_shape, kernel_size, stride=1, dilation=1):
    """
    Gets the right padded needed to maintain same shape in the conv layers
    BEWARE: works only on odd size kernel size
    :param input_shape: the input shape to the conv layer
    :param output_shape: the desired output shape of the conv layer
    :param kernel_size: the size of the kernel window, has to be odd
    :param stride: Stride of the convolution
    :param dilation: Spacing between kernel elements
    :return: the appropriate pad size for the needed configuration
    :Author: Aneesh
    """

    if kernel_size % 2 == 0:
        raise ValueError(
            "Kernel size has to be odd for this function to work properly. Current Value is %d." % kernel_size)

    return (int((output_shape * stride - stride + kernel_size - input_shape + (kernel_size - 1) * (dilation - 1)) / 2))

Finally, here is the summary produced by my PyTorch model:

model(
  (encoder): Sequential(
    (0): Sequential(
      (0): Conv1d(1, 5000, kernel_size=(7,), stride=(1,), padding=(3,))
      (1): BatchNorm1d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU()
      (3): Conv1d(5000, 5000, kernel_size=(7,), stride=(1,), padding=(3,))
      (4): BatchNorm1d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU()
      (6): MaxPool1d(kernel_size=8, stride=4, padding=-1872, dilation=1, ceil_mode=False)
      (7): Dropout(p=0.1)
    )
    (1): Sequential(
      (0): Conv1d(1250, 1250, kernel_size=(7,), stride=(1,), padding=(3,))
      (1): BatchNorm1d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU()
      (3): Conv1d(1250, 1250, kernel_size=(7,), stride=(1,), padding=(3,))
      (4): BatchNorm1d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU()
      (6): MaxPool1d(kernel_size=8, stride=4, padding=-465, dilation=1, ceil_mode=False)
      (7): Dropout(p=0.1)
    )
    (2): Sequential(
      (0): Conv1d(312, 312, kernel_size=(7,), stride=(1,), padding=(3,))
      (1): BatchNorm1d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU()
      (3): Conv1d(312, 312, kernel_size=(7,), stride=(1,), padding=(3,))
      (4): BatchNorm1d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU()
      (6): MaxPool1d(kernel_size=8, stride=4, padding=-114, dilation=1, ceil_mode=False)
      (7): Dropout(p=0.1)
    )
    (3): Sequential(
      (0): Conv1d(78, 78, kernel_size=(7,), stride=(1,), padding=(3,))
      (1): BatchNorm1d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU()
      (3): Conv1d(78, 78, kernel_size=(7,), stride=(1,), padding=(3,))
      (4): BatchNorm1d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU()
      (6): MaxPool1d(kernel_size=8, stride=4, padding=-26, dilation=1, ceil_mode=False)
      (7): Dropout(p=0.1)
    )
  )
  (decoder): Sequential(
    (0): Linear(in_features=5120, out_features=512, bias=True)
    (1): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
    (3): Dropout(p=0.1)
    (4): Linear(in_features=512, out_features=512, bias=True)
    (5): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (6): ReLU()
    (7): Dropout(p=0.1)
    (8): Linear(in_features=512, out_features=101, bias=True)
    (9): Softmax()
  )
)

1 Answer


I think your main problem is that you are confusing in_channels and out_channels with the Keras shapes. Let's just take the first convolutional layer as an example. In Keras you have:

Conv1D(filters=32, kernel_size=8, input_shape=(5000,1), strides=1, padding='same')

The PyTorch equivalent should be (with the kernel size changed to 7, as you did; we will come back to that later):

nn.Conv1d(in_channels=1, out_channels=32, kernel_size=7, stride=1, padding=3) # different kernel size

Note that you do not need to tell PyTorch the shape of your input sequence. Now let's see how that compares to what you did:

nn.Conv1d(in_channels=1, out_channels=5000, kernel_size=7, stride=1, padding=3) # note the out_channels

You have just created a huge network. While the correct implementation produces an output of shape [b, 32, 5000], where b is the batch size, yours produces [b, 5000, 5000].
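As a quick sanity check (a random input and batch size 2 are chosen here purely for illustration), you can compare the output shapes of the two layers directly:

import torch
import torch.nn as nn

x = torch.randn(2, 1, 5000)  # PyTorch Conv1d expects (batch, channels, length)

correct = nn.Conv1d(in_channels=1, out_channels=32, kernel_size=7, stride=1, padding=3)
print(correct(x).shape)  # torch.Size([2, 32, 5000])

wrong = nn.Conv1d(in_channels=1, out_channels=5000, kernel_size=7, stride=1, padding=3)
print(wrong(x).shape)    # torch.Size([2, 5000, 5000]) -- 5000 output channels is far too many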

Hopefully this example helps you fix the rest of your implementation.
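For reference, here is one rough sketch (my own suggestion, not a drop-in answer) of how the first Keras block could map to PyTorch. It keeps your kernel size of 7 with symmetric padding for simplicity; the even-kernel case with kernel size 8 is discussed below.

import torch.nn as nn

block1 = nn.Sequential(
    nn.Conv1d(in_channels=1, out_channels=32, kernel_size=7, stride=1, padding=3),
    nn.BatchNorm1d(32),   # BatchNorm1d takes the number of channels (32 for this block)
    nn.ELU(alpha=1.0),    # the Keras block1 uses ELU, not ReLU
    nn.Conv1d(in_channels=32, out_channels=32, kernel_size=7, stride=1, padding=3),
    nn.BatchNorm1d(32),
    nn.ELU(alpha=1.0),
    nn.MaxPool1d(kernel_size=8, stride=4, padding=2),  # roughly 'same' pooling: length 5000 -> 1250
    nn.Dropout(p=0.1),
)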

Finally, some notes on replicating Keras's 'same' padding in PyTorch. With even kernel sizes, preserving the input length requires asymmetric padding, and I don't think that can be specified when creating the layer. I see you changed the kernel size to 7 instead, but this can actually be done with the original kernel size of 8: you can apply the padding in your forward() function to produce the required asymmetric padding.

layer = nn.Conv1d(in_channels=1, out_channels=32, kernel_size=8, stride=1, padding=0) # layer without padding
x = torch.empty(1, 1, 5000).normal_()  # random input

# forward run
x_padded = torch.nn.functional.pad(x, (3, 4))  # pad 3 on the left, 4 on the right
y = layer(x_padded)
print(y.shape)  # torch.Size([1, 32, 5000])
...
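If you prefer to keep everything inside nn.Sequential, one option (again, my own suggestion rather than the only way) is to wrap the asymmetric padding and the convolution in a small module:

import torch
import torch.nn as nn
import torch.nn.functional as F

class SamePadConv1d(nn.Module):
    """Conv1d with asymmetric padding so that, at stride 1, the output length
    equals the input length even for even kernel sizes such as 8."""
    def __init__(self, in_channels, out_channels, kernel_size):
        super().__init__()
        self.left = (kernel_size - 1) // 2         # 3 for kernel_size=8
        self.right = kernel_size - 1 - self.left   # 4 for kernel_size=8
        self.conv = nn.Conv1d(in_channels, out_channels, kernel_size)

    def forward(self, x):
        return self.conv(F.pad(x, (self.left, self.right)))

layer = SamePadConv1d(1, 32, kernel_size=8)
print(layer(torch.randn(1, 1, 5000)).shape)  # torch.Size([1, 32, 5000])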