Has anyone succeeded in (intentionally) overfitting an MNIST neural network?
0 votes
/ 22 April 2019

I am currently studying the topic of "representational (expressive) power of neural networks" and trying to deliberately over-fit a neural network, meaning that the model should at least be able to build a perfect mapping from the training inputs to the outputs.

My data for this experiment is MNIST, and I am using an AutoEncoder/Decoder structure to check whether, and with which network structure, I can deliberately over-fit the network.

In general, I am interested in which combination of latent dimension size and number of ReLUs best increases the expressive power of the network, i.e. which combination reaches the minimum training loss (here I use binary cross-entropy between x and recon_x).

The problem is that I have not managed to over-fit deliberately (loss close to 0).

I have tried several deep/shallow FCNs with different hidden dimensions; the best minimum loss I reached is 55, which looks far too high compared to 0.

import torch
import torch.nn as nn


class AE(nn.Module):

    def __init__(self,
                 encoder_layer_sizes,
                 latent_size,
                 decoder_layer_sizes,
                 num_labels=0):

        super().__init__()

        assert type(encoder_layer_sizes) == list
        assert type(latent_size) == int
        assert type(decoder_layer_sizes) == list

        self.latent_size = latent_size

        self.encoder = Encoder(
            encoder_layer_sizes,
            latent_size,
            num_labels)
        self.decoder = Decoder(
            decoder_layer_sizes,
            latent_size,
            num_labels)

    def forward(self, x, c=None):
        # Flatten 28x28 MNIST images into 784-dimensional vectors.
        if x.dim() > 2:
            x = x.view(-1, 28*28)

        z = self.encoder(x, c)
        recon_x = self.decoder(z, c)

        return recon_x, z

    def inference(self, device, n=1, c=None):

        batch_size = n
        z = torch.randn([batch_size,
                         self.latent_size]).to(device)

        recon_x = self.decoder(z, c)

        return recon_x


class Encoder(nn.Module):

    def __init__(self, layer_sizes, latent_size, num_labels):
        # num_labels is unused here; it is a leftover from a conditional variant.
        super().__init__()

        self.MLP = nn.Sequential()

        for i, (in_size, out_size) in enumerate(zip(layer_sizes[:-1],
                                                    layer_sizes[1:])):
            print(i, ": ", in_size, out_size)
            self.MLP.add_module(name="L{:d}".format(i),
                                module=nn.Linear(in_size, out_size))
            # i only ranges over 0 .. len(layer_sizes) - 2, so this condition
            # is always true: a ReLU follows every hidden Linear layer.
            if i != len(layer_sizes):
                print("ReLU added @ Encoder")
                self.MLP.add_module(name="A{:d}".format(i),
                                    module=nn.ReLU())
                # self.MLP.add_module(name="BN{:d}".format(i),
                #                     module=nn.BatchNorm1d(out_size))

        # Final linear projection to the latent code (no non-linearity).
        self.linear = nn.Linear(layer_sizes[-1], latent_size)

    def forward(self, x, c=None):
        # c is unused; it only mirrors the conditional interface.
        x = self.MLP(x)
        z = self.linear(x)
        return z


class Decoder(nn.Module):

    def __init__(self, layer_sizes, latent_size, num_labels):
        # num_labels is unused here as well.
        super().__init__()

        self.MLP = nn.Sequential()
        input_size = latent_size

        for i, (in_size, out_size) in enumerate(
                zip([input_size] + layer_sizes[:-1], layer_sizes)):
            print(i, ": ", in_size, out_size)
            self.MLP.add_module(
                name="L{:d}".format(i), module=nn.Linear(in_size, out_size))
            if i + 1 < len(layer_sizes):
                # Note: the first decoder Linear (i == 0) is NOT followed by a
                # ReLU, so the first two linear layers compose into a single
                # linear map.
                if i != 0:
                    print("ReLU added @ Decoder")
                    self.MLP.add_module(name="A{:d}".format(i), module=nn.ReLU())
                    # self.MLP.add_module(name="BN{:d}".format(i),
                    #                     module=nn.BatchNorm1d(out_size))
            else:
                # Last layer: squash outputs to [0, 1] for the BCE reconstruction loss.
                print("Sig step")
                self.MLP.add_module(name="sigmoid", module=nn.Sigmoid())

    def forward(self, z, c=None):
        # c is unused.
        x = self.MLP(z)
        return x

This is the model code I used. If I pass [784, 256, 256] as "layer_sizes", the model builds the encoder and decoder symmetrically, with linear transformations of the given input/output sizes and ReLUs in between.
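For example, with `layer_sizes = [784, 256, 256]` the model can be instantiated roughly as follows (a minimal sketch; the reversed list for the decoder and the latent size of 64 are just illustrative choices):

    # Illustrative instantiation of the AE defined above.
    # The reversed decoder sizes and latent size 64 are assumptions for this example.
    encoder_sizes = [784, 256, 256]
    decoder_sizes = encoder_sizes[::-1]      # [256, 256, 784]

    model = AE(encoder_layer_sizes=encoder_sizes,
               latent_size=64,
               decoder_layer_sizes=decoder_sizes)
    print(model)                             # prints the symmetric encoder/decoder stack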

I have experimented with many different "layer_sizes" and attach the logs below for reference.
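The loss values in those logs come from a fairly standard training loop; a minimal sketch of it is below (the Adam optimizer, learning rate, batch size, and the per-image summed BCE are assumptions about the setup, not guaranteed to be the exact settings used):

    # Minimal training-loop sketch that produces logs like the ones below.
    # Optimizer, learning rate, batch size and loss reduction are assumptions.
    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader
    from torchvision import datasets, transforms

    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    dataset = datasets.MNIST(root='data', train=True, download=True,
                             transform=transforms.ToTensor())
    loader = DataLoader(dataset, batch_size=64, shuffle=True)

    # Corresponds to the `1 * (MLP + ReLU) + LatentDim 64` setting in the logs.
    model = AE([784, 256], 64, [256, 784]).to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    for epoch in range(10):
        for batch_idx, (x, _) in enumerate(loader):
            x = x.to(device)
            recon_x, z = model(x)
            # Binary cross-entropy between the input and its reconstruction,
            # summed over pixels and averaged over the batch.
            loss = nn.functional.binary_cross_entropy(
                recon_x, x.view(-1, 784), reduction='sum') / x.size(0)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        print("Epoch {:02d}/10 Batch {:04d}/{:d}, Loss {:9.4f}".format(
            epoch, batch_idx, len(loader), loss.item()))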

## Goal of the Project
The goal of this project is to find a way to determine the `optimal number of latent dimensions`.

First, the project introduces linearity and non-linearity and postulates that a linearity corresponds to `one` dimension. This linearity can then be split into `two` non-overlapping dimensions by one ReLU-based non-linearity.

Therefore, this project argues that the optimal number of latent dimensions preliminarily does `not depend on the data distribution itself` but on `the network structure`, more specifically on the `total number of dimensions the model is about to express`. The paper will call this total number of dimensions the **model dimension**.
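Under this postulate, the model dimension can be read off the encoder architecture directly. A minimal counting sketch (treating ReLU units from several layers as simply additive is my own assumption):

    # Count the "model dimension" of an encoder MLP under the postulate that
    # each ReLU unit contributes two expressible dimensions.
    # Summing ReLU units across layers is an assumption of this sketch.
    def model_dimension(layer_sizes):
        # layer_sizes = [input_dim, hidden_1, ..., hidden_k]; in the Encoder
        # above, a ReLU follows every hidden Linear layer.
        num_relu_units = sum(layer_sizes[1:])
        return 2 * num_relu_units

    print(model_dimension([784, 256]))       # 512, as used in Exp_1
    print(model_dimension([784, 256, 256]))  # 1024 under the additive assumption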

After the model dimension is set, one can train the network and check whether it is possible to over-fit it on the given data. If the data points are over-fit at some point during the training epochs, the network can be considered "expressive enough for the data distribution". If it does not over-fit, one can enlarge the **model dimension** and retry the over-fitting process.
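A rough sketch of that loop (both helpers, `train_until_converged` and `enlarge_model`, are hypothetical placeholders, not code from this project):

    # Outer loop of the proposed procedure: train, test for over-fitting,
    # and enlarge the model dimension if the data are not yet over-fit.
    # train_until_converged and enlarge_model are hypothetical placeholders.
    def find_sufficient_model(data, layer_sizes, latent_dim):
        while True:
            converge_loss = train_until_converged(data, layer_sizes, latent_dim)
            if converge_loss == 0:                      # over-fit achieved
                return layer_sizes, latent_dim          # expressive enough
            # Not over-fit: enlarge the model dimension and retry.
            layer_sizes, latent_dim = enlarge_model(layer_sizes, latent_dim)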

## To-do
Define over-fitting.
The classification threshold for over-fitting depends on the experiment.
- At which epoch of the training process should over-fitting be decided?

## Caution
It is better to use the whole dataset when determining the "model dimension", since this is about how much non-linearity is required for the collected or targeted data domain.

## Convergence Determination Metric
When the epoch-average loss does not change by more than 1% over 5 consecutive epochs, we consider the training loss converged.
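A minimal sketch of this check (the exact reference epoch is my interpretation of the rule above):

    # Convergence rule sketch: converged when the epoch-average loss has not
    # changed by more than 1% over the last 5 epochs.
    def has_converged(epoch_avg_losses, window=5, tol=0.01):
        if len(epoch_avg_losses) < window + 1:
            return False
        reference = epoch_avg_losses[-(window + 1)]
        recent = epoch_avg_losses[-window:]
        return all(abs(loss - reference) / reference <= tol for loss in recent)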

## Experiment Workflow

#### Exp_1 : 1 ReLU applied to 256 dimensions (then a linear transformation to LatentDim)

By the assumption, the **model dimension** is 512 (256 * 2). We therefore verify the assumption by

1) checking that the loss at a fixed training epoch decreases as we sequentially increase the LatentDim

with `1 * (MLP + ReLU) + LatentDim 1` 

    Epoch 09/10 Batch 0937/937, Loss  165.5437

with `1 * (MLP + ReLU) + LatentDim 2` 

    Epoch 09/10 Batch 0937/937, Loss  150.2990

with `1 * (MLP + ReLU) + LatentDim 3` 

    Epoch 09/10 Batch 0937/937, Loss  133.2206

with `1 * (MLP + ReLU) + LatentDim 4` 

    Epoch 09/10 Batch 0937/937, Loss  138.1151

with `1 * (MLP + ReLU) + LatentDim 8` 

    Epoch 09/10 Batch 0937/937, Loss  110.9839

with `1 * (MLP + ReLU) + LatentDim 16` 

    Epoch 09/10 Batch 0937/937, Loss 89.6707

with `1 * (MLP + ReLU) + LatentDim 32` 

    Epoch 09/10 Batch 0937/937, Loss 72.5663

with `1 * (MLP + ReLU) + LatentDim 64` 

    Epoch 09/10 Batch 0937/937, Loss 54.2545

> ... since the model converges at LatentDim 64 with a loss of about 52, we shrink the ReLU_InputDim down to 32 (go to Exp_3)

with `1 * (MLP + ReLU) + LatentDim 128` 

    Epoch 09/10 Batch 0937/937, Loss   54.3565

with `1 * (MLP + ReLU) + LatentDim 256` 

    Epoch 09/10 Batch 0937/937, Loss   52.3050

> ... it must keep decreasing; write code to run this sweep automatically (a sketch of such a sweep appears after this experiment's results)

with `1 * (MLP + ReLU) + LatentDim 512` 

    Epoch 09/10 Batch 0937/937, Loss   53.2412

> ... Check whether, for any LatentDim > 512, the loss no longer decreases at the fixed training epoch.


with `1 * (MLP + ReLU) + LatentDim 1024` 

    Epoch 09/10 Batch 0937/937, Loss   54.3255

> As you can see, even after `doubling` the LatentDim, the loss at the fixed step does not decrease, which means the model dimension is already saturated.
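As noted above, a small script could run this LatentDim sweep automatically; a sketch (the doubling schedule and the `train_fixed_epochs` helper are hypothetical):

    # Sketch of automating the Exp_1 sweep: train the same 1 * (MLP + ReLU)
    # encoder for increasing LatentDim and stop once doubling it no longer
    # reduces the loss at the fixed epoch. train_fixed_epochs is a placeholder.
    prev_loss = None
    for latent_dim in [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]:
        loss = train_fixed_epochs(layer_sizes=[784, 256],
                                  latent_dim=latent_dim,
                                  epochs=10)
        print("LatentDim", latent_dim, "loss at fixed epoch", loss)
        if prev_loss is not None and loss >= 0.99 * prev_loss:
            print("model dimension saturated around LatentDim", latent_dim)
            break
        prev_loss = loss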
#### Exp_2 : Now introduce twice the model dimension with a second ReLU layer

with `2 * (MLP + ReLU) + LatentDim 1024`

    Epoch 09/10 Batch 0937/937, Loss   57.9039

(Without a bias term, the stacked sequential ReLUs do not work.)


#### Exp_3 : Shrink the ReLU InputDim down to 32 while keeping LatentDim 64

### Summary of Algorithm

    if convergeLoss != 0:
        if modelDim > latentDim:
            enlarge latentDim
        if modelDim <= latentDim:
            increase #ReLU

    * modelDim = 2 * num_ReLUs
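A sketch of this rule as code (the training helper is a placeholder, and appending another hidden layer is just one way to "increase #ReLU"):

    # One pass of the summary rule above. train_until_converged is a
    # placeholder; modelDim follows the postulate modelDim = 2 * num_ReLU_units.
    def adjust(layer_sizes, latent_dim):
        converge_loss = train_until_converged(layer_sizes, latent_dim)
        model_dim = 2 * sum(layer_sizes[1:])          # 2 * number of ReLU units
        if converge_loss != 0:
            if model_dim > latent_dim:
                latent_dim *= 2                       # enlarge latentDim
            else:
                layer_sizes = layer_sizes + [layer_sizes[-1]]   # add another ReLU layer
        return layer_sizes, latent_dim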

To verify this:

@ exp latentDim 64, convergeLoss 80, layerSize [784, 32]
If one increases the latentDim, the convergeLoss should not drop below 80.

Let's check:
@ exp latentDim 128, convergeLoss 80, layerSize [784, 32], convergeLoss 80

Now let's stack a second ReLU layer, [784, 32, 32], which presumably represents 128 dimensions:
@ exp latentDim 128, convergeLoss 80, layerSize [784, 32, 32], convergeLoss 80 (still the same)

As you can see, without enlarging the foremost (widest) layer dimension, the deeper ReLU does not help. This matches Raghu (2017).

Now make it wider, e.g. [784, 64]:
@ exp_1555829642 latentDim 128, convergeLoss 80, layerSize [784, 64], the convergeLoss 65 < 80

Make it wider still, e.g. [784, 128]:
@ exp_1555829642 latentDim 128, convergeLoss 55, layerSize [784, 128], the convergeLoss 55 < 80

And wider again, e.g. [784, 256]:
@ exp_1555832143 latentDim 128, convergeLoss 55, layerSize [784, 256], the convergeLoss 55 = 55

The question is the latentDim; make sure the latentDim is sufficient:
@ exp_1555832638 latentDim 256, convergeLoss 55, layerSize [784, 256], the convergeLoss 55 = 55

===> Question! How can one determine the latentDim with less effort, without going through this cumbersome experimental procedure?

Again on the latentDim; make sure the latentDim is sufficient:
@ exp_1555832638 latentDim 128, convergeLoss 65, layerSize [784, 256, 256], the convergeLoss 65 > 55
@ exp_1555832638 latentDim 256, convergeLoss 65, layerSize [784, 256, 256], the convergeLoss 68 > 55
@ exp_1555832638 latentDim 64, convergeLoss 65, layerSize [784, 256, 256], the convergeLoss 68 > 55
@ exp_1555832638 latentDim 128, convergeLoss 60, layerSize [784, 256, 128], the convergeLoss 60 > 55
@ exp_1555834546 latentDim 64, convergeLoss 65, layerSize [784, 256, 256], the convergeLoss 55 = 55

=====> Decreasing the latentDim makes the model learn better (Q1)

@ exp_1555834546 latentDim 32, convergeLoss 65, layerSize [784, 256, 256], the convergeLoss 60 > 55


The configurations that currently reach 55 are:

    [784, 128], ld 128
    [784, 128], ld 256
    [784, 256, 256], ld 64

@ 1555843696, ld 64, [784, 128, 128], convergeLoss 60 > 55
@ 1555844254, ld 128, [784, 128, 128], convergeLoss 64 > 55
@ 1555844254, ld 32, [784, 128, 128], convergeLoss 66 > 55


I don't know why, but when the network is deeper, a too-large latent space decreases the learning efficiency (Q1).

Once more on the latentDim; make sure the latentDim is sufficient:
@ exp_1555832638 latentDim 32, convergeLoss 65, layerSize [784, 256, 256], the convergeLoss 55 = 55

Maybe, if the modelDim is too big and the latentDim is too small, as seen in the [784, 32, 32] experiment, training does not work. Thus, we raised the latentDim in that setting from 128 to 256:
@ exp_1555830495 convergeLoss 80 (still the same)

If anyone has succeeded, or has seen reproducible code or a report that manages to learn a strict identity mapping of MNIST with an autoencoder structure, please point me to it!

...