Not enough memory to run VGG-19 with Keras and TensorFlow on an 11 GB GPU - PullRequest
0 votes
/ 10 February 2020

I am using Keras + TensorFlow (1.14) on CUDA 10.0 with an RTX 2080 Ti GPU. I am trying to train a VGG-19 model on images of size 640 × 480 × 1. I ran code that estimates how much GPU memory training needs with a batch size of 10, and it reports about 6 GB. Yet training fails with an out-of-memory error on the 11 GB GPU even with a batch size of 1. What am I missing? Thanks and best regards,

The model I am using looks like this:

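    # Standalone Keras 2.x (used here with TensorFlow 1.14)
    from keras.models import Sequential
    from keras.layers import Conv2D, MaxPool2D, Flatten, Dense, Dropout

    IMG_SIZE_HEIGHT, IMG_SIZE_WIDTH = 480, 640  # 640 x 480 grayscale images

    model = Sequential()
    model.add(Conv2D(input_shape=(IMG_SIZE_HEIGHT,IMG_SIZE_WIDTH,1),filters=64,kernel_size=(3,3),padding="same", activation="relu"))
    model.add(Conv2D(filters=64,kernel_size=(3,3),padding="same", activation="relu"))
    model.add(MaxPool2D(pool_size=(2,2),strides=(2,2)))
    # Keras 2 form of the Keras 1 call Convolution2D(128, 3, 3). Unlike the other
    # layers it has no padding="same", so the map shrinks from 240 x 320 to 238 x 318
    model.add(Conv2D(filters=128, kernel_size=(3,3), activation="relu"))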
    model.add(Conv2D(filters=128, kernel_size=(3,3), padding="same", activation="relu"))
    model.add(Conv2D(filters=128, kernel_size=(3,3), padding="same", activation="relu"))
    model.add(MaxPool2D(pool_size=(2,2),strides=(2,2)))
    model.add(Conv2D(filters=256, kernel_size=(3,3), padding="same", activation="relu"))
    model.add(Conv2D(filters=256, kernel_size=(3,3), padding="same", activation="relu"))
    model.add(Conv2D(filters=256, kernel_size=(3,3), padding="same", activation="relu"))
    model.add(Conv2D(filters=256, kernel_size=(3,3), padding="same", activation="relu"))
    model.add(MaxPool2D(pool_size=(2,2),strides=(2,2)))
    model.add(Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"))
    model.add(Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"))
    model.add(Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"))
    model.add(MaxPool2D(pool_size=(2,2),strides=(2,2)))
    model.add(Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"))
    model.add(Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"))
    model.add(MaxPool2D(pool_size=(2,2),strides=(2,2)))
    model.add(Flatten())  # 14 x 19 x 512 = 136,192 values per image
    model.add(Dense(units=4096,activation="relu"))  # 136,192 x 4096 weights (~558M)
    model.add(Dense(units=2048,activation="relu"))
    model.add(Dropout(0.5))
    model.add(Dense(3, activation='softmax'))
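
The trainable parameter count printed in the output further below can be reproduced by walking this layer list by hand. A minimal pure-Python sketch, assuming standard Keras shape rules (3 × 3 kernels with bias, 2 × 2 stride-2 pooling with the default "valid" padding) and treating the one 128-filter convolution without `padding="same"` as unpadded, which is what the 238 × 318 shape in the output shows; `conv3x3`, `LAYERS` and `summarize` are illustrative names, not part of Keras:

```python
# Illustrative shape/parameter walk of the question's model (pure Python).
# Assumes 3x3 kernels with bias and 2x2/stride-2 "valid" max pooling.

def conv3x3(h, w, c_in, filters, padding):
    """Output size and parameter count of a 3x3 Conv2D with bias."""
    if padding == "valid":  # no padding: lose one pixel on each border
        h, w = h - 2, w - 2
    return h, w, filters, 3 * 3 * c_in * filters + filters

# ("conv", filters, padding) or ("pool",), in the order of the question's model
LAYERS = (
    [("conv", 64, "same")] * 2 + [("pool",)]
    + [("conv", 128, "valid")] + [("conv", 128, "same")] * 2 + [("pool",)]
    + [("conv", 256, "same")] * 4 + [("pool",)]
    + [("conv", 512, "same")] * 3 + [("pool",)]
    + [("conv", 512, "same")] * 2 + [("pool",)]
)

def summarize(height=480, width=640, channels=1, dense_units=(4096, 2048, 3)):
    h, w, c = height, width, channels
    conv_params = 0
    for layer in LAYERS:
        if layer[0] == "conv":
            h, w, c, p = conv3x3(h, w, c, layer[1], layer[2])
            conv_params += p
        else:  # 2x2 max pooling, stride 2, "valid" padding
            h, w = h // 2, w // 2
    flat = h * w * c  # size of the Flatten output
    dense_params, n_in = [], flat
    for units in dense_units:
        dense_params.append(n_in * units + units)  # weights + biases
        n_in = units
    return flat, conv_params, dense_params

flat, conv_params, dense_params = summarize()
total = conv_params + sum(dense_params)
print("Flatten output:", flat)                  # 136192
print("First Dense params:", dense_params[0])   # 557846528
print("Total trainable params:", total)         # 579334723
```

In this sketch the first `Dense(4096)` layer accounts for about 558M of the 579M trainable parameters (roughly 96%), because it receives 14 × 19 × 512 = 136,192 flattened features; in float32 its weights alone take over 2 GB.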

This model cannot train even with a batch size of 1: I get an out-of-memory error.
To estimate how much memory training takes with a batch size of 10, I run the
memory-usage function shown further below. Its output is:

    get_model_memory_usage: Conv2D
        get_model_memory_usage:s:  480
        get_model_memory_usage:s:  640
        get_model_memory_usage:s:  64
      get_model_memory_usage: for layer:  Conv2D , memory_usage in MB is:  75.0
    get_model_memory_usage: Conv2D
        get_model_memory_usage:s:  480
        get_model_memory_usage:s:  640
        get_model_memory_usage:s:  64
      get_model_memory_usage: for layer:  Conv2D , memory_usage in MB is:  75.0
    get_model_memory_usage: MaxPooling2D
        get_model_memory_usage:s:  240
        get_model_memory_usage:s:  320
        get_model_memory_usage:s:  64
      get_model_memory_usage: for layer:  MaxPooling2D , memory_usage in MB is:  18.75
    get_model_memory_usage: Conv2D
        get_model_memory_usage:s:  238
        get_model_memory_usage:s:  318
        get_model_memory_usage:s:  128
      get_model_memory_usage: for layer:  Conv2D , memory_usage in MB is:  36.955
    get_model_memory_usage: Conv2D
        get_model_memory_usage:s:  238
        get_model_memory_usage:s:  318
        get_model_memory_usage:s:  128
      get_model_memory_usage: for layer:  Conv2D , memory_usage in MB is:  36.955
    get_model_memory_usage: Conv2D
        get_model_memory_usage:s:  238
        get_model_memory_usage:s:  318
        get_model_memory_usage:s:  128
      get_model_memory_usage: for layer:  Conv2D , memory_usage in MB is:  36.955
    get_model_memory_usage: MaxPooling2D
        get_model_memory_usage:s:  119
        get_model_memory_usage:s:  159
        get_model_memory_usage:s:  128
      get_model_memory_usage: for layer:  MaxPooling2D , memory_usage in MB is:  9.239
    get_model_memory_usage: Conv2D
        get_model_memory_usage:s:  119
        get_model_memory_usage:s:  159
        get_model_memory_usage:s:  256
      get_model_memory_usage: for layer:  Conv2D , memory_usage in MB is:  18.478
    get_model_memory_usage: Conv2D
        get_model_memory_usage:s:  119
        get_model_memory_usage:s:  159
        get_model_memory_usage:s:  256
      get_model_memory_usage: for layer:  Conv2D , memory_usage in MB is:  18.478
    get_model_memory_usage: Conv2D
        get_model_memory_usage:s:  119
        get_model_memory_usage:s:  159
        get_model_memory_usage:s:  256
      get_model_memory_usage: for layer:  Conv2D , memory_usage in MB is:  18.478
    get_model_memory_usage: Conv2D
        get_model_memory_usage:s:  119
        get_model_memory_usage:s:  159
        get_model_memory_usage:s:  256
      get_model_memory_usage: for layer:  Conv2D , memory_usage in MB is:  18.478
    get_model_memory_usage: MaxPooling2D
        get_model_memory_usage:s:  59
        get_model_memory_usage:s:  79
        get_model_memory_usage:s:  256
      get_model_memory_usage: for layer:  MaxPooling2D , memory_usage in MB is:  4.552
    get_model_memory_usage: Conv2D
        get_model_memory_usage:s:  59
        get_model_memory_usage:s:  79
        get_model_memory_usage:s:  512
      get_model_memory_usage: for layer:  Conv2D , memory_usage in MB is:  9.104
    get_model_memory_usage: Conv2D
        get_model_memory_usage:s:  59
        get_model_memory_usage:s:  79
        get_model_memory_usage:s:  512
      get_model_memory_usage: for layer:  Conv2D , memory_usage in MB is:  9.104
    get_model_memory_usage: Conv2D
        get_model_memory_usage:s:  59
        get_model_memory_usage:s:  79
        get_model_memory_usage:s:  512
      get_model_memory_usage: for layer:  Conv2D , memory_usage in MB is:  9.104
    get_model_memory_usage: MaxPooling2D
        get_model_memory_usage:s:  29
        get_model_memory_usage:s:  39
        get_model_memory_usage:s:  512
      get_model_memory_usage: for layer:  MaxPooling2D , memory_usage in MB is:  2.209
    get_model_memory_usage: Conv2D
        get_model_memory_usage:s:  29
        get_model_memory_usage:s:  39
        get_model_memory_usage:s:  512
      get_model_memory_usage: for layer:  Conv2D , memory_usage in MB is:  2.209
    get_model_memory_usage: Conv2D
        get_model_memory_usage:s:  29
        get_model_memory_usage:s:  39
        get_model_memory_usage:s:  512
      get_model_memory_usage: for layer:  Conv2D , memory_usage in MB is:  2.209
    get_model_memory_usage: MaxPooling2D
        get_model_memory_usage:s:  14
        get_model_memory_usage:s:  19
        get_model_memory_usage:s:  512
      get_model_memory_usage: for layer:  MaxPooling2D , memory_usage in MB is:  0.52
    get_model_memory_usage: Flatten
        get_model_memory_usage:s:  136192
      get_model_memory_usage: for layer:  Flatten , memory_usage in MB is:  0.52
    get_model_memory_usage: Dense
        get_model_memory_usage:s:  4096
      get_model_memory_usage: for layer:  Dense , memory_usage in MB is:  0.016
    get_model_memory_usage: Dense
        get_model_memory_usage:s:  2048
      get_model_memory_usage: for layer:  Dense , memory_usage in MB is:  0.008
    get_model_memory_usage: Dropout
        get_model_memory_usage:s:  2048
      get_model_memory_usage: for layer:  Dropout , memory_usage in MB is:  0.008
    get_model_memory_usage: Dense
        get_model_memory_usage:s:  3
      get_model_memory_usage: for layer:  Dense , memory_usage in MB is:  0.0
    get_model_memory_usage: trainable_count:  579334723  non-trainable count  0.0
    get_model_memory_usage: final size of the model with batch size:  10  is:  6.087 GB
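
The 6.087 GB total can be reproduced from this output, which makes its two ingredients visible: one float32 copy of every layer output per sample, multiplied by the batch size, plus one copy of the weights. A minimal pure-Python sketch with the shapes and `trainable_count` copied from the output above (`numel` is an illustrative helper):

```python
# Output shapes (batch dimension dropped), in the order printed above
OUTPUT_SHAPES = (
    [(480, 640, 64)] * 2 + [(240, 320, 64)]
    + [(238, 318, 128)] * 3 + [(119, 159, 128)]
    + [(119, 159, 256)] * 4 + [(59, 79, 256)]
    + [(59, 79, 512)] * 3 + [(29, 39, 512)] * 3 + [(14, 19, 512)]
    + [(136192,), (4096,), (2048,), (2048,), (3,)]
)
TRAINABLE = 579_334_723   # trainable_count printed above
BYTES = 4                 # float32
GIB = 1024.0 ** 3

def numel(shape):
    """Number of values in a tensor of the given shape."""
    n = 1
    for d in shape:
        n *= d
    return n

per_sample = sum(numel(s) for s in OUTPUT_SHAPES)  # shapes_mem_count
batch = 10
activation_gib = BYTES * batch * per_sample / GIB
weight_gib = BYTES * TRAINABLE / GIB
print("activations:", round(activation_gib, 3), "GiB")
print("weights:    ", round(weight_gib, 3), "GiB")
print("total:      ", round(activation_gib + weight_gib, 3), "GiB")
```

About 2.16 GiB of the total is the weights, each counted once; the formula has no term for weight gradients or optimizer state.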

The code used to estimate memory usage:

    import numpy as np
    from keras import backend as K

    def get_model_memory_usage(batch_size, model):
        # Bytes per number for the current Keras float type
        number_size = 4.0
        if K.floatx() == 'float16':
            number_size = 2.0
        if K.floatx() == 'float64':
            number_size = 8.0

        shapes_mem_count = 0          # sum of layer output sizes for one sample
        internal_model_mem_count = 0  # GB reported for nested models
        for l in model.layers:
            layer_type = l.__class__.__name__
            print("get_model_memory_usage:", layer_type)
            if layer_type == 'Model':
                internal_model_mem_count += get_model_memory_usage(batch_size, l)
            single_layer_mem = 1
            for s in l.output_shape:
                if s is None:  # batch dimension
                    continue
                print("    get_model_memory_usage:s: ", s)
                single_layer_mem *= s
            print("  get_model_memory_usage: for layer: ", layer_type, ", memory_usage in MB is: ", np.round(single_layer_mem * number_size / (1024.0 ** 2), 3))
            shapes_mem_count += single_layer_mem

        # Each weight is counted once: gradients and optimizer state are not included
        trainable_count = np.sum([K.count_params(p) for p in set(model.trainable_weights)])
        non_trainable_count = np.sum([K.count_params(p) for p in set(model.non_trainable_weights)])
        print("get_model_memory_usage: trainable_count: ", trainable_count, " non-trainable count ", non_trainable_count)

        total_memory = number_size*(batch_size*shapes_mem_count + trainable_count + non_trainable_count)
        gbytes = np.round(total_memory / (1024.0 ** 3), 3) + internal_model_mem_count
        print("get_model_memory_usage: final size of the model with batch size: ", batch_size, " is: ", gbytes)
        return gbytes
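
This function counts each weight and each layer output once. During training, a gradient is also kept for every trainable weight, most optimizers add per-weight state (Adam, for instance, stores two moment estimates per weight), and back-propagation produces gradients of the layer outputs as well. A rough sketch of an extended estimate under those assumptions, in float32 and treating all activation gradients as if they coexisted; `training_memory_gib` is a hypothetical helper, not a Keras API, and the question does not say which optimizer was used. `ACTIVATIONS` is the per-sample `shapes_mem_count` for this model, i.e. the sum of the output-shape products in the log:

```python
def training_memory_gib(params, activations_per_sample, batch_size,
                        optimizer_slots=2, bytes_per_number=4):
    """Rough GPU memory estimate (GiB) for one training step.

    Counts the weights, one gradient per weight, `optimizer_slots` extra
    values per weight (e.g. 2 for Adam's first and second moments), and every
    layer output twice: the forward value kept for back-propagation plus its
    gradient. cuDNN workspace, framework overhead and fragmentation are ignored.
    """
    weight_numbers = params * (2 + optimizer_slots)
    activation_numbers = 2 * batch_size * activations_per_sample
    return bytes_per_number * (weight_numbers + activation_numbers) / 1024 ** 3

PARAMS = 579_334_723        # trainable_count from the log
ACTIVATIONS = 105_466_755   # shapes_mem_count for this model (per sample)

for slots in (0, 1, 2):
    gib = training_memory_gib(PARAMS, ACTIVATIONS, batch_size=1, optimizer_slots=slots)
    print(f"batch 1, {slots} optimizer slot(s) per weight: {gib:.2f} GiB")
```

Under these assumptions, a batch of 1 comes to about 5.1 GiB with no optimizer state and about 9.4 GiB with two slots per weight (as with Adam), before cuDNN workspace and TensorFlow's own overhead, which is close to the capacity of an 11 GB card. These are rough estimates, not measurements.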

1 Answer

1 vote
/ 7 April 2020

To avoid running out of memory, you can use a network like the one below, which adds a max-pooling layer after every two convolution layers.

    from keras.applications import VGG19
    from keras.models import Sequential
    from keras.layers import Conv2D, MaxPool2D, Flatten, Dense, Dropout

    # Conv_Base is not defined in the answer. The summary below (a "vgg19" model
    # with 20,024,384 params and output (None, 1, 1, 512)) is consistent with VGG19
    # without its top layers on 32 x 32 x 3 input; the weights used are not stated.
    Conv_Base = VGG19(include_top=False, weights=None, input_shape=(32, 32, 3))

    model = Sequential()
    model.add(Conv_Base)
    # input_shape has no effect here because this is not the first layer
    model.add(Conv2D(input_shape=(32,32,3),filters=64,kernel_size=(3,3),padding="same", activation="relu"))
    model.add(Conv2D(filters=64,kernel_size=(3,3),padding="same", activation="relu"))
    model.add(MaxPool2D(pool_size=(2,2),strides=(2,2),padding="same"))
    model.add(Conv2D(filters=128, kernel_size=(3,3), activation='relu',padding="same"))
    model.add(Conv2D(filters=128, kernel_size=(3,3), padding="same", activation="relu"))
    model.add(MaxPool2D(pool_size=(2,2),strides=(2,2),padding="same"))
    model.add(Conv2D(filters=128, kernel_size=(3,3), padding="same", activation="relu"))
    model.add(Conv2D(filters=256, kernel_size=(3,3), padding="same", activation="relu"))
    model.add(MaxPool2D(pool_size=(2,2),strides=(2,2),padding="same"))
    model.add(Conv2D(filters=256, kernel_size=(3,3), padding="same", activation="relu"))
    model.add(Conv2D(filters=256, kernel_size=(3,3), padding="same", activation="relu"))
    model.add(MaxPool2D(pool_size=(2,2),strides=(2,2),padding="same"))
    model.add(Conv2D(filters=256, kernel_size=(3,3), padding="same", activation="relu"))
    model.add(Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"))
    model.add(MaxPool2D(pool_size=(2,2),strides=(2,2),padding="same"))
    model.add(Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"))
    model.add(Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"))
    model.add(MaxPool2D(pool_size=(2,2),strides=(2,2),padding="same"))
    model.add(Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"))
    model.add(Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"))
    model.add(MaxPool2D(pool_size=(2,2),strides=(2,2),padding="same"))
    model.add(Flatten())
    model.add(Dense(units=4096,activation="relu"))
    model.add(Dense(units=2048,activation="relu"))
    model.add(Dropout(0.5))
    model.add(Dense(10, activation='softmax'))

    model.summary()

Output:

    Model: "sequential_2"
    _________________________________________________________________
    Layer (type)                 Output Shape              Param #   
    =================================================================
    vgg19 (Model)                (None, 1, 1, 512)         20024384  
    _________________________________________________________________
    conv2d_16 (Conv2D)           (None, 1, 1, 64)          294976    
    _________________________________________________________________
    conv2d_17 (Conv2D)           (None, 1, 1, 64)          36928     
    _________________________________________________________________
    max_pooling2d_8 (MaxPooling2 (None, 1, 1, 64)          0         
    _________________________________________________________________
    conv2d_18 (Conv2D)           (None, 1, 1, 128)         73856     
    _________________________________________________________________
    conv2d_19 (Conv2D)           (None, 1, 1, 128)         147584    
    _________________________________________________________________
    max_pooling2d_9 (MaxPooling2 (None, 1, 1, 128)         0         
    _________________________________________________________________
    conv2d_20 (Conv2D)           (None, 1, 1, 128)         147584    
    _________________________________________________________________
    conv2d_21 (Conv2D)           (None, 1, 1, 256)         295168    
    _________________________________________________________________
    max_pooling2d_10 (MaxPooling (None, 1, 1, 256)         0         
    _________________________________________________________________
    conv2d_22 (Conv2D)           (None, 1, 1, 256)         590080    
    _________________________________________________________________
    conv2d_23 (Conv2D)           (None, 1, 1, 256)         590080    
    _________________________________________________________________
    max_pooling2d_11 (MaxPooling (None, 1, 1, 256)         0         
    _________________________________________________________________
    conv2d_24 (Conv2D)           (None, 1, 1, 256)         590080    
    _________________________________________________________________
    conv2d_25 (Conv2D)           (None, 1, 1, 512)         1180160   
    _________________________________________________________________
    max_pooling2d_12 (MaxPooling (None, 1, 1, 512)         0         
    _________________________________________________________________
    conv2d_26 (Conv2D)           (None, 1, 1, 512)         2359808   
    _________________________________________________________________
    conv2d_27 (Conv2D)           (None, 1, 1, 512)         2359808   
    _________________________________________________________________
    max_pooling2d_13 (MaxPooling (None, 1, 1, 512)         0         
    _________________________________________________________________
    conv2d_28 (Conv2D)           (None, 1, 1, 512)         2359808   
    _________________________________________________________________
    conv2d_29 (Conv2D)           (None, 1, 1, 512)         2359808   
    _________________________________________________________________
    max_pooling2d_14 (MaxPooling (None, 1, 1, 512)         0         
    _________________________________________________________________
    flatten_1 (Flatten)          (None, 512)               0         
    _________________________________________________________________
    dense_3 (Dense)              (None, 4096)              2101248   
    _________________________________________________________________
    dense_4 (Dense)              (None, 2048)              8390656   
    _________________________________________________________________
    dropout_1 (Dropout)          (None, 2048)              0         
    _________________________________________________________________
    dense_5 (Dense)              (None, 10)                20490     
    =================================================================
    Total params: 43,922,506
    Trainable params: 43,922,506
    Non-trainable params: 0
...
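
The extra pooling matters mainly through the Flatten layer: each additional 2 × 2, stride-2 pooling layer roughly quarters the number of features fed into the first `Dense(4096)`, and that layer's weight matrix dominates the parameter count. A small pure-Python sketch, assuming the question's 480 × 640 input, convolutions with `padding="same"` (so only pooling changes the spatial size), 512 channels before Flatten, and `padding="same"` pooling as in the answer's network; `flatten_size` and `dense_params` are illustrative helpers:

```python
# Illustrative only: how the number of 2x2/stride-2 pooling layers changes the
# input of the first Dense(4096) layer for a 480 x 640 image.

def flatten_size(height, width, channels, n_pools, padding="same"):
    """Flattened feature count after n_pools max-pooling layers (2x2, stride 2)."""
    for _ in range(n_pools):
        if padding == "same":
            height, width = -(-height // 2), -(-width // 2)  # ceil(x / 2)
        else:
            height, width = height // 2, width // 2          # floor(x / 2)
    return height * width * channels

def dense_params(n_inputs, units=4096):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * units + units

for n_pools in (5, 6, 7):
    features = flatten_size(480, 640, 512, n_pools)
    params = dense_params(features)
    print(f"{n_pools} pools: {features:>7} features -> "
          f"{params:>11,} params ({params * 4 / 1024 ** 3:.2f} GiB in float32)")
```

Going from 5 to 7 pooling layers shrinks the Flatten output from 15 × 20 × 512 = 153,600 to 4 × 5 × 512 = 10,240 values and the first Dense layer from about 629M to about 42M parameters. The answer's own example runs on 32 × 32 × 3 inputs with a VGG19 base in front, so its 43.9M-parameter summary is not directly comparable to the question's 640 × 480 × 1 setup.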