tf.keras.fit ignores the batch size and keeps hitting OOM trying to allocate the entire tensor in GPU memory
I am trying to fit a DNN model to the MNIST dataset:
import numpy as np
import tensorflow as tf

mnist_model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(filters=35, kernel_size=(3, 3), strides=(1, 1), padding='same',
                           activation='relu', input_shape=(1, 28, 28), data_format="channels_first",
                           use_bias=True, bias_initializer=tf.keras.initializers.constant(0.01),
                           kernel_initializer='glorot_normal'),
    # tf.keras.layers.BatchNormalization(),
    tf.keras.layers.MaxPool2D(pool_size=(2, 2), padding='same', data_format='channels_first'),
    tf.keras.layers.Conv2D(filters=36, kernel_size=(3, 3), strides=(1, 1), padding='same',
                           activation='relu', data_format="channels_first", use_bias=True,
                           bias_initializer=tf.keras.initializers.constant(0.01),
                           kernel_initializer='glorot_normal'),
    # tf.keras.layers.BatchNormalization(),
    tf.keras.layers.MaxPool2D(pool_size=(2, 2), padding='same', data_format='channels_first'),
    tf.keras.layers.Conv2D(filters=36, kernel_size=(3, 3), strides=(1, 1), padding='same',
                           activation='relu', data_format="channels_first", use_bias=True,
                           bias_initializer=tf.keras.initializers.constant(0.01),
                           kernel_initializer='glorot_normal'),
    # tf.keras.layers.BatchNormalization(),
    tf.keras.layers.MaxPool2D(pool_size=(2, 2), padding='same', data_format='channels_first'),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(576, activation='relu'),
    tf.keras.layers.Dense(10, activation='relu')
])
(mnist_images, mnist_labels), _ = tf.keras.datasets.mnist.load_data()

dataset = tf.data.Dataset.from_tensor_slices(
    (tf.cast(mnist_images[..., tf.newaxis] / 255, tf.float16),
     tf.cast(mnist_labels, tf.int8)))
dataset = dataset.shuffle(1000)

# Add the channel axis first to match data_format="channels_first".
mnist_images = tf.convert_to_tensor(np.expand_dims(mnist_images, axis=1))

mnist_model.compile(optimizer=tf.keras.optimizers.Adam(),
                    loss="categorical_crossentropy",
                    metrics=['accuracy'])
mnist_model.fit(mnist_images, tf.one_hot(mnist_labels, depth=10),
                epochs=2, steps_per_epoch=100)
I expect the batch size to be 60000/100 = 600, yet Keras keeps allocating tensors of shape [60000,35,28,28], i.e. the steps_per_epoch parameter is not honoured. I get this error:
ResourceExhaustedError: OOM when allocating tensor with shape[60000,35,28,28] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node conv2d_19/Conv2D}} = Conv2D[T=DT_FLOAT, _class=["loc:@training_6/Adam/gradients/conv2d_19/Conv2D_grad/Conv2DBackpropFilter"], data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](_identity_conv2d_19_input_0, conv2d_19/Conv2D/ReadVariableOp)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[{{node ConstantFoldingCtrl/loss_6/dense_13_loss/broadcast_weights/assert_broadcastable/AssertGuard/Switch_0/_912}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_324_C...d/Switch_0", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
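For reference, a minimal sketch of the call I would have expected to need instead, assuming that with array inputs fit() only forms mini-batches when an explicit batch_size is given and that steps_per_epoch alone does not imply one:

# Hedged sketch, not a confirmed fix: pass batch_size explicitly so fit()
# feeds 600-sample mini-batches rather than all 60000 images in one step.
mnist_model.fit(mnist_images,
                tf.one_hot(mnist_labels, depth=10),
                batch_size=600,  # 60000 samples / 100 steps per epoch
                epochs=2)

# Alternative sketch: batch the tf.data.Dataset built above and fit on it.
# Note the channel axis would have to come first (axis=1) to match this
# channels_first model; the dataset above adds it last via [..., tf.newaxis].
# batched = dataset.batch(600).repeat()
# mnist_model.fit(batched, epochs=2, steps_per_epoch=100)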