Tensorflow: тренировка нескольких графических процессоров медленнее, чем на одном GPU - PullRequest
0 голосов
/ 04 мая 2020

Я пытаюсь запустить обучение нескольких графических процессоров с Tensorflow 2 (nightly) и Keras с помощью tf.distribute.MirroredStrategy ().

Проблема состоит в том, что при использовании в 2 раза большего общего размера пакета с 2 графическими процессорами (16), время тренировки становится в 3 раза медленнее за эпоху. У кого-нибудь есть подсказка, где я могу найти проблему. tf.distribute.MirroredStrategy уже выполняет всю работу за нас и быстро моделирует поезда на одном графическом процессоре.

Это модель с одним входом и двумя выходами. Что является наиболее вероятной причиной такого сценария.

Вот краткое описание модели:

Model: "Joined_Model_2"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
inp (InputLayer)                [(None, None, 257)]  0                                            
__________________________________________________________________________________________________
time_distributed_2 (TimeDistrib (None, None, 512)    131584      inp[0][0]                        
__________________________________________________________________________________________________
layer_normalization_1 (LayerNor (None, None, 512)    1024        time_distributed_2[0][0]         
__________________________________________________________________________________________________
re_lu_1 (ReLU)                  (None, None, 512)    0           layer_normalization_1[0][0]      
__________________________________________________________________________________________________
lstm_5 (LSTM)                   (None, 512)          2099200     re_lu_1[0][0]                    
__________________________________________________________________________________________________
add_5 (Add)                     (None, None, 512)    0           re_lu_1[0][0]                    
                                                                 lstm_5[0][0]                     
__________________________________________________________________________________________________
lstm_6 (LSTM)                   (None, 512)          2099200     add_5[0][0]                      
__________________________________________________________________________________________________
add_6 (Add)                     (None, None, 512)    0           add_5[0][0]                      
                                                                 lstm_6[0][0]                     
__________________________________________________________________________________________________
lstm_7 (LSTM)                   (None, 512)          2099200     add_6[0][0]                      
__________________________________________________________________________________________________
add_7 (Add)                     (None, None, 512)    0           add_6[0][0]                      
                                                                 lstm_7[0][0]                     
__________________________________________________________________________________________________
lstm_8 (LSTM)                   (None, 512)          2099200     add_7[0][0]                      
__________________________________________________________________________________________________
add_8 (Add)                     (None, None, 512)    0           add_7[0][0]                      
                                                                 lstm_8[0][0]                     
__________________________________________________________________________________________________
lstm_9 (LSTM)                   (None, 512)          2099200     add_8[0][0]                      
__________________________________________________________________________________________________
add_9 (Add)                     (None, None, 512)    0           add_8[0][0]                      
                                                                 lstm_9[0][0]                     
__________________________________________________________________________________________________
time_distributed_3 (TimeDistrib (None, None, 257)    131841      add_9[0][0]                      
__________________________________________________________________________________________________
out1(Activation)         (None, None, 257)    0           time_distributed_3[0][0]         
__________________________________________________________________________________________________
tf_op_layer_Add_1 (TensorFlowOp [(None, None, 257)]  0           out1[0][0]                 
__________________________________________________________________________________________________
tf_op_layer_RealDiv_1 (TensorFl [(None, None, 257)]  0           out1[0][0]                 
                                                                 tf_op_layer_Add_1[0][0]          
__________________________________________________________________________________________________
tf_op_layer_Sqrt_1 (TensorFlowO [(None, None, 257)]  0           tf_op_layer_RealDiv_1[0][0]      
__________________________________________________________________________________________________
tf_op_layer_Mul_1 (TensorFlowOp [(None, None, 257)]  0           inp[0][0]                        
                                                                 tf_op_layer_Sqrt_1[0][0]         
__________________________________________________________________________________________________
log_mel_spectrogram_1 (LogMelSp (None, None, 160)    0           tf_op_layer_Mul_1[0][0]          
__________________________________________________________________________________________________
lambda_1 (Lambda)               (None, None, 160, 1) 0           log_mel_spectrogram_1[0][0]      
__________________________________________________________________________________________________
conv_1 (Conv2D)                 (None, None, 80, 32) 14432       lambda_1[0][0]                   
__________________________________________________________________________________________________
conv_1_bn (BatchNormalization)  (None, None, 80, 32) 128         conv_1[0][0]                     
__________________________________________________________________________________________________
conv_1_relu (ReLU)              (None, None, 80, 32) 0           conv_1_bn[0][0]                  
__________________________________________________________________________________________________
conv_2 (Conv2D)                 (None, None, 40, 32) 236544      conv_1_relu[0][0]                
__________________________________________________________________________________________________
conv_2_bn (BatchNormalization)  (None, None, 40, 32) 128         conv_2[0][0]                     
__________________________________________________________________________________________________
conv_2_relu (ReLU)              (None, None, 40, 32) 0           conv_2_bn[0][0]                  
__________________________________________________________________________________________________
after_conv (Reshape)            (None, None, 1280)   0           conv_2_relu[0][0]                
__________________________________________________________________________________________________
bidirectional_1 (Bidirectional) (None, None, 1600)   9993600     after_conv[0][0]                 
__________________________________________________________________________________________________
dropout_4 (Dropout)             (None, None, 1600)   0           bidirectional_1[0][0]            
__________________________________________________________________________________________________
bidirectional_2 (Bidirectional) (None, None, 1600)   11529600    dropout_4[0][0]                  
__________________________________________________________________________________________________
dropout_5 (Dropout)             (None, None, 1600)   0           bidirectional_2[0][0]            
__________________________________________________________________________________________________
bidirectional_3 (Bidirectional) (None, None, 1600)   11529600    dropout_5[0][0]                  
__________________________________________________________________________________________________
dropout_6 (Dropout)             (None, None, 1600)   0           bidirectional_3[0][0]            
__________________________________________________________________________________________________
bidirectional_4 (Bidirectional) (None, None, 1600)   11529600    dropout_6[0][0]                  
__________________________________________________________________________________________________
dropout_7 (Dropout)             (None, None, 1600)   0           bidirectional_4[0][0]            
__________________________________________________________________________________________________
bidirectional_5 (Bidirectional) (None, None, 1600)   11529600    dropout_7[0][0]                  
__________________________________________________________________________________________________
dense_1 (TimeDistributed)       (None, None, 1600)   2561600     bidirectional_5[0][0]            
__________________________________________________________________________________________________
dense_1_relu (ReLU)             (None, None, 1600)   0           dense_1[0][0]                    
__________________________________________________________________________________________________
dense_1_bn (BatchNormalization) (None, None, 1600)   6400        dense_1_relu[0][0]               
__________________________________________________________________________________________________
out2(TimeDistributed)       (None, None, 29)     46429       dense_1_bn[0][0]                 
==================================================================================================
Добро пожаловать на сайт PullRequest, где вы можете задавать вопросы и получать ответы от других членов сообщества.
...