I am trying to predict the genre of a movie from a multi-label dataset. The input data looks like this:
Id Genre Action Adventure Animation Biography Comedy Crime Documentary Drama Family Fantasy History Horror Music
tt0086425 ['Comedy', 'Drama'] 0 0 0 0 1 0 0 1 0 0 0 0 0
There are 25 genre columns like these for each movie poster.
I have done the EDA and am now trying to build a predictive model for this multi-label data. My train/test split looks like this:
import numpy as np
from skmultilearn.model_selection import iterative_train_test_split
# Label matrix: the 25 binary genre columns (everything except Id and Genre)
y = np.array(movies.drop(['Id', 'Genre'], axis=1))
X_train, X_test, y_train, y_test = iterative_train_test_split(X, y, test_size=0.2)
X_train.shape, X_test.shape, y_train.shape, y_test.shape
o/p : ((5791, 224, 224, 3), (5791, 25), (1463, 224, 224, 3), (1463, 25))
y_test[0]
o/p :array([0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0], dtype=int64)
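from collections import Counter
import pandas as pd
from skmultilearn.model_selection.measures import get_combination_wise_output_matrix
pd.DataFrame({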
'train': Counter(str(combination) for row in get_combination_wise_output_matrix(X_test, order=2) for combination in row),
'test' : Counter(str(combination) for row in get_combination_wise_output_matrix(y_test, order=2) for combination in row)
}).T.fillna(0.0)
o/p:
(0, 0) (0, 1) (0, 10) (0, 11) (0, 12) (0, 13) (0, 14) (0, 18) (0, 19) (0, 2) ... (9, 13) (9, 14) (9, 18) (9, 19) (9, 20) (9, 21) (9, 22) (9, 23) (9, 24) (9, 9)
train 1074.0 323.0 11.0 56.0 2.0 4.0 41.0 40.0 148.0 33.0 ... 4.0 18.0 32.0 7.0 3.0 1.0 14.0 1.0 0.0 370.0
test 269.0 81.0 2.0 14.0 1.0 0.0 7.0 10.0 37.0 4.0 ... 2.0 4.0 20.0 2.0 0.0 0.0 4.0 0.0 1.0 97.0
2 rows × 228 columns
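One thing that may be worth checking in the shapes printed above: the scikit-multilearn documentation shows `iterative_train_test_split` returning its values in the order X_train, y_train, X_test, y_test, which is not the order used by scikit-learn's `train_test_split`. The sketch below uses only the shapes printed earlier and pairs them with that documented order. The names `documented_order`, `printed_shapes`, and `actual_roles` are illustrative and not part of the original code.

```python
# Order in which iterative_train_test_split returns its four arrays,
# as shown in the scikit-multilearn documentation.
documented_order = ("X_train", "y_train", "X_test", "y_test")

# Shapes as printed in the post, in the order the four results were unpacked.
printed_shapes = (
    (5791, 224, 224, 3),
    (5791, 25),
    (1463, 224, 224, 3),
    (1463, 25),
)

# Pair each returned position with the role the library assigns to it.
actual_roles = dict(zip(documented_order, printed_shapes))

for name, shape in actual_roles.items():
    kind = "images" if len(shape) == 4 else "labels"
    print(f"{name}: {shape} ({kind})")
```

Read this way, the array that the post calls `y_train` would hold the 1463 test images. Unpacking as `X_train, y_train, X_test, y_test = iterative_train_test_split(X, y, test_size=0.2)` would line the names up with the documented order.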
Now, to build the image classifier, I wrote this code:
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense
model = Sequential()
model.add(Conv2D(filters=16, kernel_size=(5, 5), activation="relu", input_shape=(224,224,3)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Conv2D(filters=32, kernel_size=(5, 5), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Conv2D(filters=64, kernel_size=(5, 5), activation="relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Conv2D(filters=64, kernel_size=(5, 5), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(25, activation='sigmoid'))
model.summary()
Model: "sequential_12"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_41 (Conv2D) (None, 220, 220, 16) 1216
_________________________________________________________________
max_pooling2d_41 (MaxPooling (None, 110, 110, 16) 0
_________________________________________________________________
dropout_47 (Dropout) (None, 110, 110, 16) 0
_________________________________________________________________
conv2d_42 (Conv2D) (None, 106, 106, 32) 12832
_________________________________________________________________
max_pooling2d_42 (MaxPooling (None, 53, 53, 32) 0
_________________________________________________________________
dropout_48 (Dropout) (None, 53, 53, 32) 0
_________________________________________________________________
conv2d_43 (Conv2D) (None, 49, 49, 64) 51264
_________________________________________________________________
max_pooling2d_43 (MaxPooling (None, 24, 24, 64) 0
_________________________________________________________________
dropout_49 (Dropout) (None, 24, 24, 64) 0
_________________________________________________________________
conv2d_44 (Conv2D) (None, 20, 20, 64) 102464
_________________________________________________________________
max_pooling2d_44 (MaxPooling (None, 10, 10, 64) 0
_________________________________________________________________
dropout_50 (Dropout) (None, 10, 10, 64) 0
_________________________________________________________________
flatten_12 (Flatten) (None, 6400) 0
_________________________________________________________________
dense_29 (Dense) (None, 128) 819328
_________________________________________________________________
dropout_51 (Dropout) (None, 128) 0
_________________________________________________________________
dense_30 (Dense) (None, 64) 8256
_________________________________________________________________
dropout_52 (Dropout) (None, 64) 0
_________________________________________________________________
dense_31 (Dense) (None, 25) 1625
=================================================================
Total params: 996,985
Trainable params: 996,985
Non-trainable params: 0
_________________________________________________________________
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
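Since the last layer uses independent sigmoid outputs and each poster can carry several genres, the loss commonly paired with this setup in Keras is `binary_crossentropy`; `categorical_crossentropy` is intended for targets with a single class per sample. The following is a minimal pure-Python sketch of the per-label binary cross-entropy, not Keras' own implementation. The function name and the example predictions are made up for illustration.

```python
import math

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over independent labels (textbook formula)."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # keep log() away from 0 and 1
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

# Multi-hot target with 25 labels; indices 4 and 7 are set, as in y_test[0].
y_true = [0] * 25
y_true[4] = 1
y_true[7] = 1

# Hypothetical sigmoid outputs: one confident and correct, one uninformative.
confident = [0.9 if t == 1 else 0.1 for t in y_true]
uninformative = [0.5] * 25

loss_confident = binary_crossentropy(y_true, confident)
loss_uninformative = binary_crossentropy(y_true, uninformative)
print(round(loss_confident, 4))      # 0.1054
print(round(loss_uninformative, 4))  # 0.6931
```

In the compile call this would correspond to `loss='binary_crossentropy'`. Every one of the 25 labels contributes to the loss, whether it is present or absent.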
But as soon as I start fitting the model, I get a ValueError:
model.fit(X_train, y_train, epochs=10, validation_data=(X_test, y_test), batch_size=64)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\keras\engine\training.py in _standardize_user_data(self, x, y, sample_weight, class_weight, check_array_lengths, batch_size)
793 feed_output_shapes,
794 check_batch_axis=False, # Don't enforce the batch size.
--> 795 exception_prefix='target')
796
797 # Generate sample-wise weight values given the `sample_weight` and
~\AppData\Local\Continuum\anaconda3\lib\site-packages\keras\engine\training_utils.py in standardize_input_data(data, names, shapes, check_batch_axis, exception_prefix)
129 ': expected ' + names[i] + ' to have ' +
130 str(len(shape)) + ' dimensions, but got array '
--> 131 'with shape ' + str(data_shape))
132 if not check_batch_axis:
133 data_shape = data_shape[1:]
ValueError: Error when checking target: expected dense_31 to have 2 dimensions, but got array with shape (1463, 224, 224, 3)
I can't figure out this last step: how do I fix it, and where did I make the mistake? I have already added a Flatten layer between the Conv2D and Dense layers, but that did not fix it. Any help would be highly appreciated.
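The error message says that the target passed to `fit` has the shape of an image batch rather than `(n_samples, 25)`, so the model's layers may not be the cause. A small check along the following lines, run before `model.fit`, can surface this kind of mismatch early. It is only a sketch: `check_fit_shapes` is a hypothetical helper, and the shapes passed to it are the ones printed earlier in the post.

```python
def check_fit_shapes(x_shape, y_shape, input_shape=(224, 224, 3), n_labels=25):
    """Return a list of human-readable problems; an empty list means the shapes fit."""
    problems = []
    if tuple(x_shape[1:]) != tuple(input_shape):
        problems.append(f"inputs have per-sample shape {tuple(x_shape[1:])}, expected {input_shape}")
    if len(y_shape) != 2 or y_shape[1] != n_labels:
        problems.append(f"targets have shape {tuple(y_shape)}, expected (n_samples, {n_labels})")
    if x_shape[0] != y_shape[0]:
        problems.append(f"{x_shape[0]} input samples but {y_shape[0]} target samples")
    return problems

# Shapes of the arrays named X_train and y_train in the post.
as_posted = check_fit_shapes((5791, 224, 224, 3), (1463, 224, 224, 3))

# Shapes if the 5791-row label matrix is used as the training target instead.
relabelled = check_fit_shapes((5791, 224, 224, 3), (5791, 25))

for problem in as_posted:
    print("as posted:", problem)
print("relabelled problems:", relabelled)
```

With the arrays as named in the post, the check reports both a wrong target shape and a sample-count mismatch. With the images and the label matrix paired up, it reports nothing.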