Problem:
I am trying to build a binary classification model for melanoma ('MEL') versus nevus ('NV'). The dataset comes from the ISIC 2019 archive, which covers 8 different types of skin lesions, but I only use the two classes mentioned above (binary classification).
The two classes are imbalanced: 10000 images for 'NV' and 3000 for 'MEL'.
The dataset is split into a training set and a validation set.
The validation folder contains 904 images for 'MEL' and 2200 for 'NV'.
The training folder contains 3600 images for 'MEL' and 7100 for 'NV'.
I apply two fixes for the imbalance: undersampling for the validation set and image augmentation for the training set only.
The validation folder now contains 904 images per class.
The training folder now contains two subfolders, 'MEL' and 'NV', with 10000 images each.
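The validation set is balanced by randomly dropping 'NV' images until both classes contain 904. A minimal sketch of that step (the folder path is an assumption based on the base_dir layout used in the augmentation code further down):

import os
import random

# assumed path to the 'NV' validation folder; adjust to the real directory layout
val_nv_dir = 'base_dir/val_dir/NV'
target_count = 904  # match the number of 'MEL' validation images

nv_images = os.listdir(val_nv_dir)
random.shuffle(nv_images)
# delete the surplus images so that only target_count remain
for fname in nv_images[target_count:]:
    os.remove(os.path.join(val_nv_dir, fname))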
Using Keras, I set up DenseNet201 and use the DenseNet preprocessing function:
из keras.applications.densenet import DenseNet201, preprocess_input
densenet_model = DenseNet201(input_shape=(224, 224, 3), include_top=False, weights="imagenet")
Then I add a few layers on top of the pre-trained model:
from keras import layers
from keras.models import Model

# get the last activation layer of the pre-trained model
last_layer = densenet_model.get_layer('relu')
print('last layer output shape:', last_layer.output_shape)
last_output = last_layer.output
# Pool the feature maps down to a single vector per image
x = layers.GlobalMaxPooling2D()(last_output)
# Add a fully connected layer with 512 hidden units and ReLU activation
x = layers.Dense(512, activation='relu')(x)
# Add a dropout rate of 0.5
x = layers.Dropout(0.5)(x)
# Add a final sigmoid layer for classification
x = layers.Dense(2, activation='sigmoid')(x)
# Configure and compile the model
model = Model(densenet_model.input, x)
I did not freeze any layers. Then I compile the model:
from keras.optimizers import Adam

optimizer = Adam(lr=0.0001, beta_1=0.9, beta_2=0.999, epsilon=None,
                 decay=0.0, amsgrad=True)
model.compile(loss='binary_crossentropy',
              optimizer=optimizer,
              metrics=['accuracy', 'binary_accuracy'])
from keras.callbacks import ModelCheckpoint, ReduceLROnPlateau

filepath = "densenet.h5"
# Declare a checkpoint to save the best version of the model
checkpoint = ModelCheckpoint(filepath, monitor='val_acc', verbose=1,
                             save_best_only=True, mode='max')
# Reduce the learning rate as the learning stagnates
reduce_lr = ReduceLROnPlateau(monitor='val_acc', factor=0.5, patience=2,
                              verbose=1, mode='max', min_lr=0.00001)
callbacks_list = [checkpoint, reduce_lr]
# Fit the model
history = model.fit_generator(train_batches,
                              steps_per_epoch=train_steps,
                              validation_data=val_batches,
                              validation_steps=val_steps,
                              epochs=20,
                              verbose=1,
                              callbacks=callbacks_list)
Result:
Epoch 1/20
1701/1701 [==============================] - 793s 466ms/step - loss: 0.4436 - acc: 0.7890 - binary_accuracy: 0.7890 - val_loss: 0.3416 - val_acc: 0.8404 - val_binary_accuracy: 0.8404
Epoch 00001: val_acc improved from -inf to 0.84043, saving model to densenet.h5
Epoch 2/20
1701/1701 [==============================] - 720s 423ms/step - loss: 0.3447 - acc: 0.8450 - binary_accuracy: 0.8450 - val_loss: 0.3564 - val_acc: 0.8446 - val_binary_accuracy: 0.8446
Epoch 00002: val_acc improved from 0.84043 to 0.84458, saving model to densenet.h5
Epoch 3/20
1701/1701 [==============================] - 728s 428ms/step - loss: 0.2718 - acc: 0.8835 - binary_accuracy: 0.8835 - val_loss: 0.3785 - val_acc: 0.8487 - val_binary_accuracy: 0.8487
Epoch 00003: val_acc improved from 0.84458 to 0.84873, saving model to densenet.h5
Epoch 4/20
1701/1701 [==============================] - 726s 427ms/step - loss: 0.2051 - acc: 0.9172 - binary_accuracy: 0.9172 - val_loss: 0.3779 - val_acc: 0.8581 - val_binary_accuracy: 0.8581
Epoch 00004: val_acc improved from 0.84873 to 0.85813, saving model to densenet.h5
Epoch 5/20
1701/1701 [==============================] - 728s 428ms/step - loss: 0.1529 - acc: 0.9403 - binary_accuracy: 0.9403 - val_loss: 0.3923 - val_acc: 0.8581 - val_binary_accuracy: 0.8581
Epoch 00005: val_acc did not improve from 0.85813
Epoch 6/20
1701/1701 [==============================] - 728s 428ms/step - loss: 0.1163 - acc: 0.9553 - binary_accuracy: 0.9553 - val_loss: 0.4813 - val_acc: 0.8498 - val_binary_accuracy: 0.8498
Epoch 00006: val_acc did not improve from 0.85813
Epoch 00006: ReduceLROnPlateau reducing learning rate to 4.999999873689376e-05.
Epoch 7/20
1701/1701 [==============================] - 727s 427ms/step - loss: 0.0407 - acc: 0.9864 - binary_accuracy: 0.9864 - val_loss: 0.5726 - val_acc: 0.8667 - val_binary_accuracy: 0.8667
Epoch 00007: val_acc improved from 0.85813 to 0.86670, saving model to densenet.h5
Epoch 8/20
1701/1701 [==============================] - 728s 428ms/step - loss: 0.0287 - acc: 0.9904 - binary_accuracy: 0.9904 - val_loss: 0.5919 - val_acc: 0.8711 - val_binary_accuracy: 0.8711
Epoch 00008: val_acc improved from 0.86670 to 0.87113, saving model to densenet.h5
Epoch 9/20
1701/1701 [==============================] - 728s 428ms/step - loss: 0.0253 - acc: 0.9909 - binary_accuracy: 0.9909 - val_loss: 0.5453 - val_acc: 0.8720 - val_binary_accuracy: 0.8720
Epoch 00009: val_acc improved from 0.87113 to 0.87196, saving model to densenet.h5
Epoch 10/20
1701/1701 [==============================] - 730s 429ms/step - loss: 0.0216 - acc: 0.9927 - binary_accuracy: 0.9927 - val_loss: 0.5498 - val_acc: 0.8706 - val_binary_accuracy: 0.8706
Epoch 00010: val_acc did not improve from 0.87196
Epoch 11/20
1701/1701 [==============================] - 729s 428ms/step - loss: 0.0145 - acc: 0.9954 - binary_accuracy: 0.9954 - val_loss: 0.6332 - val_acc: 0.8822 - val_binary_accuracy: 0.8822
Epoch 00011: val_acc improved from 0.87196 to 0.88219, saving model to densenet.h5
Epoch 12/20
1701/1701 [==============================] - 731s 430ms/step - loss: 0.0176 - acc: 0.9939 - binary_accuracy: 0.9939 - val_loss: 0.6256 - val_acc: 0.8756 - val_binary_accuracy: 0.8756
Epoch 00012: val_acc did not improve from 0.88219
Epoch 13/20
1701/1701 [==============================] - 734s 432ms/step - loss: 0.0117 - acc: 0.9965 - binary_accuracy: 0.9965 - val_loss: 0.5959 - val_acc: 0.8838 - val_binary_accuracy: 0.8838
Epoch 00013: val_acc improved from 0.88219 to 0.88385, saving model to densenet.h5
Epoch 14/20
1701/1701 [==============================] - 736s 433ms/step - loss: 0.0132 - acc: 0.9958 - binary_accuracy: 0.9958 - val_loss: 0.7139 - val_acc: 0.8598 - val_binary_accuracy: 0.8598
Epoch 00014: val_acc did not improve from 0.88385
Epoch 15/20
1701/1701 [==============================] - 735s 432ms/step - loss: 0.0109 - acc: 0.9963 - binary_accuracy: 0.9963 - val_loss: 0.6139 - val_acc: 0.8720 - val_binary_accuracy: 0.8720
Epoch 00015: val_acc did not improve from 0.88385
Epoch 00015: ReduceLROnPlateau reducing learning rate to 2.499999936844688e-05.
Epoch 16/20
1701/1701 [==============================] - 734s 431ms/step - loss: 0.0048 - acc: 0.9980 - binary_accuracy: 0.9980 - val_loss: 0.6759 - val_acc: 0.8764 - val_binary_accuracy: 0.8764
Epoch 00016: val_acc did not improve from 0.88385
Epoch 17/20
1701/1701 [==============================] - 733s 431ms/step - loss: 0.0028 - acc: 0.9992 - binary_accuracy: 0.9992 - val_loss: 0.7179 - val_acc: 0.8805 - val_binary_accuracy: 0.8805
Epoch 00017: val_acc did not improve from 0.88385
Epoch 00017: ReduceLROnPlateau reducing learning rate to 1.249999968422344e-05.
Epoch 18/20
1701/1701 [==============================] - 734s 432ms/step - loss: 0.0014 - acc: 0.9996 - binary_accuracy: 0.9996 - val_loss: 0.7525 - val_acc: 0.8816 - val_binary_accuracy: 0.8816
Epoch 00018: val_acc did not improve from 0.88385
Epoch 19/20
1701/1701 [==============================] - 734s 431ms/step - loss: 0.0011 - acc: 0.9997 - binary_accuracy: 0.9997 - val_loss: 0.7580 - val_acc: 0.8803 - val_binary_accuracy: 0.8803
Epoch 00019: val_acc did not improve from 0.88385
Epoch 00019: ReduceLROnPlateau reducing learning rate to 1e-05.
Epoch 20/20
1701/1701 [==============================] - 733s 431ms/step - loss: 8.0918e-04 - acc: 0.9997 - binary_accuracy: 0.9997 - val_loss: 0.7667 - val_acc: 0.8800 - val_binary_accuracy: 0.8800
Epoch 00020: val_acc did not improve from 0.88385
Confusion matrix and classification report (sklearn.metrics):
              precision    recall  f1-score   support

         MEL       0.48      0.47      0.48       904
          NV       0.48      0.49      0.49       904

    accuracy                           0.48      1808
   macro avg       0.48      0.48      0.48      1808
weighted avg       0.48      0.48      0.48      1808
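The report above is produced with sklearn.metrics.classification_report. A sketch of the evaluation step, assuming the predictions come from the unshuffled test_batches generator defined at the end of the post (shuffle=False keeps the prediction order aligned with test_batches.classes):

import numpy as np
from sklearn.metrics import classification_report

# predict on the unshuffled generator so the order matches test_batches.classes
predictions = model.predict_generator(test_batches, steps=int(val_steps), verbose=1)
y_pred = np.argmax(predictions, axis=1)
y_true = test_batches.classes

print(classification_report(y_true, y_pred,
                            target_names=list(test_batches.class_indices.keys())))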
Dilemma:
As you can see, this model is complete garbage.
What is the problem? Any suggestions are welcome; this has been dragging on for a month without any improvement.
If you want to look at other parts of the code, I will post them.
This is the augmentation code:
import os
import shutil
import numpy as np
from keras.preprocessing.image import ImageDataGenerator

class_list = ['MEL', 'NV']

for item in class_list:
    # Create a temporary directory for the augmented images
    aug_dir = 'aug_dir'
    os.mkdir(aug_dir)
    # Create a directory within the base dir to store images of the same class
    img_dir = os.path.join(aug_dir, 'img_dir')
    os.mkdir(img_dir)
    # Choose a class
    img_class = item
    # List all the images in the directory
    img_list = os.listdir('base_dir/train_dir/' + img_class)
    # Copy images from the class train dir to the img_dir
    for fname in img_list:
        # source path to image
        src = os.path.join('base_dir/train_dir/' + img_class, fname)
        # destination path to image
        dst = os.path.join(img_dir, fname)
        # copy the image from the source to the destination
        shutil.copyfile(src, dst)
    # point to a dir containing the images and not to the images themselves
    path = aug_dir
    save_path = 'base_dir/train_dir/' + img_class
    # Create a data generator to augment the images in real time
    datagen = ImageDataGenerator(
        rotation_range=60,
        width_shift_range=0.1,
        height_shift_range=0.1,
        # zoom_range=0.1,
        shear_range=0.2,
        horizontal_flip=True,
        vertical_flip=True,
        brightness_range=(0.9, 1.1),
        fill_mode='nearest')
    batch_size = 50
    aug_datagen = datagen.flow_from_directory(path,
                                              save_to_dir=save_path,
                                              save_format='jpg',
                                              target_size=(224, 224),
                                              batch_size=batch_size)
    # Generate the augmented images and add them to the training folders
    num_aug_images_wanted = 10000  # total number of images we want in each class
    num_files = len(os.listdir(img_dir))
    num_batches = int(np.ceil((num_aug_images_wanted - num_files) / batch_size))
    # run the generator to create the remaining augmented images
    for i in range(0, num_batches):
        imgs, labels = next(aug_datagen)
    # delete the temporary directory with the raw image files
    shutil.rmtree('aug_dir')
The preprocessing code:
import numpy as np
import keras
from keras.preprocessing.image import ImageDataGenerator

# Declare a few useful values
num_train_samples = train_len  # number of images in the training folder
num_val_samples = val_len      # number of images in the validation folder
train_batch_size = 16
val_batch_size = 100
image_height = 224
image_width = 224

# Declare how many steps are needed in an iteration
train_steps = np.ceil(num_train_samples / train_batch_size)
val_steps = np.ceil(num_val_samples / val_batch_size)

# Set up generators
datagenr = ImageDataGenerator(
    preprocessing_function=keras.applications.densenet.preprocess_input)

train_batches = datagenr.flow_from_directory(
    train_path,
    target_size=(image_height, image_width),
    batch_size=train_batch_size)

val_batches = datagenr.flow_from_directory(
    val_path,
    target_size=(image_height, image_width),
    batch_size=val_batch_size)

# Note: shuffle=False keeps the test batches in directory order
test_batches = datagenr.flow_from_directory(
    val_path,
    target_size=(image_height, image_width),
    batch_size=val_batch_size,
    shuffle=False)
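train_len and val_len are not defined in the snippet above; they are just the total image counts of the training and validation folders. An illustrative way to obtain them (the val_dir name is an assumption, mirroring the base_dir/train_dir layout from the augmentation code):

import os

train_path = 'base_dir/train_dir'
val_path = 'base_dir/val_dir'  # assumed folder name

# count every image file under each split (one subfolder per class)
train_len = sum(len(files) for _, _, files in os.walk(train_path))
val_len = sum(len(files) for _, _, files in os.walk(val_path))
print(train_len, val_len)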