Very poor precision, recall, f1-score and confusion matrix for a binary classification model whose training metrics indicate it was trained well - PullRequest
0 votes
/ 23 June 2019

Problem: I am trying to build a model for binary classification of melanoma 'MEL' and nevus 'NV'. The dataset comes from the ISIC 2019 archive, which covers 8 different types of skin lesions, but I use only the two classes mentioned above (binary classification).

The two classes are imbalanced: 10000 images for 'NV' and 3000 for 'MEL'.

The dataset is split into training and validation sets.

The validation folder contains 904 images for 'MEL' and 2200 for 'NV'.

The training folder contains 3600 images for 'MEL' and 7100 for 'NV'.

I apply two remedies: downsampling of the training and validation sets, and image augmentation for the training set only.

The validation folder now contains 904 images for each class.

The training folder now contains two subfolders, 'MEL' and 'NV', with 10000 images each.
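The downsampling step itself is not shown in the code below; a minimal sketch of how the validation folder could have been balanced down to 904 images per class might look like this (the base_dir/val_dir layout and path names are assumptions, not the original code):

import os
import random

# Assumed layout: base_dir/val_dir/<class_name>/ (hypothetical paths)
val_dir = 'base_dir/val_dir'
target_count = 904  # keep as many 'NV' images as there are 'MEL' images

nv_dir = os.path.join(val_dir, 'NV')
nv_files = os.listdir(nv_dir)
random.shuffle(nv_files)

# Remove randomly chosen surplus 'NV' images so both classes end up with 904
for fname in nv_files[target_count:]:
    os.remove(os.path.join(nv_dir, fname))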


Using Keras, I set up DenseNet201 and used the DenseNet preprocessing function:

from keras.applications.densenet import DenseNet201, preprocess_input

densenet_model = DenseNet201(input_shape=(224, 224, 3), include_top=False, weights="imagenet")

Then I add a few layers on top of the pretrained model:

from keras import layers
from keras.models import Model

# Get the last layer of the pretrained model and its output shape
last_layer = densenet_model.get_layer('relu')
print('last layer output shape:', last_layer.output_shape)
last_output = last_layer.output

# Pool the base-model output down to a single feature vector per image
x = layers.GlobalMaxPooling2D()(last_output)
# Add a fully connected layer with 512 hidden units and ReLU activation
x = layers.Dense(512, activation='relu')(x)
# Add a dropout rate of 0.5
x = layers.Dropout(0.5)(x)
# Add a final two-unit sigmoid layer for classification
x = layers.Dense(2, activation='sigmoid')(x)
# Build the full model (compiled below)
model = Model(densenet_model.input, x)

I did not freeze any layers. Then I compile the model:

from keras.optimizers import Adam

optimizer = Adam(lr=0.0001, beta_1=0.9, beta_2=0.999, epsilon=None,
                 decay=0.0, amsgrad=True)
model.compile(loss='binary_crossentropy',
              optimizer=optimizer,
              metrics=['accuracy', 'binary_accuracy'])
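As an aside on this design choice: with binary_crossentropy, Keras binary classifiers are also commonly built with a single sigmoid unit and class_mode='binary' in the generators, whereas the two-unit sigmoid head above expects one-hot labels (the flow_from_directory default, class_mode='categorical'). A purely illustrative sketch of the single-unit variant, not the author's code:

# Hypothetical alternative head, shown only for comparison with the
# two-unit sigmoid head used above: one unit outputting P(positive class).
alt = layers.GlobalMaxPooling2D()(last_output)
alt = layers.Dense(512, activation='relu')(alt)
alt = layers.Dropout(0.5)(alt)
alt = layers.Dense(1, activation='sigmoid')(alt)
alt_model = Model(densenet_model.input, alt)
# The generators would then be created with class_mode='binary'.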

filepath = "densenet.h5"

from keras.callbacks import ModelCheckpoint, ReduceLROnPlateau

# Declare a checkpoint to save the best version of the model
checkpoint = ModelCheckpoint(filepath, monitor='val_acc', verbose=1,
                             save_best_only=True, mode='max')

# Reduce the learning rate as the learning stagnates
reduce_lr = ReduceLROnPlateau(monitor='val_acc', factor=0.5, patience=2,
                              verbose=1, mode='max', min_lr=0.00001)

callbacks_list = [checkpoint, reduce_lr]

# Fit the model
history = model.fit_generator(train_batches,
                              steps_per_epoch=train_steps,
                              validation_data=val_batches,
                              validation_steps=val_steps,
                              epochs=20,
                              verbose=1,
                              callbacks=callbacks_list)   

Result:

    Epoch 1/20
1701/1701 [==============================] - 793s 466ms/step - loss: 0.4436 - acc: 0.7890 - binary_accuracy: 0.7890 - val_loss: 0.3416 - val_acc: 0.8404 - val_binary_accuracy: 0.8404

Epoch 00001: val_acc improved from -inf to 0.84043, saving model to densenet.h5
Epoch 2/20
1701/1701 [==============================] - 720s 423ms/step - loss: 0.3447 - acc: 0.8450 - binary_accuracy: 0.8450 - val_loss: 0.3564 - val_acc: 0.8446 - val_binary_accuracy: 0.8446

Epoch 00002: val_acc improved from 0.84043 to 0.84458, saving model to densenet.h5
Epoch 3/20
1701/1701 [==============================] - 728s 428ms/step - loss: 0.2718 - acc: 0.8835 - binary_accuracy: 0.8835 - val_loss: 0.3785 - val_acc: 0.8487 - val_binary_accuracy: 0.8487

Epoch 00003: val_acc improved from 0.84458 to 0.84873, saving model to densenet.h5
Epoch 4/20
1701/1701 [==============================] - 726s 427ms/step - loss: 0.2051 - acc: 0.9172 - binary_accuracy: 0.9172 - val_loss: 0.3779 - val_acc: 0.8581 - val_binary_accuracy: 0.8581

Epoch 00004: val_acc improved from 0.84873 to 0.85813, saving model to densenet.h5
Epoch 5/20
1701/1701 [==============================] - 728s 428ms/step - loss: 0.1529 - acc: 0.9403 - binary_accuracy: 0.9403 - val_loss: 0.3923 - val_acc: 0.8581 - val_binary_accuracy: 0.8581

Epoch 00005: val_acc did not improve from 0.85813
Epoch 6/20
1701/1701 [==============================] - 728s 428ms/step - loss: 0.1163 - acc: 0.9553 - binary_accuracy: 0.9553 - val_loss: 0.4813 - val_acc: 0.8498 - val_binary_accuracy: 0.8498

Epoch 00006: val_acc did not improve from 0.85813

Epoch 00006: ReduceLROnPlateau reducing learning rate to 4.999999873689376e-05.
Epoch 7/20
1701/1701 [==============================] - 727s 427ms/step - loss: 0.0407 - acc: 0.9864 - binary_accuracy: 0.9864 - val_loss: 0.5726 - val_acc: 0.8667 - val_binary_accuracy: 0.8667

Epoch 00007: val_acc improved from 0.85813 to 0.86670, saving model to densenet.h5
Epoch 8/20
1701/1701 [==============================] - 728s 428ms/step - loss: 0.0287 - acc: 0.9904 - binary_accuracy: 0.9904 - val_loss: 0.5919 - val_acc: 0.8711 - val_binary_accuracy: 0.8711

Epoch 00008: val_acc improved from 0.86670 to 0.87113, saving model to densenet.h5
Epoch 9/20
1701/1701 [==============================] - 728s 428ms/step - loss: 0.0253 - acc: 0.9909 - binary_accuracy: 0.9909 - val_loss: 0.5453 - val_acc: 0.8720 - val_binary_accuracy: 0.8720

Epoch 00009: val_acc improved from 0.87113 to 0.87196, saving model to densenet.h5
Epoch 10/20
1701/1701 [==============================] - 730s 429ms/step - loss: 0.0216 - acc: 0.9927 - binary_accuracy: 0.9927 - val_loss: 0.5498 - val_acc: 0.8706 - val_binary_accuracy: 0.8706

Epoch 00010: val_acc did not improve from 0.87196
Epoch 11/20
1701/1701 [==============================] - 729s 428ms/step - loss: 0.0145 - acc: 0.9954 - binary_accuracy: 0.9954 - val_loss: 0.6332 - val_acc: 0.8822 - val_binary_accuracy: 0.8822

Epoch 00011: val_acc improved from 0.87196 to 0.88219, saving model to densenet.h5
Epoch 12/20
1701/1701 [==============================] - 731s 430ms/step - loss: 0.0176 - acc: 0.9939 - binary_accuracy: 0.9939 - val_loss: 0.6256 - val_acc: 0.8756 - val_binary_accuracy: 0.8756

Epoch 00012: val_acc did not improve from 0.88219
Epoch 13/20
1701/1701 [==============================] - 734s 432ms/step - loss: 0.0117 - acc: 0.9965 - binary_accuracy: 0.9965 - val_loss: 0.5959 - val_acc: 0.8838 - val_binary_accuracy: 0.8838

Epoch 00013: val_acc improved from 0.88219 to 0.88385, saving model to densenet.h5
Epoch 14/20
1701/1701 [==============================] - 736s 433ms/step - loss: 0.0132 - acc: 0.9958 - binary_accuracy: 0.9958 - val_loss: 0.7139 - val_acc: 0.8598 - val_binary_accuracy: 0.8598

Epoch 00014: val_acc did not improve from 0.88385
Epoch 15/20
1701/1701 [==============================] - 735s 432ms/step - loss: 0.0109 - acc: 0.9963 - binary_accuracy: 0.9963 - val_loss: 0.6139 - val_acc: 0.8720 - val_binary_accuracy: 0.8720

Epoch 00015: val_acc did not improve from 0.88385

Epoch 00015: ReduceLROnPlateau reducing learning rate to 2.499999936844688e-05.
Epoch 16/20
1701/1701 [==============================] - 734s 431ms/step - loss: 0.0048 - acc: 0.9980 - binary_accuracy: 0.9980 - val_loss: 0.6759 - val_acc: 0.8764 - val_binary_accuracy: 0.8764

Epoch 00016: val_acc did not improve from 0.88385
Epoch 17/20
1701/1701 [==============================] - 733s 431ms/step - loss: 0.0028 - acc: 0.9992 - binary_accuracy: 0.9992 - val_loss: 0.7179 - val_acc: 0.8805 - val_binary_accuracy: 0.8805

Epoch 00017: val_acc did not improve from 0.88385

Epoch 00017: ReduceLROnPlateau reducing learning rate to 1.249999968422344e-05.
Epoch 18/20
1701/1701 [==============================] - 734s 432ms/step - loss: 0.0014 - acc: 0.9996 - binary_accuracy: 0.9996 - val_loss: 0.7525 - val_acc: 0.8816 - val_binary_accuracy: 0.8816

Epoch 00018: val_acc did not improve from 0.88385
Epoch 19/20
1701/1701 [==============================] - 734s 431ms/step - loss: 0.0011 - acc: 0.9997 - binary_accuracy: 0.9997 - val_loss: 0.7580 - val_acc: 0.8803 - val_binary_accuracy: 0.8803

Epoch 00019: val_acc did not improve from 0.88385

Epoch 00019: ReduceLROnPlateau reducing learning rate to 1e-05.
Epoch 20/20
1701/1701 [==============================] - 733s 431ms/step - loss: 8.0918e-04 - acc: 0.9997 - binary_accuracy: 0.9997 - val_loss: 0.7667 - val_acc: 0.8800 - val_binary_accuracy: 0.8800

Epoch 00020: val_acc did not improve from 0.88385

Confusion matrix:

[image: confusion matrix]

Classification report from sklearn.metrics:

              precision    recall  f1-score   support

         MEL       0.48      0.47      0.48       904
          NV       0.48      0.49      0.49       904

    accuracy                           0.48      1808
   macro avg       0.48      0.48      0.48      1808
weighted avg       0.48      0.48      0.48      1808

Dilemma

As you can see, this is a complete garbage model. What is the problem? Please, any suggestions; this has been going on for a month without any improvement. If you want to look at other parts of the code, I will post them.

This is the augmentation code:

import os
import shutil
import numpy as np
from keras.preprocessing.image import ImageDataGenerator

class_list = ['MEL', 'NV']

for item in class_list:

    # Create a temporary directory for the augmented images
    aug_dir = 'aug_dir'
    os.mkdir(aug_dir)

    # Create a directory within the base dir to store images of the same class
    img_dir = os.path.join(aug_dir, 'img_dir')
    os.mkdir(img_dir)

    # Choose a class
    img_class = item

    # List all the images in the directory
    img_list = os.listdir('base_dir/train_dir/' + img_class)

    # Copy images from the class train dir to the img_dir
    for fname in img_list:
        # source path to image
        src = os.path.join('base_dir/train_dir/' + img_class, fname)
        # destination path to image
        dst = os.path.join(img_dir, fname)
        # copy the image from the source to the destination
        shutil.copyfile(src, dst)

    # Point to a dir containing the images and not to the images themselves
    path = aug_dir
    save_path = 'base_dir/train_dir/' + img_class

    # Create a data generator to augment the images in real time
    datagen = ImageDataGenerator(
        rotation_range=60,
        width_shift_range=0.1,
        height_shift_range=0.1,
        # zoom_range=0.1,
        shear_range=0.2,
        horizontal_flip=True,
        vertical_flip=True,
        brightness_range=(0.9, 1.1),
        fill_mode='nearest')

    batch_size = 50

    aug_datagen = datagen.flow_from_directory(path,
                                              save_to_dir=save_path,
                                              save_format='jpg',
                                              target_size=(224, 224),
                                              batch_size=batch_size)

    # Generate the augmented images and add them to the training folders
    num_aug_images_wanted = 10000  # total number of images we want in each class
    num_files = len(os.listdir(img_dir))
    num_batches = int(np.ceil((num_aug_images_wanted - num_files) / batch_size))

    # Run the generator until the class is topped up to num_aug_images_wanted
    for i in range(0, num_batches):
        imgs, labels = next(aug_datagen)

    # Delete the temporary directory with the raw image files
    shutil.rmtree('aug_dir')
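After the loop finishes, a quick sanity check of the per-class counts can confirm that both training folders ended up near the 10000-image target (same base_dir/train_dir layout as used above):

for item in ['MEL', 'NV']:
    class_dir = 'base_dir/train_dir/' + item
    print(item, len(os.listdir(class_dir)), 'images')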

Preprocessing code:

import numpy as np
import keras
from keras.preprocessing.image import ImageDataGenerator

# Declare a few useful values
num_train_samples = train_len
num_val_samples = val_len
train_batch_size = 16
val_batch_size = 100
image_height = 224
image_width = 224

# Declare how many steps are needed in an iteration
train_steps = np.ceil(num_train_samples / train_batch_size)
val_steps = np.ceil(num_val_samples / val_batch_size)

# Set up generators
datagenr = ImageDataGenerator(
    preprocessing_function=keras.applications.densenet.preprocess_input)

train_batches = datagenr.flow_from_directory(
    train_path,
    target_size=(image_height, image_width),
    batch_size=train_batch_size)

val_batches = datagenr.flow_from_directory(
    val_path,
    target_size=(image_height, image_width),
    batch_size=val_batch_size)

# Note: shuffle=False keeps the test batches in directory order,
# so predictions can be matched to the true labels
test_batches = datagenr.flow_from_directory(
    val_path,
    target_size=(image_height, image_width),
    batch_size=val_batch_size,
    shuffle=False)
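The code that produced the confusion matrix and classification report above is not shown; a sketch of how they are typically computed from test_batches might look like this (reloading the densenet.h5 checkpoint and using sklearn are assumptions about the evaluation, not the original code):

import numpy as np
from keras.models import load_model
from sklearn.metrics import confusion_matrix, classification_report

# Reload the best checkpoint saved by ModelCheckpoint (assumed evaluation path)
best_model = load_model('densenet.h5')

# Predict on the unshuffled test generator; class_indices maps folder names
# to output columns, e.g. {'MEL': 0, 'NV': 1}
test_batches.reset()
steps = int(np.ceil(test_batches.samples / test_batches.batch_size))
probs = best_model.predict_generator(test_batches, steps=steps)

y_pred = np.argmax(probs, axis=1)
y_true = test_batches.classes

print(test_batches.class_indices)
print(confusion_matrix(y_true, y_pred))
print(classification_report(y_true, y_pred,
                            target_names=list(test_batches.class_indices.keys())))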