Disclaimer: I'm new to Keras and Python.
Hi all, I'm trying to implement a neural network in Keras following the specifications given in this paper: https://arxiv.org/pdf/1605.09507.pdf.
First of all, I have some doubts about the network architecture (Section III, subsection B of the paper). The output shape of my network does not match the shapes given in Table I of the paper, even though I followed the specifications in subsection B.
Here is the code of my network:
from keras.models import Sequential
from keras.layers import Conv2D, Activation, MaxPooling2D, Dropout, GlobalMaxPooling2D, Dense, PReLU, LeakyReLU

filtersNumber = 32
filtersReceptiveField = (3, 3)
filtersStride = (1, 1)
maxpoolSize = (3, 3)
maxpoolStride = (1, 1)
layersNumber = 4
myActivation = 'relu'
inputShape = (128, 43, 1)
classesNumber = 11

def myActivationFunction(model, activation):
    # Append the requested activation layer to the model
    if activation == 'tanh' or activation == 'relu':
        model.add(Activation(activation))
    elif activation == 'prelu':
        model.add(PReLU())
    elif activation == 'lrelu_0.01':
        model.add(LeakyReLU(alpha=0.01))
    elif activation == 'lrelu_0.33':
        model.add(LeakyReLU(alpha=0.33))
    return model

model = Sequential()
for index in range(layersNumber):
    if index == 0:  # for the first layer, specify input shape
        model.add(Conv2D(filtersNumber, filtersReceptiveField, strides=filtersStride, padding='same', input_shape=inputShape))
    else:
        model.add(Conv2D(filtersNumber, filtersReceptiveField, strides=filtersStride, padding='same'))
    model = myActivationFunction(model, myActivation)
    model.add(Conv2D(filtersNumber, filtersReceptiveField, strides=filtersStride, padding='same'))
    model = myActivationFunction(model, myActivation)
    if index != (layersNumber - 1):
        model.add(MaxPooling2D(pool_size=maxpoolSize, strides=maxpoolStride))
        model.add(Dropout(0.25))
        filtersNumber = filtersNumber * 2
    else:  # for the last block
        model.add(GlobalMaxPooling2D())
model.add(Dense(1024))
model = myActivationFunction(model, myActivation)
model.add(Dropout(0.50))
model.add(Dense(classesNumber))
model.add(Activation('sigmoid'))
model.summary()
And here is the output of model.summary():
Layer (type) Output Shape Param #
=================================================================
conv2d_1 (Conv2D) (None, 128, 43, 32) 320
_________________________________________________________________
activation_1 (Activation) (None, 128, 43, 32) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 128, 43, 32) 9248
_________________________________________________________________
activation_2 (Activation) (None, 128, 43, 32) 0
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 126, 41, 32) 0
_________________________________________________________________
dropout_1 (Dropout) (None, 126, 41, 32) 0
_________________________________________________________________
conv2d_3 (Conv2D) (None, 126, 41, 64) 18496
_________________________________________________________________
activation_3 (Activation) (None, 126, 41, 64) 0
_________________________________________________________________
conv2d_4 (Conv2D) (None, 126, 41, 64) 36928
_________________________________________________________________
activation_4 (Activation) (None, 126, 41, 64) 0
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 124, 39, 64) 0
_________________________________________________________________
dropout_2 (Dropout) (None, 124, 39, 64) 0
_________________________________________________________________
conv2d_5 (Conv2D) (None, 124, 39, 128) 73856
_________________________________________________________________
activation_5 (Activation) (None, 124, 39, 128) 0
_________________________________________________________________
conv2d_6 (Conv2D) (None, 124, 39, 128) 147584
_________________________________________________________________
activation_6 (Activation) (None, 124, 39, 128) 0
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 122, 37, 128) 0
_________________________________________________________________
dropout_3 (Dropout) (None, 122, 37, 128) 0
_________________________________________________________________
conv2d_7 (Conv2D) (None, 122, 37, 256) 295168
_________________________________________________________________
activation_7 (Activation) (None, 122, 37, 256) 0
_________________________________________________________________
conv2d_8 (Conv2D) (None, 122, 37, 256) 590080
_________________________________________________________________
activation_8 (Activation) (None, 122, 37, 256) 0
_________________________________________________________________
global_max_pooling2d_1 (Glob (None, 256) 0
_________________________________________________________________
dense_1 (Dense) (None, 1024) 263168
_________________________________________________________________
activation_9 (Activation) (None, 1024) 0
_________________________________________________________________
dropout_4 (Dropout) (None, 1024) 0
_________________________________________________________________
dense_2 (Dense) (None, 11) 11275
_________________________________________________________________
activation_10 (Activation) (None, 11) 0
=================================================================
Total params: 1,446,123
Trainable params: 1,446,123
Non-trainable params: 0
_________________________________________________________________
After some attempts, I managed to get exactly the dimensions of Table I with the following code. As you can see, I had to insert an additional zero-padding layer before each convolution layer, in addition to padding='same', and I had to remove the max-pool stride argument (which means the max-pool stride defaults to the pool size, according to the Keras documentation).
from keras.models import Sequential
from keras.layers import ZeroPadding2D, Conv2D, Activation, MaxPooling2D, Dropout, GlobalMaxPooling2D, Dense, PReLU, LeakyReLU

filtersNumber = 32
filtersReceptiveField = (3, 3)
filtersStride = (1, 1)
maxpoolSize = (3, 3)
maxpoolStride = (1, 1)  # no longer passed to MaxPooling2D below
zeroPadding = (1, 1)
layersNumber = 4
myActivation = 'relu'
inputShape = (128, 43, 1)
classesNumber = 11

def myActivationFunction(model, activation):
    # Append the requested activation layer to the model
    if activation == 'tanh' or activation == 'relu':
        model.add(Activation(activation))
    elif activation == 'prelu':
        model.add(PReLU())
    elif activation == 'lrelu_0.01':
        model.add(LeakyReLU(alpha=0.01))
    elif activation == 'lrelu_0.33':
        model.add(LeakyReLU(alpha=0.33))
    return model

model = Sequential()
for index in range(layersNumber):
    if index == 0:  # for the first layer, specify input shape
        model.add(ZeroPadding2D(zeroPadding, input_shape=inputShape))
    else:
        model.add(ZeroPadding2D(zeroPadding))
    model.add(Conv2D(filtersNumber, filtersReceptiveField, strides=filtersStride, padding='same'))
    model = myActivationFunction(model, myActivation)
    model.add(ZeroPadding2D(zeroPadding))
    model.add(Conv2D(filtersNumber, filtersReceptiveField, strides=filtersStride, padding='same'))
    model = myActivationFunction(model, myActivation)
    if index != (layersNumber - 1):
        model.add(MaxPooling2D(pool_size=maxpoolSize))  # stride defaults to pool_size
        model.add(Dropout(0.25))
        filtersNumber = filtersNumber * 2
    else:  # for the last layer
        model.add(GlobalMaxPooling2D())
model.add(Dense(1024))
model = myActivationFunction(model, myActivation)
model.add(Dropout(0.50))
model.add(Dense(classesNumber))
model.add(Activation('sigmoid'))
model.summary()
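The spatial sizes produced by the two versions above can be checked by hand with the usual output-size formulas. The following is a minimal sketch in plain Python, not Keras code: the helper names (conv_same, zero_pad, pool_valid, trace) are made up for this illustration, and it assumes 3x3 kernels with stride 1, 'same' convolutions and 'valid' max pooling, as in the code above.

```python
def conv_same(size, stride=1):
    # padding='same': output size is ceil(input / stride)
    return -(-size // stride)

def zero_pad(size, pad=1):
    # ZeroPadding2D((1, 1)) adds `pad` rows/columns on both sides
    return size + 2 * pad

def pool_valid(size, pool=3, stride=None):
    # MaxPooling2D with padding='valid'; stride defaults to the pool size
    stride = pool if stride is None else stride
    return (size - pool) // stride + 1

def trace(shape, extra_padding, pool_stride, blocks=4):
    # Return (layer kind, height, width) after each conv pair and each pooling
    h, w = shape
    shapes = []
    for block in range(blocks):
        for _ in range(2):  # two convolutions per block
            if extra_padding:
                h, w = zero_pad(h), zero_pad(w)
            h, w = conv_same(h), conv_same(w)
        shapes.append(("conv", h, w))
        if block != blocks - 1:
            h = pool_valid(h, stride=pool_stride)
            w = pool_valid(w, stride=pool_stride)
            shapes.append(("pool", h, w))
    return shapes

# First version: padding='same' only, max-pool stride 1
print(trace((128, 43), extra_padding=False, pool_stride=1))
# Second version: extra ZeroPadding2D, max-pool stride equal to the pool size
print(trace((128, 43), extra_padding=True, pool_stride=None))
```

With stride 1, each 3x3 pooling only trims two rows and two columns (128 to 126), which is what the summary above shows. With the extra padding and the default stride, each block grows by four in each dimension and is then divided by three (128 to 132 to 44).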
First question: shouldn't padding='same' be enough for the zero padding, given that the author says "the input to each convolution layer is zero-padded with 1 × 1 to preserve the spatial resolution"? And is the max-pool stride of 1 a mistake by the author, or am I missing something?
By the way, I tried to train the network with these new specifications, but unfortunately loss and val_loss did not improve, and training was stopped because val_loss had not decreased for more than three epochs, which is the stopping rule given in the paper.
Train on 5699 samples, validate on 1006 samples
Epoch 1/1000
5699/5699 [==============================] - 559s 98ms/step - loss: 2.4453 - acc: 0.0635 - val_loss: 2.3979 - val_acc: 0.0447
Epoch 2/1000
5699/5699 [==============================] - 583s 102ms/step - loss: 2.9140 - acc: 0.0602 - val_loss: 3.4699 - val_acc: 0.0447
Epoch 3/1000
5699/5699 [==============================] - 571s 100ms/step - loss: 3.4037 - acc: 0.0604 - val_loss: 3.4699 - val_acc: 0.0447
Epoch 4/1000
5699/5699 [==============================] - 592s 104ms/step - loss: 4.2809 - acc: 0.0598 - val_loss: 4.5773 - val_acc: 0.0447
Here is my training code (the specifications are taken from Section III, subsection C):
import numpy as np
import os
import keras
from keras.models import Sequential
from keras.layers import ZeroPadding2D, Conv2D, Activation, MaxPooling2D, Dropout, GlobalMaxPooling2D, Dense, PReLU, LeakyReLU
from keras.callbacks import EarlyStopping

def myActivationFunction(model, convAct):
    # Append the requested activation layer to the model
    if convAct == 'tanh' or convAct == 'relu':
        model.add(Activation(convAct))
    elif convAct == 'prelu':
        model.add(PReLU())
    elif convAct == 'lrelu_0.01':
        model.add(LeakyReLU(alpha=0.01))
    elif convAct == 'lrelu_0.33':
        model.add(LeakyReLU(alpha=0.33))
    return model

def buildCNN(inputShape, classesNumber, myActivation):
    # Paper: section III, subsection B: Network Architecture
    filtersNumber = 32
    filtersReceptiveField = (3, 3)
    filtersStride = (1, 1)
    zeroPadding = (1, 1)
    maxpoolSize = (3, 3)
    maxpoolStride = (1, 1)  # not passed to MaxPooling2D below
    layersNumber = 4
    model = Sequential()
    for index in range(layersNumber):
        if index == 0:  # for the first layer, specify input shape
            model.add(ZeroPadding2D(zeroPadding, input_shape=inputShape))
        else:
            model.add(ZeroPadding2D(zeroPadding))
        model.add(Conv2D(filtersNumber, filtersReceptiveField, strides=filtersStride, padding='same'))
        model = myActivationFunction(model, myActivation)
        model.add(ZeroPadding2D(zeroPadding))
        model.add(Conv2D(filtersNumber, filtersReceptiveField, strides=filtersStride, padding='same'))
        model = myActivationFunction(model, myActivation)
        if index != (layersNumber - 1):
            model.add(MaxPooling2D(pool_size=maxpoolSize))  # stride defaults to pool_size
            model.add(Dropout(0.25))
            filtersNumber = filtersNumber * 2
        else:  # for the last layer
            model.add(GlobalMaxPooling2D())
    model.add(Dense(1024))
    model = myActivationFunction(model, myActivation)
    model.add(Dropout(0.50))
    model.add(Dense(classesNumber))
    model.add(Activation('sigmoid'))
    return model

if __name__ == '__main__':
    import argparse
    parser = argparse.ArgumentParser(description="Trains the network using training dataset")
    parser.add_argument("-w", "--window", type=float, default=3.0, choices=[0.5, 1.0, 1.5, 3.0],
                        help="Analysis window size. Choose from 0.5, 1.0, 1.5, 3.0. Default: 3.0")
    parser.add_argument("-t", "--threshold", type=float, default=0.55, choices=[0.20, 0.25, 0.30, 0.35, 0.40, 0.45,
                                                                                 0.50, 0.55, 0.60, 0.65, 0.70, 0.75, 0.80],
                        metavar="[0.20:0.05:0.80]",
                        help="Identification threshold. Choose from 0.20 to 0.80 (step size 0.05). Default: 0.55")
    parser.add_argument("-a", metavar=" ", default="relu",
                        choices=["tanh", "relu", "prelu", "lrelu_0.01", "lrelu_0.33"],
                        help="activation function. Choose from tanh, relu, prelu, lrelu_0.01, lrelu_0.33. Default: relu")
    parser.add_argument("-p", "--path", default="Preproc", help="path of preprocessed files (default: Preproc)")
    args = parser.parse_args()
    # Load the preprocessed training data for the selected window size
    X_train = np.load(args.path + "/X_train_" + str(args.window) + "s.npy")
    Y_train = np.load(args.path + "/Y_train_" + str(args.window) + "s.npy")
    batchSize = 128
    epochsNum = 1000
    model = buildCNN((X_train.shape[1], X_train.shape[2], X_train.shape[3]), Y_train.shape[1], args.a)
    # model.summary()
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    # Stop once val_loss has not improved for 3 consecutive epochs
    earlyStopping = EarlyStopping(monitor='val_loss', patience=3)
    model.fit(X_train, Y_train, batch_size=batchSize, epochs=epochsNum, validation_split=0.15, callbacks=[earlyStopping])
At this point I thought that something might be wrong in the preprocessing of the training data, but after several checks I could not find any error. Here is my preprocessing code (the specifications are in Section III, subsection A of the paper):
import librosa
import librosa.display
import numpy as np
import os
import shutil
import keras
# import matplotlib.pyplot as plt

# Paper: section III, subsection A: Audio Preprocessing
def preprocess_dataset(input_path, output_path):
    for root, directories, filenames in os.walk(input_path):
        for directory in directories:
            if not os.path.exists(output_path + os.path.join(os.path.relpath(root, input_path), directory)):
                os.makedirs(output_path + os.path.join(os.path.relpath(root, input_path), directory))
            else:
                return  # output directory already exists: skip preprocessing
        for filename in filenames:
            if filename.endswith(".wav"):
                audio_signal, sample_rate = librosa.load(
                    os.path.join(root, filename))  # audio is mixed to mono and resampled to 22050 Hz
                normalized_audio_signal = librosa.util.normalize(audio_signal)  # audio normalization by its max value
                # Compute mel-spectrogram with the following specs:
                # - STFT window length: 1024 samples
                # - hop size: 512 samples
                # - mel frequency bins: 128
                mel_spect = librosa.feature.melspectrogram(y=normalized_audio_signal, sr=sample_rate, n_fft=1024,
                                                           hop_length=512, n_mels=128)
                log_mel_spect = np.log(np.maximum(1e-10, mel_spect))  # add a threshold to avoid -inf results
                log_mel_spect = log_mel_spect[:, :, np.newaxis]  # add new axis for keras channel last mode
                filename, fileExtension = os.path.splitext(filename)  # split file name from extension
                np.save(output_path + os.path.join(os.path.relpath(root, input_path), filename), log_mel_spect)  # save as .npy file
                # librosa.display.specshow(log_mel_spect, y_axis='mel', x_axis='time')
                # plt.show()
            elif filename.endswith(".txt"):  # copy files containing testing labels
                shutil.copy(os.path.join(root, filename), output_path + os.path.join(os.path.relpath(root, input_path), filename))

def training_vectors_init(training_path, chunks_numb):
    classes_names = sorted(os.listdir(training_path))
    total_classes = len(classes_names)
    # Load one spectrogram to read its dimensions
    audio_path = training_path + classes_names[0] + '/'
    infilename = os.listdir(audio_path)[0]
    melgram = np.load(audio_path + infilename)
    melgram_dimensions = melgram.shape
    total_training_files = 0  # counter must be initialized before the loop
    for dirpath, dirnames, filenames in os.walk(training_path):
        total_training_files = total_training_files + len(filenames)
    melgram_chunk_length = int(melgram_dimensions[1] / chunks_numb)
    x_train = np.zeros(((total_training_files * chunks_numb), melgram_dimensions[0], melgram_chunk_length, melgram_dimensions[2]))
    y_train = np.zeros(((total_training_files * chunks_numb), total_classes))
    return classes_names, total_classes, x_train, y_train, melgram_chunk_length

def shuffle_xy(x, y):
    # Shuffle samples and labels with the same random permutation
    assert x.shape[0] == y.shape[0], "Dimensions problem"
    idx = np.array(range(y.shape[0]))
    np.random.shuffle(idx)
    new_x = np.copy(x)
    new_y = np.copy(y)
    for i in range(len(idx)):
        new_x[i] = x[idx[i], :, :, :]
        new_y[i] = y[idx[i], :]
    return new_x, new_y

def build_training_dataset(preproc_path, training_win_len):
    training_path = preproc_path + "Training/"
    training_audio_length = 3  # training audio length (seconds)
    chunks_numb = int(training_audio_length / training_win_len)
    classes_names, total_classes, x_train, y_train, melgram_chunk_length = training_vectors_init(training_path, chunks_numb)
    count = 0
    for class_index, class_name in enumerate(classes_names):
        one_hot_label = keras.utils.to_categorical(class_index,
                                                   num_classes=total_classes)
        file_names = os.listdir(training_path + class_name)
        for file_name in file_names:
            audio_path = training_path + class_name + '/' + file_name
            mel = np.load(audio_path)
            # Split each spectrogram into chunks_numb windows along the time axis
            for i in range(chunks_numb):
                x_train[count, :, :, :] = mel[:, (melgram_chunk_length * i):(melgram_chunk_length * (i + 1)), :]
                y_train[count, :] = one_hot_label
                count = count + 1
    x_train, y_train = shuffle_xy(x_train, y_train)
    np.save(preproc_path + "X_train_" + str(training_win_len) + 's', x_train)
    np.save(preproc_path + "Y_train_" + str(training_win_len) + 's', y_train)
    return melgram_chunk_length, classes_names

if __name__ == '__main__':
    import argparse
    parser = argparse.ArgumentParser(
        description="preprocess_data: convert samples to .npy data format for faster loading")
    parser.add_argument("-i", "--inpath", help="input directory for audio samples (default: IRMAS-Sample)",
                        default="IRMAS-Sample")
    parser.add_argument("-o", "--outpath", help="output directory for preprocessed files (default: Preproc)",
                        default="Preproc")
    args = parser.parse_args()
    preprocess_dataset(args.inpath + '/', args.outpath + '/')
    winLengths = [0.5, 1.0, 1.5, 3.0]
    for winLen in winLengths:
        melChunkLen, classesNames = build_training_dataset(args.outpath + '/', winLen)
...
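As a sanity check on the preprocessing, the chunk lengths can be worked out without loading any audio. This is a minimal sketch under two assumptions: every training clip is exactly 3 seconds at 22050 Hz (the value of training_audio_length above), and the spectrogram uses librosa's default centered STFT, which yields 1 + num_samples // hop_length frames. The helper name stft_frames is made up for this illustration.

```python
def stft_frames(num_samples, hop_length=512):
    # Frame count of a centered STFT (librosa's default center=True)
    return 1 + num_samples // hop_length

sample_rate = 22050
clip_seconds = 3  # assumed clip length, as in training_audio_length
total_frames = stft_frames(sample_rate * clip_seconds)

chunk_lengths = {}
for win_len in [0.5, 1.0, 1.5, 3.0]:
    chunks_numb = int(clip_seconds / win_len)  # same formula as build_training_dataset
    chunk_lengths[win_len] = int(total_frames / chunks_numb)

print(total_frames)   # 130
print(chunk_lengths)  # {0.5: 21, 1.0: 43, 1.5: 65, 3.0: 130}
```

For the 1.0 s window this gives 43 frames per chunk, which agrees with the 128 x 43 x 1 input shape used in the network code. Note that the integer division drops the leftover frames (130 - 3 * 43 = 1 frame for the 1.0 s window).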
Second question: what could be wrong with the network? I also tried to train it on just a few samples, using the same samples as validation data, but val_loss still gets stuck at a constant value, as you can see here.
Epoch 1/1000
3/3 [==============================] - 1s 256ms/step - loss: 2.3653 - acc: 0.3333 - val_loss: 2.1726 - val_acc: 0.3333
Epoch 2/1000
3/3 [==============================] - 0s 108ms/step - loss: 2.0382 - acc: 0.3333 - val_loss: 1.5727 - val_acc: 0.3333
Epoch 3/1000
3/3 [==============================] - 0s 104ms/step - loss: 1.3635 - acc: 0.3333 - val_loss: 1.1036 - val_acc: 0.6667
Epoch 4/1000
3/3 [==============================] - 0s 109ms/step - loss: 1.1281 - acc: 0.3333 - val_loss: 1.0986 - val_acc: 0.3333
Epoch 5/1000
3/3 [==============================] - 0s 102ms/step - loss: 1.0986 - acc: 0.6667 - val_loss: 1.0986 - val_acc: 0.3333
Epoch 6/1000
3/3 [==============================] - 0s 104ms/step - loss: 1.0986 - acc: 0.3333 - val_loss: 1.0986 - val_acc: 0.3333
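One detail in the logs may help narrow this down. The plateau value 1.0986 equals ln 3, and the first-epoch val_loss of the full run, 2.3979, equals ln 11, which is the categorical cross-entropy obtained when every class receives the same probability. The sketch below only checks that arithmetic; it assumes the small run used three classes, which the log does not show, and the function name uniform_cross_entropy is made up for this illustration.

```python
import math

def uniform_cross_entropy(num_classes):
    # Cross-entropy of a one-hot target against a uniform prediction 1/num_classes
    return -math.log(1.0 / num_classes)

print(round(uniform_cross_entropy(11), 4))  # 2.3979
print(round(uniform_cross_entropy(3), 4))   # 1.0986
```

If that reading is right, the network would be producing the same score for every class at those points, rather than learning anything from the inputs.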
Does anyone know what is going on in this network?