I'm new to neural networks, and I'm trying to build a model that predicts the rank (value) of domain names. I have a list of domains with their ratings (from 10 down to 4.9).
First, I add some features, such as the number of vowels, etc.
But when the model trains, it shows loss: nan and accuracy: 0.0000 after the first epoch. I'm not sure where the problem is, and I'd appreciate any advice. My guess is that one of the problems is that my output is not binary.
from keras.layers import Dense
from keras.models import Sequential
import pandas as pd
import sklearn.preprocessing  # import the submodule explicitly; sklearn.preprocessing.MinMaxScaler is used below
from sklearn.model_selection import train_test_split
import tld
TOP_DOMAINS_PATH = 'domainrank.csv'
domains_df = pd.read_csv(TOP_DOMAINS_PATH, nrows=100000)
# add features
domains_df['tld'] = domains_df['Domain'].apply(lambda x: tld.get_tld(x, fix_protocol=True,fail_silently=True))
domains_df['sld'] = domains_df['Domain'].apply(lambda x: getattr(tld.get_tld(x, fix_protocol=True, as_object=True,fail_silently=True),'domain',None))
domains_df['dots'] = domains_df['sld'].str.count(r'\.')  # raw string avoids an invalid-escape warning
domains_df['vowels_count'] = domains_df['sld'].str.count('[aeiouy]')
domains_df['cons_count'] = domains_df['sld'].str.lower().str.count(r'[a-z]') - domains_df['vowels_count']
domains_df['length'] = domains_df['sld'].str.len()
domains_df['rank_normalized'] = domains_df['Open Page Rank'].apply(lambda x: x/10)
# remove not used columns
domains_df.pop('Open Page Rank')
domains_df.pop('Domain')
domains_df.pop('tld')
domains_df.pop('sld')
dataset = domains_df.values
X = dataset[:, 0:len(domains_df.columns) - 1, ]
Y = dataset[:, len(domains_df.columns) - 1]
min_max_scaler = sklearn.preprocessing.MinMaxScaler()
X_scale = min_max_scaler.fit_transform(X)
X_train, X_val_and_test, Y_train, Y_val_and_test = train_test_split(X_scale, Y, test_size=0.3)
X_val, X_test, Y_val, Y_test = train_test_split(X_val_and_test, Y_val_and_test, test_size=0.5)
model = Sequential([
    Dense(32, activation='relu', input_shape=(5,)),
    Dense(32, activation='relu'),
    Dense(1, activation='sigmoid'),
])
model.compile(optimizer='sgd',
              loss='binary_crossentropy',
              metrics=['accuracy'])
hist = model.fit(X_train, Y_train,
                 batch_size=32, epochs=100,
                 validation_data=(X_val, Y_val))
After these changes, domains_df looks like this:
       dots  vowels_count  cons_count  length  rank_normalized
0       0.0           1.0         4.0     5.0             1.00
1       0.0           4.0         4.0     8.0             1.00
2       0.0           2.0         5.0     7.0             1.00
3       0.0           5.0         2.0     7.0             1.00
4       0.0           3.0         3.0     6.0             1.00
...     ...           ...         ...     ...              ...
99995   0.0           2.0         2.0     4.0             0.49
99996   0.0           6.0        10.0    18.0             0.49
99997   0.0           4.0         4.0     8.0             0.49
99998   0.0           6.0        10.0    16.0             0.49
99999   0.0           3.0         7.0    10.0             0.49
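One thing worth checking at this point is whether the feature columns contain missing values. The count columns print as floats (0.0, 1.0, ...), which in pandas often happens when at least one row holds NaN, for example when the domain could not be parsed with fail_silently=True and 'sld' ended up as None. Below is a minimal sketch of such a check. The small `features` frame is a hypothetical stand-in for domains_df, not the real data, and dropping rows is only one possible remedy.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for domains_df after feature extraction. The second
# row mimics a domain whose 'sld' could not be parsed (None -> NaN features).
features = pd.DataFrame({
    "dots": [0.0, np.nan, 0.0],
    "vowels_count": [1.0, np.nan, 2.0],
    "cons_count": [4.0, np.nan, 5.0],
    "length": [5.0, np.nan, 7.0],
    "rank_normalized": [1.00, 0.75, 0.49],
})

# Count missing values per column; a non-zero count in any input column
# is a plausible source of a nan loss during training.
nan_per_column = features.isna().sum()
print(nan_per_column)

# One possible remedy (an assumption, not the only option): drop incomplete
# rows before scaling and splitting.
clean = features.dropna().reset_index(drop=True)
print(len(features), "->", len(clean))
```

On the real data, the same two calls (`domains_df.isna().sum()` and `domains_df.dropna()`) would be applied right before `dataset = domains_df.values`.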
And the output from training:
Epoch 1/100
32/70000 [..............................] - ETA: 2:05 - loss: 0.6929 - accuracy: 0.0000e+00
3264/70000 [>.............................] - ETA: 2s - loss: 0.6911 - accuracy: 3.0637e-04
6624/70000 [=>............................] - ETA: 1s - loss: 0.6903 - accuracy: 3.0193e-04
10080/70000 [===>..........................] - ETA: 1s - loss: 0.6899 - accuracy: 1.9841e-04
13472/70000 [====>.........................] - ETA: 1s - loss: 0.6896 - accuracy: 1.4846e-04
16832/70000 [======>.......................] - ETA: 0s - loss: 0.6895 - accuracy: 1.1882e-04
20384/70000 [=======>......................] - ETA: 0s - loss: 0.6894 - accuracy: 9.8116e-05
23808/70000 [=========>....................] - ETA: 0s - loss: 0.6892 - accuracy: 8.4005e-05
27328/70000 [==========>...................] - ETA: 0s - loss: 0.6892 - accuracy: 1.0978e-04
30784/70000 [============>.................] - ETA: 0s - loss: 0.6891 - accuracy: 9.7453e-05
34144/70000 [=============>................] - ETA: 0s - loss: 0.6891 - accuracy: 1.1715e-04
37536/70000 [===============>..............] - ETA: 0s - loss: 0.6890 - accuracy: 1.0656e-04
40992/70000 [================>.............] - ETA: 0s - loss: nan - accuracy: 9.7580e-05
44480/70000 [==================>...........] - ETA: 0s - loss: nan - accuracy: 8.9928e-05
47968/70000 [===================>..........] - ETA: 0s - loss: nan - accuracy: 8.3389e-05
51296/70000 [====================>.........] - ETA: 0s - loss: nan - accuracy: 7.7979e-05
54688/70000 [======================>.......] - ETA: 0s - loss: nan - accuracy: 7.3142e-05
58112/70000 [=======================>......] - ETA: 0s - loss: nan - accuracy: 6.8833e-05
61440/70000 [=========================>....] - ETA: 0s - loss: nan - accuracy: 6.5104e-05
64832/70000 [==========================>...] - ETA: 0s - loss: nan - accuracy: 6.1698e-05
68288/70000 [============================>.] - ETA: 0s - loss: nan - accuracy: 5.8575e-05
70000/70000 [==============================] - 1s 18us/step - loss: nan - accuracy: 5.7143e-05 - val_loss: nan - val_accuracy: 0.0000e+00
Epoch 2/100
32/70000 [..............................] - ETA: 3s - loss: nan - accuracy: 0.0000e+00
3392/70000 [>.............................] - ETA: 1s - loss: nan - accuracy: 0.0000e+00
6848/70000 [=>............................] - ETA: 0s - loss: nan - accuracy: 0.0000e+00
10336/70000 [===>..........................] - ETA: 0s - loss: nan - accuracy: 0.0000e+00
13792/70000 [====>.........................] - ETA: 0s - loss: nan - accuracy: 0.0000e+00
17216/70000 [======>.......................] - ETA: 0s - loss: nan - accuracy: 0.0000e+00
20544/70000 [=======>......................] - ETA: 0s - loss: nan - accuracy: 0.0000e+00
23968/70000 [=========>....................] - ETA: 0s - loss: nan - accuracy: 0.0000e+00
27456/70000 [==========>...................] - ETA: 0s - loss: nan - accuracy: 0.0000e+00
30912/70000 [============>.................] - ETA: 0s - loss: nan - accuracy: 0.0000e+00
34368/70000 [=============>................] - ETA: 0s - loss: nan - accuracy: 0.0000e+00
37824/70000 [===============>..............] - ETA: 0s - loss: nan - accuracy: 0.0000e+00
41280/70000 [================>.............] - ETA: 0s - loss: nan - accuracy: 0.0000e+00
44672/70000 [==================>...........] - ETA: 0s - loss: nan - accuracy: 0.0000e+00
48064/70000 [===================>..........] - ETA: 0s - loss: nan - accuracy: 0.0000e+00
51392/70000 [=====================>........] - ETA: 0s - loss: nan - accuracy: 0.0000e+00
54848/70000 [======================>.......] - ETA: 0s - loss: nan - accuracy: 0.0000e+00
58208/70000 [=======================>......] - ETA: 0s - loss: nan - accuracy: 0.0000e+00
61376/70000 [=========================>....] - ETA: 0s - loss: nan - accuracy: 0.0000e+00
64384/70000 [==========================>...] - ETA: 0s - loss: nan - accuracy: 0.0000e+00
67456/70000 [===========================>..] - ETA: 0s - loss: nan - accuracy: 0.0000e+00
70000/70000 [==============================] - 1s 17us/step - loss: nan - accuracy: 0.0000e+00 - val_loss: nan - val_accuracy: 0.0000e+00
Epoch 3/100
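Two patterns in this log can be reproduced outside Keras. The sketch below uses plain NumPy and made-up numbers; `binary_crossentropy` and `rounded_accuracy` are hypothetical helpers that only approximate what Keras computes. It assumes that the 'accuracy' metric paired with binary_crossentropy compares the rounded prediction with the target, and that a single NaN prediction is enough to make the averaged loss nan.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    # Element-wise binary cross-entropy; predictions are clipped to avoid log(0).
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))

def rounded_accuracy(y_true, y_pred):
    # Fraction of samples where the rounded prediction equals the target exactly.
    return float(np.mean(y_true == np.round(y_pred)))

# Made-up continuous targets in the same range as rank_normalized.
y_true = np.array([1.00, 0.87, 0.62, 0.49])
y_pred = np.array([0.90, 0.80, 0.60, 0.45])

# Only the target that is exactly 1.0 can ever match a rounded prediction.
acc = rounded_accuracy(y_true, y_pred)
print("accuracy:", acc)

# With clean inputs the mean loss is finite.
mean_loss = binary_crossentropy(y_true, y_pred).mean()
print("mean loss:", mean_loss)

# A single NaN prediction (e.g. produced by a NaN input feature) poisons the mean.
y_pred_bad = y_pred.copy()
y_pred_bad[2] = np.nan
mean_loss_with_nan = binary_crossentropy(y_true, y_pred_bad).mean()
print("mean loss with one NaN:", mean_loss_with_nan)
```

If these assumptions hold, the near-zero accuracy is expected for a continuous target regardless of model quality, and the loss switching to nan partway through epoch 1 would be consistent with a batch that contains NaN inputs. For a continuous target, a regression loss such as mean squared error with a metric such as mean absolute error is the more common choice.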