Keras NN - guessing the rank of a domain name (loss = nan)
0 votes
/ March 16, 2020

I'm new to neural networks, and I'm trying to build a model that guesses the rank/value of domain names. I have a list of domains with their ranks (from 10 down to 4.9).

First, I add some features, such as the number of vowels, etc.

But then, when the model trains, it shows loss: nan and accuracy: 0.000 after the first epoch. I'm not sure where the problem is and would appreciate any advice. My guess is that one of the problems is that my output is not binary.

from keras.layers import Dense
from keras.models import Sequential
import pandas as pd
import sklearn.preprocessing
from sklearn.model_selection import train_test_split
import tld

TOP_DOMAINS_PATH = 'domainrank.csv'
domains_df = pd.read_csv(TOP_DOMAINS_PATH, nrows=100000)

# add features
domains_df['tld'] = domains_df['Domain'].apply(lambda x: tld.get_tld(x, fix_protocol=True, fail_silently=True))
domains_df['sld'] = domains_df['Domain'].apply(lambda x: getattr(tld.get_tld(x, fix_protocol=True, as_object=True, fail_silently=True), 'domain', None))
domains_df['dots'] = domains_df['sld'].str.count(r'\.')
domains_df['vowels_count'] = domains_df['sld'].str.count('[aeiouy]')
domains_df['cons_count'] = domains_df['sld'].str.lower().str.count(r'[a-z]') - domains_df['vowels_count']
domains_df['length'] = domains_df['sld'].str.len()
domains_df['rank_normalized'] = domains_df['Open Page Rank'].apply(lambda x: x/10)

# remove not used columns
domains_df.pop('Open Page Rank')
domains_df.pop('Domain')
domains_df.pop('tld')
domains_df.pop('sld')

dataset = domains_df.values
X = dataset[:, 0:len(domains_df.columns) - 1]
Y = dataset[:, len(domains_df.columns) - 1]

min_max_scaler = sklearn.preprocessing.MinMaxScaler()
X_scale = min_max_scaler.fit_transform(X)
X_train, X_val_and_test, Y_train, Y_val_and_test = train_test_split(X_scale, Y, test_size=0.3)
X_val, X_test, Y_val, Y_test = train_test_split(X_val_and_test, Y_val_and_test, test_size=0.5)

model = Sequential([
    Dense(32, activation='relu', input_shape=(5,)),
    Dense(32, activation='relu'),
    Dense(1, activation='sigmoid'),
])

model.compile(optimizer='sgd',
              loss='binary_crossentropy',
              metrics=['accuracy'])

hist = model.fit(X_train, Y_train,
          batch_size=32, epochs=100,
          validation_data=(X_val, Y_val))
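In case it matters: since tld.get_tld is called with fail_silently=True, sld can end up as None, and then the derived features become NaN, which I suspect could also produce loss: nan. A minimal pandas-only sketch of that effect (fabricated rows, not my real data):

```python
import pandas as pd

# Fabricated stand-in rows: sld is None where tld.get_tld failed silently
df = pd.DataFrame({'sld': ['google', None, 'wikipedia']})
df['vowels_count'] = df['sld'].str.count('[aeiouy]')
df['length'] = df['sld'].str.len()

# The None row carries NaN in every derived feature
print(df['length'].isna().sum())  # 1
print(df.isna().any().any())      # True
```

If this is the cause, a domains_df.dropna() before building the arrays would rule it out.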

After these changes domains_df looks like this:

       dots  vowels_count  cons_count  length  rank_normalized
0       0.0           1.0         4.0     5.0             1.00
1       0.0           4.0         4.0     8.0             1.00
2       0.0           2.0         5.0     7.0             1.00
3       0.0           5.0         2.0     7.0             1.00
4       0.0           3.0         3.0     6.0             1.00
     ...           ...         ...     ...              ...
99995   0.0           2.0         2.0     4.0             0.49
99996   0.0           6.0        10.0    18.0             0.49
99997   0.0           4.0         4.0     8.0             0.49
99998   0.0           6.0        10.0    16.0             0.49
99999   0.0           3.0         7.0    10.0             0.49

And the output from training:

Epoch 1/100
   32/70000 [..............................] - ETA: 2:05 - loss: 0.6929 - accuracy: 0.0000e+00
 3264/70000 [>.............................] - ETA: 2s - loss: 0.6911 - accuracy: 3.0637e-04  
 6624/70000 [=>............................] - ETA: 1s - loss: 0.6903 - accuracy: 3.0193e-04
10080/70000 [===>..........................] - ETA: 1s - loss: 0.6899 - accuracy: 1.9841e-04
13472/70000 [====>.........................] - ETA: 1s - loss: 0.6896 - accuracy: 1.4846e-04
16832/70000 [======>.......................] - ETA: 0s - loss: 0.6895 - accuracy: 1.1882e-04
20384/70000 [=======>......................] - ETA: 0s - loss: 0.6894 - accuracy: 9.8116e-05
23808/70000 [=========>....................] - ETA: 0s - loss: 0.6892 - accuracy: 8.4005e-05
27328/70000 [==========>...................] - ETA: 0s - loss: 0.6892 - accuracy: 1.0978e-04
30784/70000 [============>.................] - ETA: 0s - loss: 0.6891 - accuracy: 9.7453e-05
34144/70000 [=============>................] - ETA: 0s - loss: 0.6891 - accuracy: 1.1715e-04
37536/70000 [===============>..............] - ETA: 0s - loss: 0.6890 - accuracy: 1.0656e-04
40992/70000 [================>.............] - ETA: 0s - loss: nan - accuracy: 9.7580e-05   
44480/70000 [==================>...........] - ETA: 0s - loss: nan - accuracy: 8.9928e-05
47968/70000 [===================>..........] - ETA: 0s - loss: nan - accuracy: 8.3389e-05
51296/70000 [====================>.........] - ETA: 0s - loss: nan - accuracy: 7.7979e-05
54688/70000 [======================>.......] - ETA: 0s - loss: nan - accuracy: 7.3142e-05
58112/70000 [=======================>......] - ETA: 0s - loss: nan - accuracy: 6.8833e-05
61440/70000 [=========================>....] - ETA: 0s - loss: nan - accuracy: 6.5104e-05
64832/70000 [==========================>...] - ETA: 0s - loss: nan - accuracy: 6.1698e-05
68288/70000 [============================>.] - ETA: 0s - loss: nan - accuracy: 5.8575e-05
70000/70000 [==============================] - 1s 18us/step - loss: nan - accuracy: 5.7143e-05 - val_loss: nan - val_accuracy: 0.0000e+00
Epoch 2/100
   32/70000 [..............................] - ETA: 3s - loss: nan - accuracy: 0.0000e+00
 3392/70000 [>.............................] - ETA: 1s - loss: nan - accuracy: 0.0000e+00
 6848/70000 [=>............................] - ETA: 0s - loss: nan - accuracy: 0.0000e+00
10336/70000 [===>..........................] - ETA: 0s - loss: nan - accuracy: 0.0000e+00
13792/70000 [====>.........................] - ETA: 0s - loss: nan - accuracy: 0.0000e+00
17216/70000 [======>.......................] - ETA: 0s - loss: nan - accuracy: 0.0000e+00
20544/70000 [=======>......................] - ETA: 0s - loss: nan - accuracy: 0.0000e+00
23968/70000 [=========>....................] - ETA: 0s - loss: nan - accuracy: 0.0000e+00
27456/70000 [==========>...................] - ETA: 0s - loss: nan - accuracy: 0.0000e+00
30912/70000 [============>.................] - ETA: 0s - loss: nan - accuracy: 0.0000e+00
34368/70000 [=============>................] - ETA: 0s - loss: nan - accuracy: 0.0000e+00
37824/70000 [===============>..............] - ETA: 0s - loss: nan - accuracy: 0.0000e+00
41280/70000 [================>.............] - ETA: 0s - loss: nan - accuracy: 0.0000e+00
44672/70000 [==================>...........] - ETA: 0s - loss: nan - accuracy: 0.0000e+00
48064/70000 [===================>..........] - ETA: 0s - loss: nan - accuracy: 0.0000e+00
51392/70000 [=====================>........] - ETA: 0s - loss: nan - accuracy: 0.0000e+00
54848/70000 [======================>.......] - ETA: 0s - loss: nan - accuracy: 0.0000e+00
58208/70000 [=======================>......] - ETA: 0s - loss: nan - accuracy: 0.0000e+00
61376/70000 [=========================>....] - ETA: 0s - loss: nan - accuracy: 0.0000e+00
64384/70000 [==========================>...] - ETA: 0s - loss: nan - accuracy: 0.0000e+00
67456/70000 [===========================>..] - ETA: 0s - loss: nan - accuracy: 0.0000e+00
70000/70000 [==============================] - 1s 17us/step - loss: nan - accuracy: 0.0000e+00 - val_loss: nan - val_accuracy: 0.0000e+00
Epoch 3/100
...
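If the issue really is that my output isn't binary, I assume the model would need a regression setup instead of sigmoid + binary_crossentropy, something like this untested sketch (input_shape=(4,) is my guess from the four feature columns printed above; it should match X.shape[1]):

```python
from keras.layers import Dense
from keras.models import Sequential

# Untested sketch: treat rank_normalized as a regression target, not a class.
model = Sequential([
    Dense(32, activation='relu', input_shape=(4,)),
    Dense(32, activation='relu'),
    Dense(1, activation='linear'),  # unconstrained output for a 0..1 score
])
model.compile(optimizer='adam', loss='mse', metrics=['mae'])
```

With mse as the loss, accuracy would also stop being the right metric to watch; mean absolute error seems more meaningful here.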