I was given several years of data on ozone, NO, NO2 and CO to work with. The task is to use these data to predict ozone values. Suppose I have data for 2015, 2016, 2018 and 2019. I need to predict the ozone values for 2019 using the 2015, 2016 and 2018 data that I have.
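Since the goal is to predict 2019 from the earlier years, the split ultimately has to be by year rather than random. A minimal sketch of that split, assuming the combined table has a parseable date column (the column name "Date" is an assumption; my file may name it differently):
import pandas as pd

df = pd.read_excel("Testing.xlsx", parse_dates=["Date"])  # "Date" is a hypothetical column name
train = df[df["Date"].dt.year.isin([2015, 2016, 2018])]  # fit on the years I have
test = df[df["Date"].dt.year == 2019]  # evaluate on the target year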
The data are recorded hourly and organized month by month (attached image). So this is the format the data come in.
What I did: first of all, I put the data for all years into one Excel file containing four columns: NO, NO2, CO and O3, appending the data month by month. So this is the main file that was used (attached image).
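For what it's worth, that month-by-month assembly can also be done in pandas; a minimal sketch, assuming each month sits on its own sheet of a raw workbook ("raw_data.xlsx" and the sheet layout are assumptions):
import pandas as pd

monthly = pd.read_excel("raw_data.xlsx", sheet_name=None)  # sheet_name=None loads every sheet into a dict
combined = pd.concat(monthly.values(), ignore_index=True)  # stack the months in order
combined = combined[["NO", "NO2", "CO", "O3"]]  # keep the four columns used below
combined.to_excel("Testing.xlsx", index=False)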
I used Python. First the data had to be cleaned. Let me explain briefly: NO, NO2 and CO are ozone precursors, meaning that ozone formation depends on these gases, so the data must be cleaned beforehand. The constraints were to remove any negative value and, if any of the O3, NO, NO2 or CO values is invalid, to drop the entire row including the other columns rather than count it. The data also contain some string entries that likewise had to be removed. All of that was done. Then I applied the MLPRegressor from scikit-learn. Here is the code I wrote (a more compact version of the cleaning step is sketched after the code):
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import explained_variance_score
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_absolute_error
import pandas as pd
import matplotlib.pyplot as plt
# Status flags, timestamps and instrument strings that leak into the raw export;
# they are replaced with 0 here and the zero rows are filtered out further down
bugs = ['NOx', '* 43.3', '* 312', '11/19', '11/28', '06:00', '09/30', '09/04', '14:00', '06/25', '07:00', '06/02',
        '17:00', '04/10', '04/17', '18:00', '02/26', '02/03', '01:00', '11/23', '15:00', '11/12', '24:00', '09/02',
        '16:00', '09/28', '* 16.8', '* 121', '12:00', '06/24', '13:00', '06/26', 'Span', 'NoData', 'ppb', 'Zero',
        'Samp<', 'RS232']
dataset = pd.read_excel("Testing.xlsx").replace(bugs, 0)  # read_excel already returns a DataFrame
# Drop the whole row if any of the four gases is missing
dataset.dropna(subset=["O3", "NO", "NO2", "CO"], inplace=True)
# Keep only physically plausible readings; the lower bound of 1 also removes
# the zeros introduced by replace() above and any negative measurements
dataset = dataset[
    dataset['O3'].between(1, 160)
    & dataset['NO'].between(1, 160)
    & dataset['NO2'].between(1, 160)
    & dataset['CO'].between(1, 4000)
]
dataset = dataset.reset_index(drop=True)
X = dataset[["NO", "NO2", "CO"]].astype(int)  # note: the int cast truncates fractional readings
Y = dataset[["O3"]].astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.05, random_state=27)
sc_x = StandardScaler()
X_train = sc_x.fit_transform(X_train)  # fit the scaler on the training data only
X_test = sc_x.transform(X_test)  # reuse the training scaling; fit_transform here would scale the test set differently
clf = MLPRegressor(hidden_layer_sizes=(100, 100, 100), max_iter=10000, verbose=True, random_state=8)
clf.fit(X_train, y_train.values.ravel())  # ravel() passes a 1-D target and avoids the column_or_1d warning seen in the console
y_pred = clf.predict(X_test)
print(explained_variance_score(y_test, y_pred))  # explained variance on the held-out 5%
print(mean_absolute_error(y_test, y_pred))  # MAE, in the same units as O3
# Re-index both frames so predicted and original values line up in the plot
y_test = pd.DataFrame(y_test).reset_index(drop=True)
# y_test = y_test.drop([19, 20], axis=0)
y_pred = pd.DataFrame(y_pred)  # no shift(-1): shifting would misalign predictions with their test rows
# y_pred = y_pred.drop([19, 20], axis=0)
plt.figure(figsize=(10, 5))
plt.plot(y_pred, color='r', label='PredictedO3')
plt.plot(y_test, color='g', label='OriginalO3')
plt.legend()
plt.show()
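For reference, the same cleaning rules can be written more compactly with pd.to_numeric, which turns every non-numeric string into NaN in one pass instead of listing each bad token by hand. A minimal sketch, assuming the same column names and valid ranges as above:
import pandas as pd

cols = ["NO", "NO2", "CO", "O3"]
df = pd.read_excel("Testing.xlsx")
df[cols] = df[cols].apply(pd.to_numeric, errors="coerce")  # flags like 'NoData' or 'Span' become NaN
df = df.dropna(subset=cols)  # drop the whole row if any gas value is invalid
df = df[df["O3"].between(1, 160) & df["NO"].between(1, 160)
        & df["NO2"].between(1, 160) & df["CO"].between(1, 4000)]  # same bounds as above
df = df.reset_index(drop=True)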
Console output:
y = column_or_1d(y, warn=True)
Iteration 1, loss = 537.59597297
Iteration 2, loss = 185.33662023
Iteration 3, loss = 159.32122111
Iteration 4, loss = 156.71612690
Iteration 5, loss = 155.05307865
Iteration 6, loss = 154.59351630
Iteration 7, loss = 154.16687592
Iteration 8, loss = 153.69258698
Iteration 9, loss = 153.36140320
Iteration 10, loss = 152.94593665
... (iterations 11-169 omitted; the loss decreases slowly from ~152.8 to ~130.3) ...
Iteration 170, loss = 129.95417212
Iteration 171, loss = 131.06510048
Iteration 172, loss = 131.21377407
Iteration 173, loss = 130.17368709
Training loss did not improve more than tol=0.000100 for 10 consecutive epochs. Stopping.
0.2442499851919634
12.796789671568312
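The early stop above is MLPRegressor's built-in convergence check (tol together with n_iter_no_change), not max_iter being reached. If I wanted to train past the plateau, those two parameters can be loosened; a sketch with the other settings unchanged:
clf = MLPRegressor(hidden_layer_sizes=(100, 100, 100), max_iter=10000,
                   tol=1e-5,  # require a smaller improvement before declaring convergence
                   n_iter_no_change=25,  # tolerate more flat epochs before stopping
                   verbose=True, random_state=8)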
Here is the final plot (attached image). If I am doing something wrong, please correct me. Regards.