Using a simple linear regression model from scikit-learn:
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Observed values by year
data = {
    'year': [2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007,
             2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015],
    'y_true': [0, -0.174, -0.131, 0.127, 0.566, 0.723, 0.675, 1.171,
               2.338, 2.625, 3.746, 3.612, 4.729, 8.156, 16.330, 27.584],
}
df = pd.DataFrame(data)

# scikit-learn expects a 2-D feature array of shape (n_samples, n_features)
x = np.array(df['year']).reshape(-1, 1)
y_true = df['y_true']

# Fit the line, then predict on the same years used for fitting
linear_reg = LinearRegression().fit(x, y_true)
y_pred = linear_reg.predict(x)

df['y_pred'] = y_pred
df['difference'] = y_true - y_pred  # residual: observed minus predicted
print(df)
Output:
    year  y_true     y_pred  difference
0   2000   0.000  -4.366596    4.366596
1   2001  -0.174  -3.183741    3.009741
2   2002  -0.131  -2.000887    1.869887
3   2003   0.127  -0.818032    0.945032
4   2004   0.566   0.364822    0.201178
5   2005   0.723   1.547676   -0.824676
6   2006   0.675   2.730531   -2.055531
7   2007   1.171   3.913385   -2.742385
8   2008   2.338   5.096240   -2.758240
9   2009   2.625   6.279094   -3.654094
10  2010   3.746   7.461949   -3.715949
11  2011   3.612   8.644803   -5.032803
12  2012   4.729   9.827657   -5.098657
13  2013   8.156  11.010512   -2.854512
14  2014  16.330  12.193366    4.136634
15  2015  27.584  13.376221   14.207779
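
The difference column is positive at both ends of the range and negative in the middle, which suggests that a straight line does not describe this series well. One way to check this is to look at the fitted slope, the intercept and the R² score. The following is only a minimal sketch that repeats the same fit on the same data; the variable names slope, intercept, r2 and residuals are illustrative and do not come from the original snippet.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same data as in the snippet above
years = np.arange(2000, 2016).reshape(-1, 1)
y_true = np.array([0, -0.174, -0.131, 0.127, 0.566, 0.723, 0.675, 1.171,
                   2.338, 2.625, 3.746, 3.612, 4.729, 8.156, 16.330, 27.584])

model = LinearRegression().fit(years, y_true)

slope = model.coef_[0]            # change in y per year
intercept = model.intercept_      # value of the line at year 0
r2 = model.score(years, y_true)   # coefficient of determination on the training data
residuals = y_true - model.predict(years)

print(f"slope={slope:.6f}, intercept={intercept:.6f}, R^2={r2:.4f}")
print("residual signs:", np.sign(residuals).astype(int))
```

An R² well below 1 together with long runs of same-sign residuals would be consistent with a curved trend that the linear model misses.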