I can't seem to find any Python library that does multiple regression. The only thing I find is simple regression. I need to regress my dependent variable (y) against several independent variables (x1, x2, x3, etc.).
I can't find the error in my code:
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
#data['PROFIT','HI','LO'] = data.target
#ycenter=pd.DataFrame(np.c_[data['PROFIT'],data['HI'],data['LO']], columns=['PROFIT','HI','LO'])
#Xcenter=pd.DataFrame(np.c_[data['ROLL'],data['DICE'],data['FREE'],data['STAKE'],data['MULT']], columns=['ROLL','DICE','FREE','STAKE','MULT'])
# Select several columns by passing a list of names (double brackets)
Xcenter = data[['ROLL', 'DICE', 'FREE', 'STAKE', 'MULT']]
ycenter = data[['PROFIT', 'HI', 'LO']]
# split into 70:30 ratio
X_traincenter, X_testcenter, y_traincenter, y_testcenter = train_test_split(Xcenter, ycenter, test_size = 0.3, random_state = 0)
# Print the shapes of the training and test sets
print("Number transactions X_traincenter dataset: ", X_traincenter.shape)
print("Number transactions y_traincenter dataset: ", y_traincenter.shape)
print("Number transactions X_testcenter dataset: ", X_testcenter.shape)
print("Number transactions y_testcenter dataset: ", y_testcenter.shape)
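# Count rows by the sign of the PROFIT column (the built-in sum() fails on a multi-column DataFrame)
print("Before OverSampling, counts of positive profitcenter: {}".format((y_traincenter['PROFIT'] > 0).sum()))
print("Before OverSampling, counts of negative profitcenter: {} \n".format((y_traincenter['PROFIT'] < 0).sum()))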
#Initialize the linear model center
regcenter = LinearRegression()
#Train the model with our training datacenter
regcenter.fit(X_traincenter,y_traincenter)
#Print the coefficients/weights for each feature/column of our model
print(regcenter.coef_) #f(x,a) = mx + da + b =y
#Print the predictions on our test datacenter
y_predcenter = regcenter.predict(X_testcenter)
print(y_predcenter)
#Print the actual values center
print(y_testcenter)
#Check the model performance/accuracy using Mean Squared Error (MSE)
print(np.mean((y_predcenter - y_testcenter.to_numpy())**2))
#Check the model performance/accuracy using Mean Squared Error (MSE) and sklearn.metrics
print(mean_squared_error(y_testcenter,y_predcenter))
#Check the predictions against the actual values using the RMSE and R^2 metrics
test_setcenter_rmse = (np.sqrt(mean_squared_error(y_testcenter, y_predcenter)))
print('test_setcenter_rmse = ',test_setcenter_rmse)
test_setcenter_r2 = r2_score(y_testcenter, y_predcenter)
print('test_setcenter_r2 = ',test_setcenter_r2)
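For comparison, here is a minimal, self-contained sketch of the same workflow on made-up data, assuming scikit-learn's `LinearRegression` is fit on several features and several targets at once. The synthetic frames `X` and `Y` and the matrix `true_coef` are illustrative stand-ins for `data`, not my real values.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

features = ['ROLL', 'DICE', 'FREE', 'STAKE', 'MULT']
targets = ['PROFIT', 'HI', 'LO']

# Made-up coefficients (5 features x 3 targets), for illustration only.
true_coef = np.array([
    [1.0, 0.5, -0.5],
    [2.0, -1.0, 0.0],
    [0.0, 1.5, 1.0],
    [-1.0, 0.0, 2.0],
    [0.5, 1.0, -1.5],
])

# Synthetic stand-in for `data`: random features, linear targets plus small noise.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, len(features))), columns=features)
noise = 0.1 * rng.normal(size=(200, len(targets)))
Y = pd.DataFrame(X.to_numpy() @ true_coef + noise, columns=targets)

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.3, random_state=0
)

# One model, several targets: coef_ has one row per target.
model = LinearRegression()
model.fit(X_train, Y_train)
Y_pred = model.predict(X_test)

# multioutput='raw_values' reports one score per target column.
mse_per_target = mean_squared_error(Y_test, Y_pred, multioutput='raw_values')
r2_per_target = r2_score(Y_test, Y_pred, multioutput='raw_values')

print(model.coef_.shape)   # (n_targets, n_features)
print(dict(zip(targets, mse_per_target)))
print(dict(zip(targets, r2_per_target)))
```

With `multioutput='raw_values'` the metrics come back per target, which seems easier to read than the single averaged number when PROFIT, HI and LO are on different scales.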
How would I regress these in Python to get the linear regression formula:
Y(x3,x5) = a1x1 + a2x2 + a3x4 + a4x6 + a5x7 + a6x8 + a7x9 + c
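To make clear what I mean by "the formula", here is a rough sketch using plain NumPy least squares, assuming three made-up predictors and a noise-free target. The fitted `coefs` play the role of a1, a2, ... and `intercept` plays the role of c; all the numbers are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

# Columns stand in for x1, x2, x3 (made-up data).
X = rng.normal(size=(n, 3))
# Noise-free target built from known, invented coefficients and intercept.
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * X[:, 2] + 4.0

# Append a column of ones so the last fitted parameter is the intercept c.
A = np.column_stack([X, np.ones(n)])
params, *_ = np.linalg.lstsq(A, y, rcond=None)
coefs, intercept = params[:-1], params[-1]

# Assemble the fitted formula as text.
terms = " ".join(f"{a:+.3f}*x{i}" for i, a in enumerate(coefs, start=1))
formula = f"y = {terms} {intercept:+.3f}"
print(formula)
```

With scikit-learn, the same numbers should be available as `coef_` and `intercept_` on the fitted model.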