How can R-squared be negative when the correlation between prediction and truth is positive? - PullRequest
July 12, 2020

I am trying to understand how the R-squared metric (and also explained variance) can be negative, which indicates no predictive power, while at the same time the correlation coefficient between prediction and truth (as well as the slope of a linear regression of truth on prediction) is positive.

1 Answer

August 8, 2020

R squared can be negative, but only in rare cases: when the model's predictions fit the data worse than simply predicting the mean.

R squared = 1 - (SSR / SST)

Here SST stands for the total sum of squares ("Sum of Squared Total"): it measures how far the actual values of the target variable lie from their mean. The mean here plays the role of the simplest possible regression line, a horizontal line at the average value.

SST = Sum (Square (Each data point - Mean of the target variable))
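A minimal Python sketch of the SST formula above. The height values are made-up numbers chosen only so the arithmetic is easy to follow.

```python
def total_sum_of_squares(y):
    """SST: sum of squared differences between each actual value and the mean of y."""
    mean_y = sum(y) / len(y)
    return sum((yi - mean_y) ** 2 for yi in y)


heights = [150.0, 160.0, 165.0, 170.0, 180.0]  # hypothetical heights in cm, mean = 165
print(total_sum_of_squares(heights))  # 500.0
```

SST depends only on the data, not on any model, so it is the fixed yardstick that every model is compared against.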

For example,

Suppose we want to build a regression model that predicts a student's height using weight as the independent variable. A prediction that takes almost no effort is to compute the average height of all current students and use it as the prediction for everyone.
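The naive "predict the mean" model can be sketched like this, assuming hypothetical (weight, height) pairs; `predict_mean` is an illustrative helper name, not a library function. Because every prediction equals the mean height, its sum of squared residuals comes out exactly equal to SST, which is why this baseline scores an R squared of exactly 0 on the data it was computed from.

```python
weights = [45.0, 52.0, 61.0, 66.0, 74.0]       # hypothetical weights in kg (ignored by this model)
heights = [150.0, 160.0, 165.0, 170.0, 180.0]  # hypothetical heights in cm


def predict_mean(train_heights, new_weights):
    """Naive model: ignore the weight and always predict the average height."""
    mean_height = sum(train_heights) / len(train_heights)
    return [mean_height for _ in new_weights]


predictions = predict_mean(heights, weights)
ssr = sum((y - p) ** 2 for y, p in zip(heights, predictions))
print(predictions)  # [165.0, 165.0, 165.0, 165.0, 165.0]
print(ssr)          # 500.0, identical to SST for this data
```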

[Figure: the height data with the mean height drawn as a red horizontal line]

In the above diagram, the red line is the regression line, which is nothing but the mean of all heights. This mean is calculated without much effort and can be considered one of the worst prediction methods, with poor accuracy. The diagram itself shows that the predictions are nowhere near the original data points. Now let's come to SSR.

SSR stands for Sum of Squared Residuals. The residuals come from the model we build with some mathematical approach (linear regression, Bayesian regression, polynomial regression or any other approach). If we use a more sophisticated approach instead of a naive one like the mean, our accuracy will usually increase.

SSR = Sum (Square (Each data point - Corresponding predicted value on the regression line))
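A sketch of the SSR formula above for a straight-line model. The data are hypothetical, and the line height = 1.0 * weight + 105.0 is a hand-picked example, not a fitted regression.

```python
def sum_squared_residuals(y_true, y_pred):
    """SSR: sum of squared differences between actual values and model predictions."""
    return sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))


weights = [45.0, 52.0, 61.0, 66.0, 74.0]       # hypothetical weights in kg
heights = [150.0, 160.0, 165.0, 170.0, 180.0]  # hypothetical heights in cm

# Hypothetical line: height = 1.0 * weight + 105.0 (hand-picked, not fitted)
line_predictions = [1.0 * w + 105.0 for w in weights]
print(line_predictions)                                  # [150.0, 157.0, 166.0, 171.0, 179.0]
print(sum_squared_residuals(heights, line_predictions))  # 12.0
```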

[Figure: the same data with the prediction of a more sophisticated model drawn as a blue line]

In the above diagram, suppose the blue line represents a more sophisticated model built with much more mathematical analysis. We can see that it is clearly more accurate than the red line.

Now back to the formula:

R squared = 1 - (SSR / SST)

Here,

  • SST will be a large number relative to SSR, because the mean (red line) is a very poor model.
  • SSR will be a small number, because it comes from the best model we developed after much mathematical analysis (blue line).
  • So SSR/SST will be a small number (it shrinks whenever SSR decreases).
  • So 1 - (SSR/SST) will be large, close to 1.
  • So we can infer that the higher R squared is, the better the model fits the data.
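Putting the two sums together, a minimal sketch of R squared = 1 - (SSR / SST) that compares the mean baseline (red line) with a better model (standing in for the blue line). The data and the line height = weight + 105 are hypothetical and hand-picked, not fitted.

```python
def r_squared(y_true, y_pred):
    """R squared = 1 - SSR / SST."""
    mean_y = sum(y_true) / len(y_true)
    sst = sum((yt - mean_y) ** 2 for yt in y_true)
    ssr = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    return 1.0 - ssr / sst


weights = [45.0, 52.0, 61.0, 66.0, 74.0]       # hypothetical weights in kg
heights = [150.0, 160.0, 165.0, 170.0, 180.0]  # hypothetical heights in cm

mean_model = [sum(heights) / len(heights)] * len(heights)  # red line: always the mean
line_model = [w + 105.0 for w in weights]                  # stand-in for the blue line

print(r_squared(heights, mean_model))  # 0.0
print(r_squared(heights, line_model))  # approximately 0.976 (SSR = 12, SST = 500)
```

The smaller SSR gets relative to SST, the closer the result moves to 1, matching the reasoning in the bullets above.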

This is the generic case, but it cannot be applied as-is in many cases where several independent variables are present. In the example we had only one independent variable and one target variable, but in a real problem we may have hundreds of independent variables for a single dependent variable. The actual problem is that, out of those hundreds of independent variables:

  • Some variables will have very high correlation with target variable.
  • Some variables will have very small correlation with target variable.
  • And some will have no correlation with the target variable at all.

So R squared is built on the assumption that the mean line of the target (a horizontal line, perpendicular to the y axis) is the worst fit a model can have, the riskiest case. SST is the sum of squared differences between this mean line and the original data points. Similarly, SSR is the sum of squared differences between the points predicted by the model (the regression plane) and the original data points.
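The same two sums work unchanged when the prediction comes from a plane over several independent variables. A minimal sketch with two hypothetical features; the data and the plane coefficients are invented for illustration and are not fitted.

```python
def r_squared(y_true, y_pred):
    """R squared = 1 - SSR / SST, for predictions from any model."""
    mean_y = sum(y_true) / len(y_true)
    sst = sum((yt - mean_y) ** 2 for yt in y_true)               # distance from the mean line
    ssr = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))  # distance from the plane
    return 1.0 - ssr / sst


# Hypothetical data: two independent variables per row and one target value each.
X = [(1.0, 2.0), (2.0, 0.0), (3.0, 2.0), (4.0, 0.0), (5.0, 2.0)]
y = [4.0, 6.0, 8.0, 10.0, 12.0]

# Hypothetical plane y_hat = 1 + 2*x1 + 0.5*x2 (hand-picked, not fitted)
intercept, coefs = 1.0, (2.0, 0.5)
plane_predictions = [intercept + sum(c * xi for c, xi in zip(coefs, row)) for row in X]

print(plane_predictions)                # [4.0, 5.0, 8.0, 9.0, 12.0]
print(r_squared(y, plane_predictions))  # approximately 0.95 (SSR = 2, SST = 40)
```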

SSR/SST measures how large SSR is relative to SST. If your model builds a plane that is even somewhat better than this worst case, then in the vast majority of cases SSR < SST, so SSR/SST is below 1 and R squared is positive.

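But what if SSR > SST? That means your regression plane fits worse than the mean line (SST). In that case R squared is negative. This happens only rarely: for an ordinary least-squares fit with an intercept, evaluated on the same data it was fitted on, it cannot happen at all. It shows up when a model is evaluated on new data, fitted without an intercept, or otherwise systematically off.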

I originally wrote this answer on Quora:

  1. https://qr.ae/pNsLU8
  2. https://qr.ae/pNsLUr