Making a ROC curve using ROCR or pROC
0 votes
/ 11 October 2018
library(ROCR)
pred1 <- prediction(predictions = glm.prob2, labels = test_data$Direction)
perf1 <- performance(pred1, measure = "TP.rate", x.measure = "FP.rate")
plot(perf1)

I get the following error message:

Wrong argument types: First argument must be of type 'prediction'; second and optional third argument must be available performance measures!

How do I get a ROC curve for this?

1 Answer

0 votes
/ 11 October 2018

As the error message indicates, your measure and x.measure arguments are not valid performance measures. The documentation for the performance function lists the following options to choose from (a short usage sketch follows the list):

 ‘acc’: Accuracy. P(Yhat = Y). Estimated as: (TP+TN)/(P+N).

 ‘err’: Error rate. P(Yhat != Y). Estimated as: (FP+FN)/(P+N).

 ‘fpr’: False positive rate. P(Yhat = + | Y = -). Estimated as:
      FP/N.

 ‘fall’: Fallout. Same as ‘fpr’.

 ‘tpr’: True positive rate. P(Yhat = + | Y = +). Estimated as:
      TP/P.

 ‘rec’: Recall. Same as ‘tpr’.

 ‘sens’: Sensitivity. Same as ‘tpr’.

 ‘fnr’: False negative rate. P(Yhat = - | Y = +). Estimated as:
      FN/P.

 ‘miss’: Miss. Same as ‘fnr’.

 ‘tnr’: True negative rate. P(Yhat = - | Y = -). Estimated as:
      TN/N.

 ‘spec’: Specificity. Same as ‘tnr’.

 ‘ppv’: Positive predictive value. P(Y = + | Yhat = +). Estimated
      as: TP/(TP+FP).

 ‘prec’: Precision. Same as ‘ppv’.

 ‘npv’: Negative predictive value. P(Y = - | Yhat = -). Estimated
      as: TN/(TN+FN).

 ‘pcfall’: Prediction-conditioned fallout. P(Y = - | Yhat = +).
      Estimated as: FP/(TP+FP).

 ‘pcmiss’: Prediction-conditioned miss. P(Y = + | Yhat = -).
      Estimated as: FN/(TN+FN).

 ‘rpp’: Rate of positive predictions. P(Yhat = +). Estimated as:
      (TP+FP)/(TP+FP+TN+FN).

 ‘rnp’: Rate of negative predictions. P(Yhat = -). Estimated as:
      (TN+FN)/(TP+FP+TN+FN).

 ‘phi’: Phi correlation coefficient. (TP*TN -
      FP*FN)/(sqrt((TP+FN)*(TN+FP)*(TP+FP)*(TN+FN))). Yields a
      number between -1 and 1, with 1 indicating a perfect
      prediction, 0 indicating a random prediction. Values below 0
      indicate a worse than random prediction.
 ‘mat’: Matthews correlation coefficient. Same as ‘phi’.

 ‘mi’: Mutual information. I(Yhat, Y) := H(Y) - H(Y | Yhat), where
      H is the (conditional) entropy. Entropies are estimated
      naively (no bias correction).

 ‘chisq’: Chi square test statistic. ‘?chisq.test’ for details.
      Note that R might raise a warning if the sample size is too
      small.

 ‘odds’: Odds ratio. (TP*TN)/(FN*FP). Note that odds ratio produces
      Inf or NA values for all cutoffs corresponding to FN=0 or
      FP=0. This can substantially decrease the plotted cutoff
      region.

 ‘lift’: Lift value. P(Yhat = + | Y = +)/P(Yhat = +).

 ‘f’: Precision-recall F measure (van Rijsbergen, 1979). Weighted
      harmonic mean of precision (P) and recall (R). F = 1/
      (alpha*1/P + (1-alpha)*1/R). If alpha=1/2, the mean is
      balanced. A frequent equivalent formulation is F = (beta^2+1)
      * P * R / (R + beta^2 * P). In this formulation, the mean is
      balanced if beta=1. Currently, ROCR only accepts the alpha
      version as input (e.g. alpha=0.5). If no value for alpha is
      given, the mean will be balanced by default.

 ‘rch’: ROC convex hull. A ROC (=‘tpr’ vs ‘fpr’) curve with
      concavities (which represent suboptimal choices of cutoff)
      removed (Fawcett 2001). Since the result is already a
      parametric performance curve, it cannot be used in
      combination with other measures.

 ‘auc’: Area under the ROC curve. This is equal to the value of the
      Wilcoxon-Mann-Whitney test statistic and also the probability
      that the classifier will score a randomly drawn positive
      sample higher than a randomly drawn negative sample. Since
      the output of ‘auc’ is cutoff-independent, this measure
      cannot be combined with other measures into a parametric
      curve. The partial area under the ROC curve up to a given
      false positive rate can be calculated by passing the optional
      parameter ‘fpr.stop=0.5’ (or any other value between 0 and 1)
      to ‘performance’.

 ‘prbe’: Precision-recall break-even point. The cutoff(s) where
      precision and recall are equal. At this point, positive and
      negative predictions are made at the same rate as their
      prevalence in the data. Since the output of ‘prbe’ is just a
      cutoff-independent scalar, this measure cannot be combined
      with other measures into a parametric curve.


 ‘cal’: Calibration error. The calibration error is the absolute
      difference between predicted confidence and actual
      reliability. This error is estimated at all cutoffs by
      sliding a window across the range of possible cutoffs. The
      default window size of 100 can be adjusted by passing the
      optional parameter ‘window.size=200’ to ‘performance’. E.g.,
      if for several positive samples the output of the classifier
      is around 0.75, you might expect from a well-calibrated
      classifier that the fraction of them which is correctly
      predicted as positive is also around 0.75. In a
      well-calibrated classifier, the probabilistic confidence
      estimates are realistic. Only for use with probabilistic
      output (i.e. scores between 0 and 1).

 ‘mxe’: Mean cross-entropy. Only for use with probabilistic output.
      MXE := - 1/(P+N) * ( sum_{y_i=+} ln(yhat_i) + sum_{y_i=-}
      ln(1-yhat_i) ). Since the output of ‘mxe’ is just a
      cutoff-independent scalar, this measure cannot be combined
      with other measures into a parametric curve.

 ‘rmse’: Root-mean-squared error. Only for use with numerical class
      labels. RMSE := sqrt(1/(P+N) sum_i (y_i - yhat_i)^2). Since
      the output of ‘rmse’ is just a cutoff-independent scalar,
      this measure cannot be combined with other measures into a
      parametric curve.

 ‘sar’: Score combining performance measures of different
      characteristics, in the attempt of creating a more "robust"
      measure (cf. Caruana R., ROCAI2004): SAR = 1/3 * ( Accuracy +
      Area under the ROC curve + Root mean-squared error ).

 ‘ecost’: Expected cost. For details on cost curves, cf.
      Drummond&Holte 2000,2004. ‘ecost’ has an obligatory x axis,
      the so-called 'probability-cost function'; thus it cannot be
      combined with other measures. While using ‘ecost’ one is only
      interested in the lower envelope of a set of lines, it might
      be instructive to plot the whole set of lines in addition to
      the lower envelope. An example is given in ‘demo(ROCR)’.

 ‘cost’: Cost of a classifier when class-conditional
      misclassification costs are explicitly given.  Accepts the
      optional parameters ‘cost.fp’ and ‘cost.fn’, by which the
      costs for false positives and negatives can be adjusted,
      respectively. By default, both are set to 1.
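
For example, the cutoff-independent scalars in this list, such as "auc", are requested through the same performance() call and then read from the slots of the returned object. A minimal sketch, assuming the pred1 object from your question:

# AUC is a cutoff-independent scalar, so it is read from the returned
# performance object rather than plotted as a curve:
auc.perf <- performance(pred1, measure = "auc")
auc.perf@y.values[[1]]

# Partial AUC up to a false positive rate of 0.5, via the optional
# fpr.stop parameter described above:
pauc.perf <- performance(pred1, measure = "auc", fpr.stop = 0.5)
pauc.perf@y.values[[1]]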

So, to plot the ROC curve, you should do something like:

perf1 <- performance(pred1, measure = "tpr", x.measure = "fpr")
...
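
If you want a fully self-contained version to test against, here is a sketch with simulated data, since glm.prob2 and test_data are not shown in the question (the names scores and labels are purely illustrative):

library(ROCR)

set.seed(1)
# Stand-ins for glm.prob2 and test_data$Direction, which the question
# does not show; any numeric score plus a two-level label works:
scores <- runif(200)
labels <- ifelse(scores + rnorm(200, sd = 0.3) > 0.5, "Up", "Down")

pred <- prediction(predictions = scores, labels = labels)
perf <- performance(pred, measure = "tpr", x.measure = "fpr")
plot(perf)
abline(a = 0, b = 1, lty = 2)  # diagonal reference for a random classifier

performance(pred, measure = "auc")@y.values[[1]]  # AUC as a single number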