I am reading Chapter 11 of "Applied Predictive Modeling" by Max Kuhn and trying to run the code on the GermanCredit data to build an ROC curve. I have copied the code at the end of this post.
The line below refers to a column named Class, but my data has no column with that name:
inTrain <- createDataPartition(GermanCredit$Class, p = .8)[[1]]
I renamed column 20 from "credit_risk" to "Class", and after that the line above works fine.
However, the following line throws an error:
creditResults$prob <- predict(logisticReg, GermanCreditTest, type = "prob")[, "Bad"]
The error:
Error in `[.data.frame`(predict(logisticReg, GermanCreditTest, type = "prob"), :
undefined columns selected
In addition: Warning message:
In predict.lm(object, newdata, se.fit, scale = 1, type = ifelse(type == :
prediction from a rank-deficient fit may be misleading
I don't know how to fix this error. I have installed all the required packages. Could I be doing something wrong? It seems strange that code taken from the book produces several errors. Thanks!
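One possible cause, which I have not confirmed: as far as I understand, predict() on a caret model with type = "prob" returns one column per class level, named after the levels of the outcome factor. Renaming a column to "Class" does not change its levels, so if they are spelled differently from "Bad" and "Good", selecting [, "Bad"] would fail in exactly this way. Below is a minimal base-R sketch of that mechanism. The toy data frame, its values, and the lowercase level names are assumptions made up for illustration, not taken from my real data.

```r
# Hypothetical toy data: the column name and lowercase level spellings are
# assumptions, chosen to mimic a data set whose outcome is called credit_risk.
toy <- data.frame(amount = c(1000, 2500, 400),
                  credit_risk = factor(c("good", "bad", "good")))

# Rename by name rather than by position; the factor levels stay unchanged.
names(toy)[names(toy) == "credit_risk"] <- "Class"
class_levels <- levels(toy$Class)
print(class_levels)

# Stand-in for a class-probability table with one column per level.
probs <- data.frame(bad = c(0.2, 0.7, 0.1), good = c(0.8, 0.3, 0.9))

# Asking for a column that is not present triggers the same kind of error;
# tryCatch() captures the message instead of stopping the script.
msg <- tryCatch(probs[, "Bad"], error = function(e) conditionMessage(e))
print(msg)

# A safer habit: check that the column exists before selecting it.
has_bad <- "Bad" %in% names(probs)
print(has_bad)
```

On the real objects, the equivalent checks would be levels(GermanCreditTest$Class) and the column names of predict(logisticReg, GermanCreditTest, type = "prob").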
### Recreate the model used in the over-fitting chapter
library(caret)
data(GermanCredit)
## First, remove near-zero variance predictors then get rid of a few predictors
## that duplicate values. For example, there are three possible values for the
## housing variable: "Rent", "Own" and "ForFree". So that we don't have linear
## dependencies, we get rid of one of the levels (e.g. "ForFree")
GermanCredit <- GermanCredit[, -nearZeroVar(GermanCredit)]
GermanCredit$CheckingAccountStatus.lt.0 <- NULL
GermanCredit$SavingsAccountBonds.lt.100 <- NULL
GermanCredit$EmploymentDuration.lt.1 <- NULL
GermanCredit$EmploymentDuration.Unemployed <- NULL
GermanCredit$Personal.Male.Married.Widowed <- NULL
GermanCredit$Property.Unknown <- NULL
GermanCredit$Housing.ForFree <- NULL
names(GermanCredit)[20] <- "Class"
## Split the data into training (80%) and test sets (20%)
set.seed(100)
inTrain <- createDataPartition(GermanCredit$Class, p = .8)[[1]]
GermanCreditTrain <- GermanCredit[ inTrain, ]
GermanCreditTest <- GermanCredit[-inTrain, ]
set.seed(1056)
logisticReg <- train(Class ~ .,
                     data = GermanCreditTrain,
                     method = "glm",
                     trControl = trainControl(method = "repeatedcv",
                                              repeats = 5))
logisticReg
### Predict the test set
creditResults <- data.frame(obs = GermanCreditTest$Class)
creditResults$prob <- predict(logisticReg, GermanCreditTest, type = "prob")[, "Bad"]
creditResults$pred <- predict(logisticReg, GermanCreditTest)
creditResults$Label <- ifelse(creditResults$obs == "Bad",
                              "True Outcome: Bad Credit",
                              "True Outcome: Good Credit")
### Plot the probability of bad credit
histogram(~prob|Label,
          data = creditResults,
          layout = c(2, 1),
          nint = 20,
          xlab = "Probability of Bad Credit",
          type = "count")
### Calculate and plot the calibration curve
creditCalib <- calibration(obs ~ prob, data = creditResults)
xyplot(creditCalib)
### Create the confusion matrix from the test set.
confusionMatrix(data = creditResults$pred,
                reference = creditResults$obs)
### ROC curves:
### Like glm(), roc() treats the last level of the factor as the event
### of interest so we use relevel() to change the observed class data
library(pROC)
creditROC <- roc(relevel(creditResults$obs, "Good"), creditResults$prob)
coords(creditROC, "all")[,1:3]
auc(creditROC)
ci.auc(creditROC)
### Note the x-axis is reversed
plot(creditROC)
### Old-school:
plot(creditROC, legacy.axes = TRUE)
### Lift charts
creditLift <- lift(obs ~ prob, data = creditResults)
xyplot(creditLift)
################################################################################
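Another thing that may be worth checking: if some other attached package also ships a data set named GermanCredit, data(GermanCredit) could be loading that copy instead of caret's. The sketch below assumes caret is installed and uses the package argument of data() to restrict the lookup to caret. The guard with requireNamespace() is only there so that the snippet does nothing when caret is missing.

```r
# Load GermanCredit explicitly from caret (assumes caret is installed).
if (requireNamespace("caret", quietly = TRUE)) {
  data(GermanCredit, package = "caret")

  # Check whether the outcome column exists and how its levels are spelled.
  print("Class" %in% names(GermanCredit))
  print(levels(GermanCredit$Class))
}
```

If this copy already has a Class column, the rename of column 20 should not be needed, and the rest of the listing can be run against it unchanged.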