Problem: confusion matrix (caret package, R): "The data must contain some levels that overlap the reference" error
Goal: create a confusion matrix and read "Accuracy", "Sensitivity" and "Specificity" from the resulting confusionMatrix object.
I have a working contingency table of the predictions:
> loans_prediction_table
   model_prediction
     Bad Good
  0  120  710
  1   81 2976
>
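Since the 2x2 table already exists, the three metrics can be computed directly from its cells as a cross-check for whatever caret later reports. This is a minimal base-R sketch using the counts from the table above; it ASSUMES that rows are the actual statusRank, columns are the predicted class, and that statusRank 1 corresponds to "Good" (the positive class) and 0 to "Bad".

```r
# Counts copied from loans_prediction_table.
# Rows = actual statusRank (0/1), columns = predicted class (Bad/Good).
# ASSUMPTION: statusRank 1 means "Good" and 0 means "Bad".
tab <- matrix(c(120, 81, 710, 2976), nrow = 2,
              dimnames = list(actual = c("0", "1"),
                              predicted = c("Bad", "Good")))

tp <- tab["1", "Good"]  # actual Good, predicted Good
fn <- tab["1", "Bad"]   # actual Good, predicted Bad
fp <- tab["0", "Good"]  # actual Bad,  predicted Good
tn <- tab["0", "Bad"]   # actual Bad,  predicted Bad

accuracy    <- (tp + tn) / sum(tab)
sensitivity <- tp / (tp + fn)  # true positive rate for "Good"
specificity <- tn / (tn + fp)  # true negative rate for "Bad"

round(c(Accuracy = accuracy, Sensitivity = sensitivity,
        Specificity = specificity), 4)
```

Under that assumption this gives about 0.797 accuracy, 0.974 sensitivity and 0.145 specificity. If 0 actually means "Good", swap the row labels before reading the result.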
Calling confusionMatrix() fails with:
Error in confusionMatrix.default(df_loans_train_data$statusRank,
loans_predict.predicted, :
The data must contain some levels that overlap the reference.
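The str() output further down shows why this error appears: statusRank has levels "0" and "1", while the predictions have levels "Bad" and "Good", so the two factors share no level at all. A minimal base-R sketch with hypothetical toy vectors, ASSUMING 0 = "Bad" and 1 = "Good", showing the mismatch and one way to recode the reference so both factors use identical levels:

```r
# Toy stand-ins for the two vectors in the post (hypothetical values).
reference <- factor(c("1", "1", "0", "0", "1"), levels = c("0", "1"))
predicted <- factor(c("Good", "Bad", "Bad", "Good", "Good"),
                    levels = c("Bad", "Good"))

# No shared levels: this is the situation caret complains about.
shared <- intersect(levels(reference), levels(predicted))
print(shared)  # character(0)

# ASSUMPTION: 0 = "Bad", 1 = "Good". Relabel the reference so both
# factors carry the same labels in the same level order.
reference_lbl <- factor(reference, levels = c("0", "1"),
                        labels = c("Bad", "Good"))

identical(levels(reference_lbl), levels(predicted))
```

Recoding the reference (rather than the predictions) keeps the human-readable labels, which also makes positive = "Good" meaningful.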
Wrapping both arguments in as.factor() gives the same error:
model_prediction_cm <-
confusionMatrix(as.factor(df_loans_train_data$statusRank),
as.factor(loans_predict.predicted), positive = "Good")
Another variant, recoding the predictions as 0/1 and calling confusionMatrix(as.factor(predicted), as.factor(reference)), fails with an "all arguments must have the same length" error:
loans_predict.predicted <- factor(ifelse(loans_predict < 0.5, 0, 1))
model_prediction_cm <-
confusionMatrix(as.factor(loans_predict.predicted),
as.factor(df_loans_train_data$statusRank))
## result error:
> model_prediction_cm <-
confusionMatrix(as.factor(loans_predict.predicted),
as.factor(df_loans_train_data$statusRank))
Error in table(data, reference, dnn = dnn, ...) :
all arguments must have the same length
>
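This second error means the prediction vector and statusRank do not have the same number of elements, so they cannot be paired row by row. The str() output below shows that the predictions are a named factor (names such as "11413") that contains NAs. A minimal sketch with hypothetical data, ASSUMING those names are row names of the data frame holding statusRank: align the reference to the predictions by name, relabel it, then drop pairs where either side is NA.

```r
# Hypothetical data: reference in a data frame keyed by row names,
# predictions as a named factor covering only some rows and with an NA.
df <- data.frame(statusRank = factor(c("1", "0", "1", "0", "1", "1")),
                 row.names = c("11413", "2561", "25337", "1643",
                               "14264", "999"))
pred <- factor(c(`11413` = "Good", `2561` = NA, `25337` = "Good",
                 `1643` = "Bad", `14264` = "Good"),
               levels = c("Bad", "Good"))

length(df$statusRank) == length(pred)  # FALSE: table() would fail

# Look up the reference for exactly the predicted rows, in the same order.
ref <- df[names(pred), "statusRank"]
# ASSUMPTION: 0 = "Bad", 1 = "Good".
ref <- factor(ref, levels = c("0", "1"), labels = c("Bad", "Good"))

# Keep only pairs where both the prediction and the reference are known.
keep    <- !is.na(pred) & !is.na(ref)
pred_ok <- pred[keep]
ref_ok  <- ref[keep]

table(predicted = pred_ok, actual = ref_ok)
```

Indexing by name rather than by position protects against the two vectors being in different row orders, which would silently produce a wrong matrix even when the lengths happen to match.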
Data used:
> head(df_loans_train_data$statusRank, 10)
[1] 1 1 0 0 1 1 1 0 1 0
Levels: 0 1
> str(df_loans_train_data$statusRank)
Factor w/ 2 levels "0","1": 2 2 1 1 2 2 2 1 2 1 ...
> head(loans_predict.predicted)
11413 2561 25337 1643 14264 24191
Bad <NA> Bad Bad Bad Bad
Levels: Bad Good
> str(loans_predict.predicted)
Factor w/ 2 levels "Bad","Good": 1 NA 1 1 1 1 1 1 1 1 ...
- attr(*, "names")= chr [1:4158] "11413" "2561" "25337" "1643" ...
>
loans_train_data = na.omit(loans_train_data)
df_loans_train_data <- as.data.frame(loans_train_data)
loans_predict.predicted <- factor(ifelse(loans_predict < 0.5, "Good",
"Bad"))
## problem code: confusionMatrix()
model_prediction_cm <- confusionMatrix(df_loans_train_data$statusRank,
loans_predict.predicted, positive = "Good")
model_prediction_cm$overall['Accuracy']
# Sensitivity and Specificity are stored in $byClass, not $overall
model_prediction_cm$byClass['Sensitivity']
model_prediction_cm$byClass['Specificity']
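caret documents the call as confusionMatrix(data, reference, positive = ...), where data holds the predicted classes and reference the observed ones; the problem code above passes them the other way round. A minimal sketch, assuming the inputs have already been aligned and relabelled as described above (the vectors here are hypothetical), with the caret call guarded so the snippet still runs where caret is not installed:

```r
# Hypothetical aligned inputs: same length, same levels, no NAs.
pred_ok <- factor(c("Good", "Good", "Bad", "Good", "Bad", "Good"),
                  levels = c("Bad", "Good"))
ref_ok  <- factor(c("Good", "Bad", "Bad", "Good", "Good", "Good"),
                  levels = c("Bad", "Good"))

has_caret <- requireNamespace("caret", quietly = TRUE)
if (has_caret) {
  # Predictions first (data), observed classes second (reference).
  cm <- caret::confusionMatrix(data = pred_ok, reference = ref_ok,
                               positive = "Good")
  print(cm$overall["Accuracy"])
  # Per-class statistics live in $byClass.
  print(cm$byClass[c("Sensitivity", "Specificity")])
}
```

For these toy vectors accuracy is 4/6, sensitivity 3/4 and specificity 1/2.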
Sample of the debug data, excerpt from dput(loans_predict.predicted):
`33258` = 2L, `7249` = 2L, `4681` = 2L, `7040` = 2L, `5378` = 2L,
`13420` = 2L, `14028` = 2L, `23267` = 2L, `32953` = 2L, `26529` = 2L,
`30617` = 2L, `32348` = NA, `10303` = 2L, `20425` = 2L, `23817` = 2L,
`9459` = 2L, `33474` = 2L, `993` = 2L, `33870` = 2L, `33751` = 2L,
`26626` = 2L, `8784` = 2L, `32525` = 2L, `29272` = 2L, `5600` = 2L,
`33324` = 2L, `25767` = 2L, `25290` = 2L, `29297` = 2L, `27529` = NA,
`21944` = 2L, `27563` = 2L, `644` = 2L, `1348` = NA, `30568` = NA,
`26078` = 1L, `24222` = 2L, `28581` = 2L, `8299` = 2L, `16639` = 2L,
`33609` = 2L, `14870` = 2L, `33056` = 2L, `33162` = 2L, `4609` = 2L,
`28794` = 2L, `30851` = NA, `10850` = 2L, `16848` = 2L, `33720` = 1L,
`11570` = 2L, `16509` = 2L, `19207` = 2L, `29265` = 2L, `24578` = 2L,
`10129` = 2L, `27090` = 1L, `27485` = 2L, `28897` = 2L, `10176` = 2L,
`20959` = 2L, `4982` = 2L, `8021` = 2L, `1428` = 2L, `24250` = 2L,
`2929` = 2L, `14207` = 2L, `20656` = 2L, `23423` = 2L, `31682` = 2L,
`31989` = 1L, `13545` = 2L, `8453` = NA, `5468` = 2L, `15002` = 2L,
`29944` = 2L, `27050` = 2L, `32108` = 2L, `27711` = NA, `6610` = 2L,
`26874` = 2L, `27817` = 2L, `29768` = 2L, `16522` = 2L, `16917` = NA,
`14174` = 2L, `34318` = 2L, `16784` = 2L, `5040` = 2L, `18617` = 2L,
`32843` = 1L, `18461` = 2L, `10857` = 2L, `24549` = 2L, `12866` = 2L,
`14067` = 2L, `16067` = 2L, `18493` = 2L, `8966` = 2L, `8509` = 2L,
Debugging:
dput(model_prediction_cm)
Error in dput(model_prediction_cm) :
object 'model_prediction_cm' not found
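The "object not found" message is a consequence of the earlier error rather than a separate problem: when the right-hand side of `<-` raises an error, the assignment never runs, so the name is never created. A minimal base-R sketch that simulates this with stop() (the error text is copied from the post; no caret call is made):

```r
# The error is raised before `<-` can bind the name.
result <- tryCatch({
  model_prediction_cm <- stop(
    "The data must contain some levels that overlap the reference.")
  "assigned"
}, error = function(e) conditionMessage(e))

print(result)
exists("model_prediction_cm")  # FALSE in a fresh session
```

So dput(model_prediction_cm) can only work once the confusionMatrix() call itself succeeds.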