Looking at the Kaggle page and the data, I used train.csv. This is a multi-class problem:
library(caret)
library(rpart)
library(e1071)
library(caretEnsemble)
set.seed(1234)
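assignment_data1 <- read.csv("train.csv")
# Treat the target as a factor so caret fits a classifier
assignment_data1$Cover_Type <- factor(assignment_data1$Cover_Type)
# Stratified 10% sample of the rows, balanced by Cover_Type
idx <- createDataPartition(assignment_data1$Cover_Type,
                           p = 0.1, list = FALSE)
train.data_ensemble <- assignment_data1[idx, ]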
I took only 10% of the rows for the rest of this answer because of the limited RAM on my laptop, so the labels are distributed like this:
table(train.data_ensemble$Cover_Type)
  1   2   3   4   5   6   7
216 216 216 216 216 216 216
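To make the idea of a class-balanced subsample concrete, here is a minimal base-R sketch of stratified sampling. It runs on made-up toy data (the `toy` data frame and its 100 rows per class are assumptions for illustration, not the Kaggle file) and is not the internal implementation of `createDataPartition`.

```r
# Hypothetical toy data: 7 classes with 100 rows each (not the Kaggle data).
set.seed(1234)
toy <- data.frame(
  x = rnorm(700),
  Cover_Type = factor(rep(1:7, each = 100))
)

# Group the row numbers by class, then draw 10% of the rows within each class.
rows_by_class <- split(seq_len(nrow(toy)), toy$Cover_Type)
idx <- unlist(lapply(rows_by_class, function(rows) {
  rows[sample.int(length(rows), size = round(0.1 * length(rows)))]
}), use.names = FALSE)

toy_sample <- toy[idx, ]

# Every class contributes the same number of rows to the subsample.
table(toy_sample$Cover_Type)
```

Sampling within each class keeps the subsample's class proportions the same as in the full data, which is why the table above shows an identical count for every cover type.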
Then we set up trainControl:
my_control <- trainControl(method = "cv",
                           number = 3,
                           classProbs = TRUE,
                           savePredictions = "final",
                           index = createResample(train.data_ensemble$Cover_Type, 3))
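As far as I understand caret, when `index` is supplied those row indices define the training set of each resample, and the rows left out of an index become its hold-out set. Since `createResample` draws bootstrap samples, the three resamples here are bootstrap resamples shared by every model rather than classic 3-fold CV folds. The base-R sketch below only illustrates that structure; the row count `n` and the list names are assumptions, and this is not caret's own code.

```r
set.seed(1234)
n <- 20  # hypothetical number of training rows

# Three bootstrap resamples: each draws n row indices with replacement.
boot_index <- lapply(1:3, function(i) sort(sample.int(n, n, replace = TRUE)))
names(boot_index) <- paste0("Resample", 1:3)  # illustrative names

# Rows never drawn in a resample form that resample's hold-out set.
holdout <- lapply(boot_index, function(i) setdiff(seq_len(n), i))

lengths(boot_index)  # each resample has n indices, some repeated
lengths(holdout)     # hold-out sizes vary from resample to resample
```

Passing one fixed set of indices to all models is what makes their resampled predictions comparable row by row, which an ensemble needs.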
Running just one model with this, say nnet, throws an error:
train(Cover_Type ~ ., data = train.data_ensemble, method = "nnet", trControl = my_control, tuneLength = 2)
Error: At least one of the class levels is not a valid R variable name; This will cause errors when class probabilities are generated because the variables names will be converted to X1, X2, X3, X4, X5, X6, X7 . Please use factor levels that can be used as valid R variable names (see ?make.names for help).
We fix this by prefixing the levels so they are valid R names, keeping the column a factor:
train.data_ensemble$Cover_Type <- factor(paste0("type", as.character(train.data_ensemble$Cover_Type)))
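The error message points at `make.names`, so a quick way to see why the levels "1" to "7" are rejected, and why the "type" prefix helps, is to compare each set of levels with its `make.names` version. This is a small base-R check written for illustration; the variable names `lev` and `fixed` are assumptions, not part of the original code.

```r
# The original class levels are digits, which are not valid R variable names.
lev <- as.character(1:7)
make.names(lev)                      # R prepends "X": "X1", "X2", ..., "X7"
identical(lev, make.names(lev))      # FALSE, so the levels are not valid names

# After adding a letter prefix the levels are already valid names.
fixed <- paste0("type", lev)
identical(fixed, make.names(fixed))  # TRUE, so the levels are left unchanged
```

Calling `make.names()` on the levels directly would also give valid names; the prefix just gives more readable probability columns than X1 to X7.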
And finish with caretList:
model_list <- c("nnet", "rpart", "ranger")
set.seed(1234)
# Fit the model on the training set without preProcess
list_of_models <- caretList(
  Cover_Type ~ ., data = train.data_ensemble,
  methodList = model_list,
  trControl = my_control,
  tuneLength = 2,
  continue_on_fail = TRUE
)
names(list_of_models)
[1] "nnet" "rpart" "ranger"
lapply(list_of_models, "[[", "results")
$nnet
  size decay  Accuracy      Kappa AccuracySD    KappaSD
1    1   0.0 0.1350390 0.01183745 0.01558538 0.02050306
2    1   0.1 0.1660759 0.04730726 0.01211138 0.01601860
3    3   0.0 0.1857729 0.05877921 0.01687908 0.01257810
4    3   0.1 0.2509231 0.13049948 0.03601895 0.03905056

$rpart
         cp  Accuracy     Kappa AccuracySD    KappaSD
1 0.1226852 0.2986852 0.1906243 0.05857385 0.06756310
2 0.1666667 0.2162676 0.1010794 0.08420706 0.08754039

$ranger
  mtry min.node.size  splitrule  Accuracy     Kappa  AccuracySD     KappaSD
1    2             1       gini 0.6736713 0.6198463 0.017877146 0.021061761
2    2             1 extratrees 0.6357918 0.5758087 0.020871998 0.024462156
3   55             1       gini 0.7098266 0.6613173 0.007074515 0.008099901
4   55             1 extratrees 0.7496037 0.7075914 0.009073924 0.010413872
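To compare the three models at a glance, one option is to pull the best resampled accuracy out of each results table and set it against the 1/7 chance level of a balanced seven-class problem. The sketch below hard-codes the accuracies printed above into a hypothetical `results` list so it runs on its own; with the fitted object you would presumably use `lapply(list_of_models, "[[", "results")` instead.

```r
# Accuracies copied from the output above (tuning columns kept for context).
results <- list(
  nnet = data.frame(size = c(1, 1, 3, 3), decay = c(0, 0.1, 0, 0.1),
                    Accuracy = c(0.1350390, 0.1660759, 0.1857729, 0.2509231)),
  rpart = data.frame(cp = c(0.1226852, 0.1666667),
                     Accuracy = c(0.2986852, 0.2162676)),
  ranger = data.frame(mtry = c(2, 2, 55, 55),
                      splitrule = c("gini", "extratrees", "gini", "extratrees"),
                      Accuracy = c(0.6736713, 0.6357918, 0.7098266, 0.7496037))
)

# Best accuracy reached by each model across its tuning grid.
best_accuracy <- sapply(results, function(r) max(r$Accuracy))

# Accuracy of random guessing with 7 equally frequent classes.
chance <- 1 / 7

best_model <- names(which.max(best_accuracy))
round(best_accuracy - chance, 3)  # margin of each model over chance
best_model
```

On these numbers ranger is clearly ahead, while nnet and rpart sit only modestly above chance.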