Я выполняю перекрестную проверку набора обучающих данных в R. Я сделал это с помощью Random forest, и теперь я работаю с деревом решений, и когда я запускаю его, он выдает мне ошибку. Я провел перекрестную проверку для случайного леса, используя 10 и 3 раза. Я следую онлайн-уроку, чтобы изучить науку данных с использованием R, и столкнулся с этой трудностью, которую я пытался понять часами. Код:
#cross validation
library(caret)
library(doSNOW)
set.seed(2348)
cv.10.folds <- createMultiFolds(rf.label, k=10, times = 10)
#check stratification
table(rf.label)
342 / 549
#set up caret's trainControl object per above
ctrl.1 <- trainControl(method = "repeatedcv", number = 10, repeats = 10, index = cv.10.folds)
table(rf.label[cv.10.folds[[33]]])
#set up caret's traincontrol object per above
ctrl.1 <- trainControl(method = "repeatedcv", number = 10, repeats = 10, index = cv.10.folds)
#Set up doSNOW package for multi-core training. This is helpful as we're going
#to be training a lot of trees
cl <- makeCluster(6, types = "SOCK")
registerDoSNOW(c1)
#Set seed for reproducibility and train
set.seed(32384)
rf.4.cv.1 <- train(x = rf.train.4, y = rf.label, method = "rf", tunelength = 3,
ntree = 1000, trControl = ctrl.1)
#Shutdown cluster
stopCluster(cl)
#check out results
rf.4.cv.1
#rework with 3 folds
set.seed(37596)
cv.3.folds <- createMultiFolds(rf.label, k=3, times = 10)
#set up caret's trainControl object per above
ctrl.3 <- trainControl(method = "repeatedcv", number = 3, repeats = 10, index = cv.3.folds)
#set up caret's traincontrol object per above
ctrl.3 <- trainControl(method = "repeatedcv", number = 3, repeats = 10,
index = cv.3.folds)
#Set up doSNOW package for multi-core training. This is helpful as we're going
#to be training a lot of trees
cl <- makeCluster(6, types = "SOCK")
registerDoSNOW(c1)
#Set seed for reproducibility and train
set.seed(94622)
rf.3.cv.1 <- train(x = rf.train.3, y = rf.label, method = "rf", tunelength = 3,
ntree = 1000, trControl = ctrl.3)
#Shutdown cluster
stopCluster(cl)
#check out results
rf.3.cv.1
# Using single Decision tree to better understand what's going on with the features
library(rpart)
library(rpart.plot)
#Using 3 fold cross validation repeated 10 times
#create utility function
rpart.cv <- function(seed, training, labels, ctrl) {
cl <- makeCluster(6, type = "SOCK")
registerDoSNOW(cl)
set.seed(seed)
#Leverage formula interface for training
rpart.cv <- train(x = training, y = labels, method = "rpart", tunelength =30,
trControl = ctrl)
#Shutdown cluster
stopCluster(cl)
return (rpart.cv)
}
#Grab features
features <- c("Pclass", "title", "family.size")
rpart.train.1 <- data.combined[1:891, features]
#Run cross validation and check out results
rpart.1.cv.1 <- rpart.cv(94622, rpart.train.1, rf.label, ctrl.3)
rpart.1.cv.1
#Plot
prp(rpart.1.cv.1$finalModel, type = 0, extra =1, under = TRUE)
Когда я запустил его, я получил сообщение об ошибке:
Something is wrong; all the Accuracy metric values are missing:
Accuracy Kappa
Min. : NA Min. : NA
1st Qu.: NA 1st Qu.: NA
Median : NA Median : NA
Mean :NaN Mean :NaN
3rd Qu.: NA 3rd Qu.: NA
Max. : NA Max. : NA
NA's :3 NA's :3
Error: Stopping
In addition: Warning message:
In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
Show Traceback
Rerun with Debug
Error: Stopping > rpart.1.cv.1
Error: object 'rpart.1.cv.1' not found