I'm trying to use a random forest model to make predictions.
My data looks like this:
> str(margins_data)
'data.frame': 457961 obs. of 10 variables:
$ month : Factor w/ 7 levels "April","August",..: 6 6 4 6 2 1 5 6 5 4 ...
$ miles : num 416 1559 1156 672 1188 ...
$ equipment : Factor w/ 3 levels "Flat","Reefer",..: 1 3 3 3 3 2 3 2 3 3 ...
$ originstate : Factor w/ 62 levels " ","AB","AL",..: 20 55 14 34 14 56 14 34 57 14 ...
$ destinationstate: Factor w/ 62 levels "AB","AK","AL",..: 17 7 55 27 55 8 55 32 46 12 ...
$ margin : num 800 450 450 200 450 700 500 375 200 200 ...
$ ldi : num 2.5 4.84 3.1 1.75 3.35 ...
$ weight : int 40000 43000 40000 10000 39000 35000 39000 7817 38000 42720 ...
$ commoditygroup : Factor w/ 49 levels "Agriculture",..: 18 9 18 15 42 38 18 22 27 18 ...
$ customerindustry: Factor w/ 352 levels "Abrasive, Asbestos, And Miscellaneous",..: 300 336 336 229 336 133 336 133 260 264 ...
- attr(*, "na.action")= 'omit' Named int 1182 2282 2869 2999 3082 4609 5360 5444 5445 6029 ...
..- attr(*, "names")= chr "1182" "2282" "2869" "2999" ...
I split the data into a training set and a test set:
N <- nrow(margins_data)
target <- round(N * 0.75)
gp <- runif(N)
margin_train <- margins_data[gp < 0.75, ]
margin_test <- margins_data[gp >= 0.75, ]
Then I defined the parameters of my model and fit it:
library(ranger)

seed <- 423563
outcome <- "margin"
vars <- c("miles", "equipment", "originstate", "destinationstate", "margin",
          "ldi", "weight", "commoditygroup", "customerindustry")
fmla <- paste(outcome, "~", paste(vars, collapse = " + "))
margin_model_rf <- ranger(fmla,
                          margin_train,
                          num.trees = 500,
                          respect.unordered.factors = "order",
                          seed = seed)
margin_model_rf
Call:
ranger(fmla, margin_train, num.trees = 500, respect.unordered.factors =
"order", seed = seed)
Type: Regression
Number of trees: 500
Sample size: 343253
Number of independent variables: 9
Mtry: 3
Target node size: 5
Variable importance mode: none
Splitrule: variance
OOB prediction error (MSE): 840.8202
When I try to predict on the test data, I get the following error:
margin_predict <- predict(margin_model_rf, margin_test)
Error: Missing data in columns: weight.
In addition: Warning message:
In mapply(function(x, y) { :
longer argument not a multiple of length of shorter
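Two checks that may help narrow this down, offered as a sketch and not a confirmed diagnosis. The first counts missing values per column; because str() reports na.action = 'omit' on margins_data, real NAs in weight would be unexpected. The second builds the formula without the outcome among the predictors, since vars above also lists "margin", and that mismatch may be what predict() is tripping over. The data frame toy_test and the names na_counts, predictors and fmla_checked are hypothetical, made up for illustration with invented values.

```r
# Toy stand-in for margin_test (hypothetical values, not the real data)
toy_test <- data.frame(
  miles  = c(416, 1559, 1156),
  weight = c(40000L, NA, 40000L),
  margin = c(800, 450, 450)
)

# Check 1: count missing values in every column of the test set
na_counts <- colSums(is.na(toy_test))
print(na_counts)

# Check 2: make sure the outcome is not also listed as a predictor
outcome <- "margin"
vars <- c("miles", "weight", "margin")
predictors <- setdiff(vars, outcome)
fmla_checked <- paste(outcome, "~", paste(predictors, collapse = " + "))
print(fmla_checked)
```

On the real data, the same two checks would be run with margin_test in place of toy_test and the full vars vector from the model definition.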
Any help with this would be greatly appreciated.