Benchmark experiment using mlr: no correlation coefficients for LASSO (but for SVR and Random Forest)
0 votes
/ 01 October 2018

I ran a benchmark experiment using the mlr package in R. Among other measures, I chose three correlation coefficients as my performance measures (spearmanrho, my.pearson, kendalltau). I am testing the following learners: LASSO (regr.cvglmnet), Support Vector Regression (regr.ksvm), and Random Forest (regr.ranger).

However, I only get results for SVR and Random Forest. For LASSO, all three correlation performance measures come back as NA.

Question: does anyone know why?
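For context: in R, cor() returns NA whenever one of its two inputs has zero standard deviation, i.e. is constant. A minimal illustration of that behaviour (my own toy vectors, not taken from the benchmark):

truth <- 1:10
preds <- rep(50, 10)                    # a constant prediction vector
cor(truth, preds)                       # NA, with a "standard deviation is zero" warning
cor(truth, preds, method = "spearman")  # also NA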

On a related note: it only makes sense to report Pearson for LASSO (since it is the only linear model), and either kendalltau or spearmanrho for the other two, right?

I would greatly appreciate an answer.

Code:

set.seed(1)

df <- data.frame(ID=c(1:100),
                 x= sample(1:100),
                 y= sample(1:100))

install.packages("mlr")
install.packages("glmnet") #package for LASSO
install.packages("kernlab") #package for SVR
install.packages("ranger") #package for rf
Packages <- c("mlr", "glmnet", "kernlab", "ranger") # kernlab (not e1071) backs regr.ksvm
lapply(Packages, library, character.only = TRUE)

###Step 1) Computing the tasks 
regr.task_df = makeRegrTask(id = "test", data = df, target = "x")

###Step 2) Tuning via inner resampling loop (for each algorithm)

##LASSO

lrn_LASSO = makeLearner("regr.cvglmnet") 
parsLASSO =  makeParamSet(
  makeDiscreteParam("s", values = c("lambda.1se", "lambda.min"))
) 

tuneLASSO = makeTuneControlRandom() 
inner = makeResampleDesc("CV", iters = 2) 
learnerLASSO = makeTuneWrapper(lrn_LASSO, resampling = inner, par.set = parsLASSO, control = tuneLASSO, show.info = FALSE) # do I need/want to specify a measure here (e.g. measures = spearmanrho, or mse)?
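# Note on the question in the comment above: makeTuneWrapper() also accepts a
# measures argument; when it is omitted, mlr tunes on its default regression
# measure (mse). A sketch of tuning on Spearman's rho instead (assumption: that
# is the desired tuning criterion, which the post does not state):
# learnerLASSO = makeTuneWrapper(lrn_LASSO, resampling = inner,
#                                measures = list(spearmanrho),
#                                par.set = parsLASSO, control = tuneLASSO,
#                                show.info = FALSE)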

##SVR

lrn_SVR = makeLearner("regr.ksvm") 

parsSVM = makeParamSet(
  makeNumericParam("C", lower = -10, upper = 10, trafo = function(x)2^x),  
  makeNumericParam("sigma", lower = -10, upper = 10, trafo = function(x)2^x)
) 

tuneSVM = makeTuneControlRandom() 
inner = makeResampleDesc("CV", iters = 2)

learnerSVM = makeTuneWrapper(lrn_SVR, resampling = inner, par.set = parsSVM, control = tuneSVM, show.info = FALSE)

##Random Forest

lrn_RF = makeLearner("regr.ranger", par.vals = list("num.trees" = 10L)) 

parsRF = makeParamSet(
  makeIntegerParam("min.node.size" , lower = 1, upper = 12)
)

tuneRF = makeTuneControlRandom() 
inner = makeResampleDesc("CV", iters = 2) 

learnerRF = makeTuneWrapper(lrn_RF, resampling = inner, par.set = parsRF, control = tuneRF, show.info = FALSE)

###Step 3) Benchmarking via outer resampling loop

#Learners to be compared
lrns = list(
  makeLearner("regr.featureless"), #creating one without any features as a baseline
  learnerLASSO,
  learnerSVM,
  learnerRF
)

#outer resampling strategy
rdesc = makeResampleDesc("CV", iters = 2) 

#define measures 

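# my.pearson is not defined anywhere in this snippet; a minimal sketch of how
# it could be constructed with mlr::makeMeasure (the id "pearsonsr" matches the
# output below; the exact original definition is an assumption):
my.pearson = makeMeasure(
  id = "pearsonsr", minimize = FALSE, best = 1, worst = -1,
  properties = c("regr", "req.pred", "req.truth"),
  fun = function(task, model, pred, feats, extra.args) {
    cor(pred$data$truth, pred$data$response, method = "pearson")
  }
)
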
meas <- list(mse, mae, rmse, rsq, kendalltau, spearmanrho, my.pearson) 

#benchmark
df_benchmarking <- benchmark(learners = lrns, 
                                 tasks = regr.task_df, 
                                 resamplings = rdesc, 
                                 models = FALSE, # do not keep the fitted models for every learner on every task, to save memory (they can be retrieved later with getBMRModels)
                                 measures = meas) 

Output:

Task: test, Learner: regr.featureless
Resampling: cross-validation
Measures:             mse           mae           rmse          rsq           kendalltau    spearmanrho   pearsonsr     
[Resample] iter 1:    823.0228000   25.2896000    28.6883740    -0.0075292          NA            NA            NA      
[Resample] iter 2:    852.7028000   24.8392000    29.2010753    -0.0072652          NA            NA            NA      


Aggregated Result: mse.test.mean=837.8628000,mae.test.mean=25.0644000,rmse.test.rmse=28.9458598,rsq.test.mean=-0.0073972,kendalltau.test.mean=      NA,spearmanrho.test.mean=      NA,pearsonsr.test.mean=      NA


Task: test, Learner: regr.cvglmnet.tuned
Resampling: cross-validation
Measures:             mse           mae           rmse          rsq           kendalltau    spearmanrho   pearsonsr     
[Resample] iter 1:    823.0228000   25.2896000    28.6883740    -0.0075292          NA            NA            NA      
[Resample] iter 2:    852.7028000   24.8392000    29.2010753    -0.0072652          NA            NA            NA      


Aggregated Result: mse.test.mean=837.8628000,mae.test.mean=25.0644000,rmse.test.rmse=28.9458598,rsq.test.mean=-0.0073972,kendalltau.test.mean=      NA,spearmanrho.test.mean=      NA,pearsonsr.test.mean=      NA


Task: test, Learner: regr.ksvm.tuned
Resampling: cross-validation
Measures:             mse           mae           rmse          rsq           kendalltau    spearmanrho   pearsonsr     
[Resample] iter 1:    814.0693912   25.1901372    28.5319013    0.0034314     0.2284100     0.2911033     0.2326230     
[Resample] iter 2:    780.3428232   23.3046787    27.9346169    0.0782108     0.1493878     0.2074910     0.3015794     


Aggregated Result: mse.test.mean=797.2061072,mae.test.mean=24.2474079,rmse.test.rmse=28.2348385,rsq.test.mean=0.0408211,kendalltau.test.mean=0.1888989,spearmanrho.test.mean=0.2492971,pearsonsr.test.mean=0.2671012


Task: test, Learner: regr.ranger.tuned
Resampling: cross-validation
Measures:             mse           mae           rmse          rsq           kendalltau    spearmanrho   pearsonsr     
[Resample] iter 1:    763.1294514   23.3170000    27.6247978    0.0657911     0.2206785     0.3080654     0.3227492     
[Resample] iter 2:    844.0942578   24.5514734    29.0533003    0.0029037     0.0875616     0.1308357     0.1717654     


Aggregated Result: mse.test.mean=803.6118546,mae.test.mean=23.9342367,rmse.test.rmse=28.3480485,rsq.test.mean=0.0343474,kendalltau.test.mean=0.1541201,spearmanrho.test.mean=0.2194506,pearsonsr.test.mean=0.2472573


There were 50 or more warnings (use warnings() to see the first 50)
...