Ошибка в maxkappa.train [1, 2]: неверное количество измерений, обобщенная линейная модель - PullRequest
0 голосов
/ 10 апреля 2020

Я пытаюсь запустить обобщенную линейную модель со своими данными и получаю сообщение об ошибке «Ошибка в maxkappa.train [1, 2]: неверное количество измерений». Я не уверен, почему я получаю эту ошибку и почему выходные данные для моих моделей-кандидатов оказываются пустыми.

str(ChallengeEvents_Daily_BySpp)
'data.frame':   2190 obs. of  14 variables:
 $ Date            : Factor w/ 1095 levels "2017-01-01","2017-01-02",..: 1 1 2 2 3 3 4 4 5 5 ...
 $ Species         : Factor w/ 2 levels "PDFH","SCBC": 1 2 1 2 1 2 1 2 1 2 ...
 $ N.unique        : int  0 0 0 0 0 0 0 0 0 0 ...
 $ Challenger      : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ Season          : Factor w/ 4 levels "FAL","SPR","SUM",..: 4 4 4 4 4 4 4 4 4 4 ...
 $ Temp            : num  0.127 0.127 0.281 0.281 0.322 ...
 $ Hydraulic.head.m: num  2.69 2.69 2.69 2.69 2.74 ...
 $ D.lock.n        : int  0 0 0 0 0 0 0 0 0 0 ...
 $ U.lock.n        : int  0 0 0 0 0 0 0 0 0 0 ...
 $ Rec.D.n         : int  0 0 0 0 0 0 0 0 0 0 ...
 $ Rec.U.n         : int  0 0 0 0 0 0 0 0 0 0 ...
 $ Barge.D.n       : int  0 0 0 0 0 0 0 0 0 0 ...
 $ Barge.U.n       : int  0 0 0 0 0 0 0 0 0 0 ...
 $ Year            : Factor w/ 3 levels "2017","2018",..: 1 1 1 1 1 1 1 1 1 1 ...

Код, который я использую для запуска различных моделей.

test.models <- list(
  m0.form <- formula(Challenger ~ Temp +  Hydraulic.head.m + Barge.D.n + Barge.U.n + Rec.D.n + Rec.U.n + Year + Season),
  m1.form <- formula(Challenger ~ Barge.D.n),
  m2.form <- formula(Challenger ~ Barge.D.n + Hydraulic.head.m),
  m3.form <- formula(Challenger ~ Barge.D.n + Temp),
  m4.form <- formula(Challenger ~ Barge.D.n + Season),
  m5.form <- formula(Challenger ~ Barge.D.n + Year),
  m6.form <- formula(Challenger ~ Barge.D.n + Hydraulic.head.m + Temp),
  m7.form <- formula(Challenger ~ Barge.D.n + Temp + Season),
  m8.form <- formula(Challenger ~ Barge.D.n + Hydraulic.head.m + Temp + Year),
  m9.form <- formula(Challenger ~ Barge.U.n),
  m10.form <- formula(Challenger ~ Barge.U.n + Hydraulic.head.m),
  m11.form <- formula(Challenger ~ Barge.U.n + Temp),
  m12.form <- formula(Challenger ~ Barge.U.n + Season),
  m13.form <- formula(Challenger ~ Barge.U.n + Year),
  m14.form <- formula(Challenger ~ Barge.U.n + Hydraulic.head.m + Temp),
  m15.form <- formula(Challenger ~ Barge.U.n + Temp + Season),
  m16.form <- formula(Challenger ~ Barge.U.n + Hydraulic.head.m + Temp + Year),
  m17.form <- formula(Challenger ~ Rec.D.n),
  m18.form <- formula(Challenger ~ Rec.D.n + Hydraulic.head.m),
  m19.form <- formula(Challenger ~ Rec.D.n + Temp),
  m20.form <- formula(Challenger ~ Rec.D.n + Season),
  m21.form <- formula(Challenger ~ Rec.D.n + Year),
  m22.form <- formula(Challenger ~ Rec.D.n + Hydraulic.head.m + Temp),
  m23.form <- formula(Challenger ~ Rec.D.n + Temp + Season),
  m24.form <- formula(Challenger ~ Rec.D.n + Hydraulic.head.m + Temp + Year),
  m25.form <- formula(Challenger ~ Rec.U.n),
  m26.form <- formula(Challenger ~ Rec.U.n + Hydraulic.head.m),
  m27.form <- formula(Challenger ~ Rec.U.n + Temp),
  m28.form <- formula(Challenger ~ Rec.U.n + Season),
  m29.form <- formula(Challenger ~ Rec.U.n + Year),
  m30.form <- formula(Challenger ~ Rec.U.n + Hydraulic.head.m + Temp),
  m31.form <- formula(Challenger ~ Rec.U.n + Temp + Season),
  m32.form <- formula(Challenger ~ Rec.U.n + Hydraulic.head.m + Temp + Year),
  m33.form <- formula(Challenger ~ Year),
  m34.form <- formula(Challenger ~ Season),
  m35.form <- formula(Challenger ~ Temp),
  m36.form <- formula(Challenger ~ Hydraulic.head.m),
  m37.form <- formula(Challenger ~ Barge.D.n + Barge.U.n),
  m38.form <- formula(Challenger ~ Barge.D.n + Barge.U.n + Hydraulic.head.m),
  m39.form <- formula(Challenger ~ Barge.D.n + Barge.U.n + Temp),
  m40.form <- formula(Challenger ~ Barge.D.n + Barge.U.n + Season),
  m41.form <- formula(Challenger ~ Barge.D.n + Barge.U.n + Year),
  m42.form <- formula(Challenger ~ Barge.D.n + Barge.U.n + Hydraulic.head.m + Temp),
  m43.form <- formula(Challenger ~ Barge.D.n + Barge.U.n + Temp + Season),
  m44.form <- formula(Challenger ~ Barge.D.n + Barge.U.n + Hydraulic.head.m + Temp + Year),
  m45.form <- formula(Challenger ~ Rec.D.n + Rec.U.n),
  m46.form <- formula(Challenger ~ Rec.D.n + Rec.U.n + Hydraulic.head.m),
  m47.form <- formula(Challenger ~ Rec.D.n + Rec.U.n + Temp),
  m48.form <- formula(Challenger ~ Rec.D.n + Rec.U.n + Season),
  m49.form <- formula(Challenger ~ Rec.D.n + Rec.U.n + Year),
  m50.form <- formula(Challenger ~ Rec.D.n + Rec.U.n + Hydraulic.head.m + Temp),
  m51.form <- formula(Challenger ~ Rec.D.n + Rec.U.n + Temp + Season),
  m52.form <- formula(Challenger ~ Rec.D.n + Rec.U.n + Hydraulic.head.m + Temp + Year),
  m53.form <- formula(Challenger ~ Barge.D.n + Barge.U.n + Rec.D.n + Rec.U.n),
  m54.form <- formula(Challenger ~ Barge.D.n + Barge.U.n + Rec.D.n + Rec.U.n + Hydraulic.head.m),
  m55.form <- formula(Challenger ~ Barge.D.n + Barge.U.n + Rec.D.n + Rec.U.n + Temp),
  m56.form <- formula(Challenger ~ Barge.D.n + Barge.U.n + Rec.D.n + Rec.U.n + Season),
  m57.form <- formula(Challenger ~ Barge.D.n + Barge.U.n + Rec.D.n + Rec.U.n + Year),
  m58.form <- formula(Challenger ~ Barge.D.n + Barge.U.n + Rec.D.n + Rec.U.n + Hydraulic.head.m + Temp),
  m59.form <- formula(Challenger ~ Barge.D.n + Barge.U.n + Rec.D.n + Rec.U.n + Temp + Season),
  m60.form <- formula(Challenger ~ Barge.D.n + Barge.U.n + Rec.D.n + Rec.U.n + Hydraulic.head.m + Temp + Year),
  m61.form <- formula(Challenger ~ Barge.D.n + Rec.D.n),
  m62.form <- formula(Challenger ~ Barge.D.n + Rec.D.n + Hydraulic.head.m),
  m63.form <- formula(Challenger ~ Barge.D.n + Rec.D.n + Temp),
  m64.form <- formula(Challenger ~ Barge.D.n + Rec.D.n + Season),
  m65.form <- formula(Challenger ~ Barge.D.n + Rec.D.n + Year),
  m66.form <- formula(Challenger ~ Barge.D.n + Rec.D.n + Hydraulic.head.m + Temp),
  m67.form <- formula(Challenger ~ Barge.D.n + Rec.D.n + Temp + Season),
  m68.form <- formula(Challenger ~ Barge.D.n + Rec.D.n + Hydraulic.head.m + Temp + Year),
  m69.form <- formula(Challenger ~ Barge.U.n + Rec.U.n + (1|TRANSMITTERID)),
  m70.form <- formula(Challenger ~ Barge.U.n + Rec.U.n + Hydraulic.head.m),
  m71.form <- formula(Challenger ~ Barge.U.n + Rec.U.n + Temp),
  m72.form <- formula(Challenger ~ Barge.U.n + Rec.U.n + Season),
  m73.form <- formula(Challenger ~ Barge.U.n + Rec.U.n + Year),
  m74.form <- formula(Challenger ~ Barge.U.n + Rec.U.n + Hydraulic.head.m + Temp),
  m75.form <- formula(Challenger ~ Barge.U.n + Rec.U.n + Temp + Season),
  m76.form <- formula(Challenger ~ Barge.U.n + Rec.U.n + Hydraulic.head.m + Temp + Year))   

CompareModels <- list()

for(i in 1:length(spp)){ # begin loop through species
  spp.i <- spp[i]
  df.i <- ChallengeEvents_Daily_BySpp[ChallengeEvents_Daily_BySpp$Species == spp.i, ]
  # Randomly sample in 80/20 split
  df.pos <- df.i[df.i$Challenger == 1, ]
  pos.random <- sample(nrow(df.pos), nrow(df.pos), replace = FALSE)
  pos.cut <- floor(length(pos.random)*.8)
  df.pos.train <- df.pos[pos.random[c(1:pos.cut)], ]
  df.pos.test <- df.pos[pos.random[c((pos.cut+1):length(pos.random))], ]

  df.neg <- df.i[df.i$Challenger == 0, ]
  neg.random <- sample(nrow(df.neg), nrow(df.neg), replace = FALSE)
  #neg.random <- sample(nrow(df.neg), nrow(df.pos), replace = FALSE) # to balance pos/neg challenge
  neg.cut <- floor(length(neg.random)*.8)  
  df.neg.train <- df.neg[neg.random[c(1:neg.cut)], ] 
  df.neg.test <- df.neg[neg.random[c((neg.cut + 1):length(neg.random))], ]


  df.i.test <- rbind(df.neg.test, df.pos.test)
  df.i.train <- rbind(df.neg.train, df.pos.train)

  zz <- file(paste0(out.dir, 'ChallengeEventModels/', 'Challenge_Models_', spp.i, '.txt'), open = 'wt')
  sink(zz,  type =  "message")
  sink(zz,  type =  "output")

  for(j in 1:length(test.models)){ # begin loop through models
    name.i.j <- paste0(spp.i, "_m", j)
    model.j <- test.models[[j]]
    glm.j <-glm(model.j, 
                data = df.i.train, family = binomial) 

    print(name.i.j)
    print(model.j)
    print(summary(glm.j))
    print(confint(glm.j))


    #AUC for global model in model selection table 
    p.train <- predict(glm.j, df.i.train, type="response")
    pr.train <- prediction(p.train, df.i.train$Challenger)
    auc.train <- performance(pr.train, measure = "auc")
    auc.train <- auc.train@y.values[[1]]
    maxkappa.train <- ecospat.max.kappa(p.train, df.i.train$Challenger)[[2]]


    #AUC for global model in model selection table 
    p.test <- predict(glm.j, df.i.test, type="response")
    pr.test <- prediction(p.test, df.i.test$Challenger)
    auc.test <- performance(pr.test, measure = "auc")
    auc.test <- auc.test@y.values[[1]]
    maxkappa.test <- ecospat.max.kappa(p.test, df.i.test$Challenger)[[2]]


    CompareModels[[name.i.j]] <- data.frame(Species = spp.i, 
                                            Model = as.character(model.j)[3], 
                                            AIC = AIC(glm.j),
                                            converged = glm.j$converged,
                                            boundary = glm.j$boundary,
                                            AUC.train = auc.train,
                                            AUC.test = auc.test, 
                                            maxKappa.train = maxkappa.train[1,2], 
                                            Kappacutoff.train = maxkappa.train[2,2], 
                                            maxKappa.test = maxkappa.test[1,2], 
                                            Kappacutoff.test = maxkappa.test[2,2])

  } #end models
  sink()
  sink()
} #end species

CompareModels <- do.call(rbind, CompareModels)

Это единственный вывод, который я получаю. Я должен получить 152 модели (76 для одного вида и 76 для другого вида).

[1] "SCBC_m1"
Challenger ~ Temp + Hydraulic.head.m + Barge.D.n + Barge.U.n + 
    Rec.D.n + Rec.U.n + Year + Season

Call:
glm(formula = model.j, family = binomial, data = df.i.train)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-1.22549  -0.32947  -0.08226  -0.00009   2.78997  

Coefficients:
                   Estimate Std. Error z value Pr(>|z|)    
(Intercept)       -5.908158   1.444232  -4.091  4.3e-05 ***
Temp               0.206311   0.060722   3.398  0.00068 ***
Hydraulic.head.m  -0.830706   0.203861  -4.075  4.6e-05 ***
Barge.D.n         -0.099814   0.051126  -1.952  0.05090 .  
Barge.U.n         -0.020138   0.044261  -0.455  0.64912    
Rec.D.n            0.006503   0.152140   0.043  0.96591    
Rec.U.n            0.218535   0.180073   1.214  0.22490    
Year2018          -1.207097   0.433933  -2.782  0.00541 ** 
Year2019          -1.150210   0.450021  -2.556  0.01059 *  
SeasonSPR          2.058405   1.086584   1.894  0.05817 .  
SeasonSUM          2.555601   1.079296   2.368  0.01789 *  
SeasonWINT       -10.694876 721.821804  -0.015  0.98818    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 432.14  on 874  degrees of freedom
Residual deviance: 285.54  on 863  degrees of freedom
AIC: 309.54

Number of Fisher Scoring iterations: 18

Waiting for profiling to be done...
                         2.5 %       97.5 %
(Intercept)        -9.32226029 -3.402828646
Temp                0.09406931  0.332836137
Hydraulic.head.m   -1.24665216 -0.444214806
Barge.D.n          -0.20263859 -0.001542306
Barge.U.n          -0.10886309  0.065337244
Rec.D.n            -0.30888543  0.294943423
Rec.U.n            -0.13895160  0.571089446
Year2018           -2.08842313 -0.378075754
Year2019           -2.05408570 -0.282466826
SeasonSPR           0.30546815  5.007097435
SeasonSUM           0.83911673  5.498842642
SeasonWINT       -262.95844145 17.513941454
Error in maxkappa.train[1, 2] : incorrect number of dimensions
In addition: There were 50 or more warnings (use warnings() to see the first 50)
...