How can you make a list from a data frame and then run regressions? - PullRequest
0 votes
/ May 15, 2019

I have a DF with 4 columns. The first column contains the stations, and the other 3 columns contain the time, the day of the week, and the number of people. My goal is to fit a regression (glm) for each station separately. I think this would be easier with a list, wouldn't it? My question is: how do I build the list, and how do I run the regression (glm) for each station using that list?

My DF looks like this:

[Image: screenshot of my DF]

Sample code:

TrainStation is chr, the weekday (Date) and TimeOfday are factors, and NumberOfPassenger is numeric.

library(tidyverse)
library(lubridate) # hour() comes from lubridate

TrainStation<-c("East","North","East","North","North","Central","North","Central","East","North","East","North","Central","North","Central","North","Central","North","Central","North","Central","North","Central","East","North","East","North","Central","North","Central","East","North","East","North","Central","East")
TimeOfday<-c(12,12,8,16,10,6,0,7,1,3,23,15,12,8,16,10,1,3,5,7,9,10,12,11,17,2,4,5,13,14,18,19,20,21,22,23)
Date<-sample(seq(as.Date('2019/01/01'), as.Date('2019/02/28'), by="day"), 36)
Date<-as.character(Date)
# data.frame() keeps the column types; cbind() would first coerce everything into a character matrix
DF<-data.frame(TrainStation,TimeOfday,Date, stringsAsFactors = FALSE)

#Weekdays
DF$Date<-weekdays(as.Date(DF$Date))
#TimeOfday: parse the hour and keep it as an integer
DF$TimeOfday<-hour(strptime(DF$TimeOfday,format = "%H"))

DF$TimeOfday<-as.factor(DF$TimeOfday)
DF$Date<-as.factor(DF$Date)

#Data for regression: count the rows in each station/weekday/hour group
#(n_distinct(TrainStation) is always 1 here, because TrainStation is one of the grouping variables)
DF2<-DF%>%
  group_by(TrainStation,Date,TimeOfday)%>%
  summarize(NumberOfPassenger = n())

Thanks a lot for your help!

1 Answer

1 vote
/ May 15, 2019

Using your data, here is what you could do:


Now, moving on to the modeling part, you can nest each station's data into a list-column and then apply your model to each element:

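DF2 %>%
  ungroup() %>% 
  group_by(TrainStation) %>% 
  # one row per station; that station's rows go into the list-column `data`
  nest() %>% 
  # fit a Poisson glm to each nested data frame with purrr::map()
  mutate(model = map(data, ~glm(NumberOfPassenger~TimeOfday+Date, family = poisson(), data = .)))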

This gives you something like:

# A tibble: 3 x 3
  TrainStation data              model    
  <chr>        <list>            <list>   
1 Central      <tibble [11 x 3]> <S3: glm>
2 East         <tibble [9 x 3]>  <S3: glm>
3 North        <tibble [16 x 3]> <S3: glm>
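Since the question asks about a list specifically, here is a minimal base-R sketch of the same idea without the tidyverse: `split()` the data into a named list with one data frame per station, then `lapply()` the glm over that list. The `toy` data frame is made-up data standing in for `DF2` (an assumption for illustration; the real passenger counts are not shown in the question).

```r
set.seed(1)
# Hypothetical toy data: 3 stations x 7 weekdays x 4 hours, with Poisson counts
toy <- expand.grid(
  TrainStation = c("Central", "East", "North"),
  Date = factor(c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")),
  TimeOfday = factor(c(6, 9, 12, 18)),
  stringsAsFactors = FALSE
)
toy$NumberOfPassenger <- rpois(nrow(toy), lambda = 20)

# split() turns the data frame into a named list: one data frame per station
by_station <- split(toy, toy$TrainStation)

# lapply() fits the same Poisson glm to every element of that list
models <- lapply(by_station, function(d) {
  glm(NumberOfPassenger ~ TimeOfday + Date, family = poisson(), data = d)
})

names(models)
```

`models$East` is then the fitted glm for a single station, and `lapply(models, summary)` summarizes all of them at once.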

The model column holds the fitted glm for each station. If you want to extract the model parameters for each station, you can do something like:

DF2 %>%
  ungroup() %>% 
  group_by(TrainStation) %>% 
  nest() %>% 
  mutate(model = map(data, ~glm(NumberOfPassenger~TimeOfday+Date, family = poisson(), data = .))) %>% 
  # broom::tidy() turns each glm into a data frame of coefficients
  mutate(tidy_model = map(model, broom::tidy)) %>% 
  select(TrainStation, tidy_model) %>% 
  # expand the list-column into one row per station and term
  unnest(tidy_model)

This gives you all the model parameters for each station:

# A tibble: 35 x 6
   TrainStation term           estimate std.error statistic p.value
   <chr>        <chr>             <dbl>     <dbl>     <dbl>   <dbl>
 1 Central      (Intercept)    4.68e-11     1.000  4.68e-11   1.000
 2 Central      TimeOfday12   -3.19e-35     1.41  -2.26e-35   1    
 3 Central      TimeOfday14    5.24e-34     1.41   3.70e-34   1    
 4 Central      TimeOfday16    1.03e-34     1.41   7.28e-35   1    
 5 Central      TimeOfday22   -5.21e-18     2.00  -2.61e-18   1    
 6 Central      TimeOfday5    -5.21e-18     1.41  -3.68e-18   1    
 7 Central      TimeOfday6     2.17e-34     1.41   1.53e-34   1  
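Without broom, a similar per-station coefficient table can be assembled in base R from `coef(summary(fit))`, which for a Poisson glm is a matrix with the columns `Estimate`, `Std. Error`, `z value` and `Pr(>|z|)`. This is a sketch only; `toy` is again hypothetical data made up for illustration.

```r
set.seed(42)
# Hypothetical toy data (made up): counts per station, weekday and hour
toy <- expand.grid(
  TrainStation = c("Central", "East", "North"),
  Date = factor(c("Mon", "Tue", "Wed")),
  TimeOfday = factor(c(8, 12, 17)),
  stringsAsFactors = FALSE
)
toy$NumberOfPassenger <- rpois(nrow(toy), lambda = 15)

# One Poisson glm per station, stored in a named list
models <- lapply(split(toy, toy$TrainStation), function(d)
  glm(NumberOfPassenger ~ TimeOfday + Date, family = poisson(), data = d))

# Stack each model's coefficient matrix into one long data frame
coef_table <- do.call(rbind, lapply(names(models), function(station) {
  ct <- coef(summary(models[[station]]))
  data.frame(TrainStation = station, term = rownames(ct), ct,
             row.names = NULL, check.names = FALSE,
             stringsAsFactors = FALSE)
}))

coef_table
```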