Используя ваши данные, это то, что вы могли бы сделать:
Ваши данные
TrainStation<-c("East","North","East","North","North","Central","North","Central","East","North","East","North","Central","North","Central","North","Central","North","Central","North","Central","North","Central","East","North","East","North","Central","North","Central","East","North","East","North","Central","East")
TimeOfday<-c(12,12,8,16,10,6,0,7,1,3,23,15,12,8,16,10,1,3,5,7,9,10,12,11,17,2,4,5,13,14,18,19,20,21,22,23)
Date<-sample(seq(as.Date('2019/01/01'), as.Date('2019/02/28'), by="day"), 36)
Date<-as.character(Date)
DF<-cbind(TrainStation,TimeOfday,Date)
DF<-as.data.frame(DF)
#Weekdays
DF$Date<-as.Date(DF$Date)
DF$Date<-weekdays(DF$Date)
#TimeOfday
DF$TimeOfday<-strptime(DF$TimeOfday,format = "%H")
DF$TimeOfday<-hour(DF$TimeOfday)
DF$TrainStation<-as.character(DF$TrainStation)
DF$TimeOfday<-as.factor(DF$TimeOfday)
DF$Date<-as.factor(DF$Date)
#Data for regression
library(tidyverse)
DF2<-DF%>%
group_by(TrainStation,Date,TimeOfday)%>%
summarize(NumberOfPassenger = n_distinct(TrainStation))
Теперь перейдя в раздел моделирования, вы можете использовать вложенный столбец и затем применить свою модель
DF2 %>%
ungroup() %>%
group_by(TrainStation) %>%
nest() %>%
mutate(model = map(data, ~glm(NumberOfPassenger~TimeOfday+Date, family = poisson(), data = .)))
Это даст вам нечто, похожее на:
# A tibble: 3 x 3
TrainStation data model
<chr> <list> <list>
1 Central <tibble [11 x 3]> <S3: glm>
2 East <tibble [9 x 3]> <S3: glm>
3 North <tibble [16 x 3]> <S3: glm>
Который обладает всеми вложенными функциями.Если вы хотите извлечь параметры модели для каждой станции, вы можете сделать что-то вроде:
TrainStation<-c("East","North","East","North","North","Central","North","Central","East","North","East","North","Central","North","Central","North","Central","North","Central","North","Central","North","Central","East","North","East","North","Central","North","Central","East","North","East","North","Central","East")
TimeOfday<-c(12,12,8,16,10,6,0,7,1,3,23,15,12,8,16,10,1,3,5,7,9,10,12,11,17,2,4,5,13,14,18,19,20,21,22,23)
Date<-sample(seq(as.Date('2019/01/01'), as.Date('2019/02/28'), by="day"), 36)
Date<-as.character(Date)
DF<-cbind(TrainStation,TimeOfday,Date)
DF<-as.data.frame(DF)
#Weekdays
DF$Date<-as.Date(DF$Date)
DF$Date<-weekdays(DF$Date)
#TimeOfday
DF$TrainStation<-as.character(DF$TrainStation)
DF$TimeOfday<-as.factor(DF$TimeOfday)
DF$Date<-as.factor(DF$Date)
#Data for regression
library(tidyverse)
DF2<-DF%>%
group_by(TrainStation,Date,TimeOfday)%>%
summarize(NumberOfPassenger = n_distinct(TrainStation))
DF2 %>%
ungroup() %>%
group_by(TrainStation) %>%
nest() %>%
mutate(model = map(data, ~glm(NumberOfPassenger~TimeOfday+Date, family = poisson(), data = .))) %>%
mutate(tidy_model = map(model, broom::tidy)) %>%
select(TrainStation, tidy_model) %>%
unnest(tidy_model)
Чтобы получить все параметры из модели для каждой станции
# A tibble: 35 x 6
TrainStation term estimate std.error statistic p.value
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 Central (Intercept) 4.68e-11 1.000 4.68e-11 1.000
2 Central TimeOfday12 -3.19e-35 1.41 -2.26e-35 1
3 Central TimeOfday14 5.24e-34 1.41 3.70e-34 1
4 Central TimeOfday16 1.03e-34 1.41 7.28e-35 1
5 Central TimeOfday22 -5.21e-18 2.00 -2.61e-18 1
6 Central TimeOfday5 -5.21e-18 1.41 -3.68e-18 1
7 Central TimeOfday6 2.17e-34 1.41 1.53e-34 1