Making a time-series data frame in R, ask 2 (use dplyr?)
0 votes
/ April 6, 2020

I have invoice data from a shop. Some dates are missing because nothing was sold on those days.

data with missing dates
         day   item sale value
1 2011-01-01  apple  yes   100
2 2011-01-02  apple   no   200
4 2011-01-06 banana  yes   500

true calendar
         day  
1 2011-01-01  
2 2011-01-02  
3 2011-01-04  
4 2011-01-05  
5 2011-01-06 

I need the complete data, built with something like tidyr::complete.

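I want to add rows for January 4 and January 5, so that every calendar day has a row for each item and sale combination, with value 0 where nothing was recorded: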

bind "2011-01-01" "apple"  "yes" "100"
bind "2011-01-01" "apple"  "no"  "0"  
bind "2011-01-01" "banana" "yes" "0"  
bind "2011-01-01" "banana" "no"  "0"  
bind "2011-01-02" "apple"  "yes" "0"  
bind "2011-01-02" "apple"  "no"  "200"
bind "2011-01-02" "banana" "yes" "0"  
bind "2011-01-02" "banana" "no"  "0" 

bind "2011-01-04" "apple"  "yes" "0"
bind "2011-01-04" "apple"  "no"  "0"  
bind "2011-01-04" "banana" "yes" "0"  
bind "2011-01-04" "banana" "no"  "0"  
bind "2011-01-05" "apple"  "yes" "0"  
bind "2011-01-05" "apple"  "no"  "0"  
bind "2011-01-05" "banana" "yes" "0"
bind "2011-01-05" "banana" "no"  "0"  

bind "2011-01-06" "apple"  "yes" "0"  
bind "2011-01-06" "apple"  "no"  "0"  
bind "2011-01-06" "banana" "yes" "500"
bind "2011-01-06" "banana" "no"  "0"  

How can I do this in R?

1 Answer

1 vote
/ April 6, 2020

We can use complete to generate every date from the minimum day to the maximum day, for each combination of item and sale, filling value with 0 for the new rows. A right_join with calendar then keeps only the dates that are present in calendar, which drops 2011-01-03.

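library(dplyr)

df %>%
  # day is stored as a factor, so convert it to Date first
  mutate(day = as.Date(day)) %>%
  # expand to every item/sale/day combination; new rows get value = 0
  tidyr::complete(item, sale, day = seq(min(day), max(day), by = 'day'), 
                  fill = list(value = 0)) %>%
  # keep only the days listed in calendar (this drops 2011-01-03)
  right_join(calendar %>% mutate(day = as.Date(day)), by = 'day') %>%
  # sort explicitly, since join row order differs between dplyr versions
  arrange(day, item, sale)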


# A tibble: 20 x 4
#   item   sale  day        value
#   <fct>  <fct> <date>     <dbl>
# 1 apple  no    2011-01-01     0
# 2 apple  yes   2011-01-01   100
# 3 banana no    2011-01-01     0
# 4 banana yes   2011-01-01     0
# 5 apple  no    2011-01-02   200
# 6 apple  yes   2011-01-02     0
# 7 banana no    2011-01-02     0
# 8 banana yes   2011-01-02     0
# 9 apple  no    2011-01-04     0
#10 apple  yes   2011-01-04     0
#11 banana no    2011-01-04     0
#12 banana yes   2011-01-04     0
#13 apple  no    2011-01-05     0
#14 apple  yes   2011-01-05     0
#15 banana no    2011-01-05     0
#16 banana yes   2011-01-05     0
#17 apple  no    2011-01-06     0
#18 apple  yes   2011-01-06     0
#19 banana no    2011-01-06     0
#20 banana yes   2011-01-06   500
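
An alternative sketch, not part of the original answer: instead of generating every date in the range and then discarding the ones missing from calendar, the grid can be built directly from the calendar days and the observed item and sale values, and the known values attached with a left join. The name full and the character-column versions of df and calendar below are assumptions made for illustration, so that the block runs on its own.

```r
library(dplyr)
library(tidyr)

# Same sample values as in the question, with plain character columns
# (assumption: the factor encoding is not needed for this approach)
df <- data.frame(
  day   = c("2011-01-01", "2011-01-02", "2011-01-06"),
  item  = c("apple", "apple", "banana"),
  sale  = c("yes", "no", "yes"),
  value = c(100, 200, 500),
  stringsAsFactors = FALSE
)
calendar <- data.frame(
  day = c("2011-01-01", "2011-01-02", "2011-01-04", "2011-01-05", "2011-01-06"),
  stringsAsFactors = FALSE
)

# Every calendar day crossed with every observed item and sale value
full <- expand_grid(
  day  = as.Date(calendar$day),
  item = unique(df$item),
  sale = unique(df$sale)
) %>%
  # attach the recorded values; combinations with no invoice get NA
  left_join(df %>% mutate(day = as.Date(day)),
            by = c("day", "item", "sale")) %>%
  # replace the NAs with 0, as in the desired output
  mutate(value = coalesce(value, 0)) %>%
  arrange(day, item, sale)

print(full)
```

This version never creates rows for 2011-01-03, so no filtering join is needed afterwards. Because the columns here are character rather than factor, the sort order of item and sale is alphabetical.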

data

df <- structure(list(day = structure(1:3, .Label = c("2011-01-01", 
"2011-01-02", "2011-01-06"), class = "factor"), item = structure(c(1L, 
1L, 2L), .Label = c("apple", "banana"), class = "factor"), sale = 
structure(c(2L, 1L, 2L), .Label = c("no", "yes"), class = "factor"),
value = c(100L, 200L, 500L)), class = "data.frame", row.names = c("1", "2", "4"))

calendar <- structure(list(day = structure(1:5, .Label = c("2011-01-01", 
"2011-01-02", "2011-01-04", "2011-01-05", "2011-01-06"), class = 
"factor")), class = "data.frame", row.names = c("1", "2", "3", "4", "5"))
...