How to add missing months to a data frame?
0 votes
/ October 3, 2018

I have a dataset with observations for only three months: January, February and March. I would like to add the remaining months to the same data frame as observations with zeros, but I am having trouble appending them.

Here is my current code:

library(dplyr)

Period <- c("January 2015", "February 2015", "March 2015",
            "January 2016", "February 2016", "March 2016",
            "January 2017", "February 2017", "March 2017",
            "January 2018", "February 2018", "March 2018")

Month <- c("January", "February", "March",
           "January", "February", "March",
           "January", "February", "March",
           "January", "February", "March")

Dollars <- c(936, 753, 731,
             667, 643, 588,
             948, 894, 997,
             774, 745, 684)

dat <- data.frame(Period = Period, Month = Month, Dollars = Dollars)

dat2 <- dat %>%
  dplyr::select(Month, Dollars) %>%
  dplyr::group_by(Month) %>%
  dplyr::summarise(AvgDollars = mean(Dollars))

Any ideas on filling in April through December in the dataset are much appreciated. Thanks in advance!

Answers [ 3 ]

0 votes
/ October 3, 2018

Here is a two-step solution:

library(dplyr)
# use English month names; the "C" locale is available on every platform
Sys.setlocale("LC_TIME", "C")
# first, define a dataframe with each month from January 2015 to December 2018
month_starts <- seq(as.Date("2015-01-01"), as.Date("2018-12-01"), by = "month")
# derive both columns from the same dates so Month always matches Period
dat2 <- data.frame(Period = format(month_starts, "%B %Y"),
                   Month = format(month_starts, "%B"))
# then, join dat onto the full calendar; left_join keeps the calendar's row order
dat2 %>%
  left_join(select(dat, Period, Dollars), by = "Period")
           Period     Month Dollars
1    January 2015   January     936
2   February 2015  February     753
3      March 2015     March     731
4      April 2015     April      NA
5        May 2015       May      NA
6       June 2015      June      NA
7       July 2015      July      NA
8     August 2015    August      NA
9  September 2015 September      NA
10   October 2015   October      NA
11  November 2015  November      NA
12  December 2015  December      NA
13   January 2016   January     667
14  February 2016  February     643
15     March 2016     March     588
16     April 2016     April      NA
17       May 2016       May      NA
18      June 2016      June      NA
19      July 2016      July      NA
20    August 2016    August      NA
21 September 2016 September      NA
22   October 2016   October      NA
23  November 2016  November      NA
24  December 2016  December      NA
25   January 2017   January     948
26  February 2017  February     894
27     March 2017     March     997
28     April 2017     April      NA
29       May 2017       May      NA
30      June 2017      June      NA
31      July 2017      July      NA
32    August 2017    August      NA
33 September 2017 September      NA
34   October 2017   October      NA
35  November 2017  November      NA
36  December 2017  December      NA
37   January 2018   January     774
38  February 2018  February     745
39     March 2018     March     684
40     April 2018     April      NA
41       May 2018       May      NA
42      June 2018      June      NA
43      July 2018      July      NA
44    August 2018    August      NA
45 September 2018 September      NA
46   October 2018   October      NA
47  November 2018  November      NA
48  December 2018  December      NA
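
The join leaves NA in Dollars for the months that have no data, while the question asks for zeros. A minimal base R sketch of that last step, run on a shortened two-row stand-in for dat; the names calendar and filled are illustrative and not part of the answer:

```r
# Shortened stand-in for dat (two observed months only)
dat <- data.frame(Period = c("January 2015", "March 2015"),
                  Dollars = c(936, 731),
                  stringsAsFactors = FALSE)

# Calendar of the first four months of 2015, built from month.name so the
# labels do not depend on the session locale
calendar <- data.frame(Period = paste(month.name[1:4], 2015),
                       stringsAsFactors = FALSE)

# Keep every calendar row; months absent from dat get NA in Dollars
filled <- merge(calendar, dat, by = "Period", all.x = TRUE, sort = FALSE)

# merge() does not promise a row order, so re-align on the calendar
filled <- filled[match(calendar$Period, filled$Period), ]

# Replace the NAs introduced by the join with zeros
filled$Dollars[is.na(filled$Dollars)] <- 0
print(filled)
```

Whether zeros or NA are appropriate depends on what the averages should mean: zeros pull a monthly mean down, while NA keeps the missing months out of it when mean() is called with na.rm = TRUE.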
0 votes
/ October 4, 2018

Here is a way to do it in one step using complete:

library(tidyverse)

Then use complete:

dat2 <- data.frame(Period = Period, Month = Month, Dollars = Dollars) %>%
  # make a "year" variable
  mutate(Year = word(Period, 2, 2)) %>%
  # remove period variable (we'll add it in later)
  select(-Period) %>%
  # month.name is a base variable listing all months (thanks @Gregor).
  # nesting by "Year" lets complete know you only want the years listed in your dataset.
  complete(Month = month.name, nesting(Year), fill = list(Dollars = 0)) %>%
  # store Month as a factor in calendar order so it does not sort alphabetically
  mutate(Month = factor(Month, levels = month.name)) %>%
  # arrange by Year and month
  arrange(Year, Month) %>%
  # remake the "period" variable
  mutate(Period = paste(Month, Year)) %>%
  group_by(Month) %>%
  summarise(AvgDollars = mean(Dollars))
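
To see what complete() does on its own, here is a small sketch on a made-up three-row frame. The names toy and full are illustrative, and the values are borrowed from the question; it assumes the tidyr package is installed:

```r
library(tidyr)

# Three observed rows spread over two years
toy <- data.frame(Year = c("2015", "2015", "2016"),
                  Month = c("January", "February", "January"),
                  Dollars = c(936, 753, 667),
                  stringsAsFactors = FALSE)

# Expand to every month of month.name for each Year present in the data;
# nesting(Year) restricts the expansion to years that actually occur.
# Rows created by the expansion get Dollars = 0 instead of NA.
full <- complete(toy, Month = month.name, nesting(Year),
                 fill = list(Dollars = 0))

print(nrow(full))
print(full[full$Dollars > 0, ])
```

The result has one row per month and year, with the three original values intact and zeros everywhere else.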
0 votes
/ October 3, 2018

There may be a more elegant solution with dplyr, but here is a quick one that takes little typing:

dat <- rbind(data.frame(Period = Period, Month = Month, Dollars = Dollars),
             data.frame(Period = c(sapply(2015:2018, function(x) format(ISOdate(x,4:12,1),"%B %Y"))),
                        Month = c(sapply(2015:2018, function(x) format(ISOdate(x,4:12,1),"%B"))),
                        Dollars = 0))
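
rbind() appends the new rows after the existing ones, so the result is not in calendar order. One way to sort it is sketched below, assuming Period always has the form "Month YYYY" with English month names; toy and sorted are illustrative names on made-up rows:

```r
# Stand-in for the rbind() result: observed rows first, zero rows after
toy <- data.frame(Period = c("January 2015", "January 2016",
                             "April 2015", "April 2016"),
                  Dollars = c(936, 667, 0, 0),
                  stringsAsFactors = FALSE)

# Split "Month YYYY" into a numeric year and a month number (1 to 12)
year  <- as.integer(sub("^.* ", "", toy$Period))
month <- match(sub(" .*$", "", toy$Period), month.name)

# Order by year, then by month within the year
sorted <- toy[order(year, month), ]
print(sorted)
```

Matching against month.name avoids parsing dates, so the ordering does not depend on the session locale.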