We can use cut to bin temp into intervals and then summarize by city and temp_range:
library(dplyr)
df %>%
  mutate(temp_range = cut(temp, breaks = seq(0, 100, 10))) %>%
  group_by(city, temp_range) %>%
  summarize(years = n_distinct(year))
Output:
# A tibble: 6 x 3
# Groups:   city [3]
  city    temp_range years
  <fct>   <fct>      <int>
1 DC      (0,10]         1
2 DC      (50,60]        1
3 DC      (70,80]        1
4 NYC     (70,80]        1
5 Seattle (0,10]         1
6 Seattle (80,90]        1
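To see how cut assigns the bins, here is a small base-R sketch on the temp values from the data at the end of this answer (typed in as a plain vector). By default cut uses right = TRUE, so every interval is open on the left and closed on the right:

```r
# temp values from the example data, as a plain numeric vector
temp <- c(82, 10, 78, 71, 10, 60)

# 11 break points give 10 intervals; with right = TRUE (the default)
# 10 falls into (0,10] and 60 into (50,60]
bins <- cut(temp, breaks = seq(0, 100, 10))
print(bins)
nlevels(bins)

# A value equal to the lowest break is not inside any (a,b] interval,
# so it becomes NA unless include.lowest = TRUE closes the first bin
cut(0, breaks = seq(0, 100, 10))
cut(0, breaks = seq(0, 100, 10), include.lowest = TRUE)
```

With real data, a temperature of exactly 0 (or anything outside 0 to 100) would therefore end up in an NA group, so include.lowest = TRUE or wider breaks may be worth considering.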
As of dplyr 0.8.0, we can also keep the groups for empty factor levels (level combinations that have no rows) by setting the new .drop argument of group_by to FALSE:
df %>%
  mutate(temp_range = cut(temp, breaks = seq(0, 100, 10))) %>%
  group_by(city, temp_range, .drop = FALSE) %>%
  summarize(years = n_distinct(year))
Output:
# A tibble: 30 x 3
# Groups:   city [3]
   city  temp_range years
   <fct> <fct>      <int>
 1 DC    (0,10]         1
 2 DC    (10,20]        0
 3 DC    (20,30]        0
 4 DC    (30,40]        0
 5 DC    (40,50]        0
 6 DC    (50,60]        1
 7 DC    (60,70]        0
 8 DC    (70,80]        1
 9 DC    (80,90]        0
10 DC    (90,100]       0
# ... with 20 more rows
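The 30 rows come from crossing all 3 city levels with all 10 temp_range levels. As a rough base-R cross-check (a different technique, not what dplyr does internally), tapply() with default = 0L, an argument available since R 3.4.0, fills the empty cells with 0 the same way. The data from the end of this answer is rebuilt here as plain vectors:

```r
# Example data as separate vectors; factor() sorts levels: DC, NYC, Seattle
city <- factor(c("Seattle", "Seattle", "NYC", "DC", "DC", "DC"))
year <- c(2019L, 2018L, 2010L, 2011L, 2011L, 2018L)
temp <- c(82L, 10L, 78L, 71L, 10L, 60L)
temp_range <- cut(temp, breaks = seq(0, 100, 10))

# Every city level crossed with every temp_range level: 3 x 10 = 30
nlevels(city) * nlevels(temp_range)

# Distinct years per (city, temp_range) cell. Empty cells never reach FUN;
# default = 0L fills them, mirroring the zero rows that .drop = FALSE keeps
years <- tapply(year, list(city, temp_range),
                function(y) length(unique(y)), default = 0L)
years["DC", ]
```

The result is a 3 x 10 matrix rather than a long tibble, but its DC row matches the first 10 rows printed above.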
Data:
df <- structure(list(city = structure(c(3L, 3L, 2L, 1L, 1L, 1L), .Label = c("DC",
"NYC", "Seattle"), class = "factor"), year = c(2019L, 2018L,
2010L, 2011L, 2011L, 2018L), temp = c(82L, 10L, 78L, 71L, 10L,
60L)), class = "data.frame", row.names = c(NA, -6L))