The quickest way to vectorize this is to return the specific region for each value with nested ifelse calls. However, I suggest using dplyr::if_else here, since it provides some type safety (and base::ifelse does not).
library(dplyr)  # provides if_else()

regionconvert2 <- function(x) {
  # Each nested if_else() tests one region; anything unmatched ends up as NA
  if_else(x %in% c("Texas","Oklahoma","Arkansas","Louisiana","Mississippi","Alabama","Georgia","Florida","Tennessee","Kentucky","West Virginia","Virginia","North Carolina","South Carolina","Maryland","Delaware"),
    "South",
    if_else(x %in% c("Maine","New Hampshire","Vermont","Massachusetts","Connecticut","Rhode Island","New York","New Jersey","Pennsylvania"),
      "Northeast",
      if_else(x %in% c("Ohio","Michigan","Illinois","Indiana","Wisconsin","Minnesota","Iowa","Missouri","North Dakota","South Dakota","Nebraska","Kansas"),
        "Midwest",
        if_else(x %in% c("Alaska","Hawaii","Washington","Oregon","California","Nevada","Idaho","Utah","Arizona","New Mexico","Colorado","Wyoming","Montana"),
          "West",
          NA_character_))))
}
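To make the type-safety point concrete, here is a minimal sketch (assuming dplyr is installed; the date and the 1/"a" values are arbitrary). base::ifelse builds its result from the logical test vector, so it silently drops the Date class, and its output type can even depend on which branches happen to be taken. dplyr::if_else keeps the type of its true/false arguments and refuses to mix incompatible types.

```r
library(dplyr)

d <- as.Date("2020-01-01")

# base::ifelse: the Date class is lost, leaving the underlying day count
base_res <- ifelse(c(TRUE, FALSE), d, as.Date(NA))
print(class(base_res))   # "numeric"

# dplyr::if_else: the result keeps the Date class of `true`/`false`
dplyr_res <- if_else(c(TRUE, FALSE), d, as.Date(NA))
print(class(dplyr_res))  # "Date"

# base::ifelse: the output type depends on the data, not on the call
print(class(ifelse(c(TRUE, TRUE), 1, "a")))   # "numeric"
print(class(ifelse(c(TRUE, FALSE), 1, "a")))  # "character"

# dplyr::if_else: mixing numeric and character is an error instead
mixed <- tryCatch(if_else(c(TRUE, FALSE), 1, "a"), error = function(e) e)
print(inherits(mixed, "error"))  # TRUE
```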
Alternatively, pre-fill an all-NA output, then overwrite individual values as we determine their region:
regionconvert3 <- function(x) {
  out <- x[NA]  # all-NA vector with the same length and type as x
  ind <- x %in% c("Texas","Oklahoma","Arkansas","Louisiana","Mississippi","Alabama","Georgia","Florida","Tennessee","Kentucky","West Virginia","Virginia","North Carolina","South Carolina","Maryland","Delaware")
  out[ind] <- "South"
  ind <- x %in% c("Maine","New Hampshire","Vermont","Massachusetts","Connecticut","Rhode Island","New York","New Jersey","Pennsylvania")
  out[ind] <- "Northeast"
  ind <- x %in% c("Ohio","Michigan","Illinois","Indiana","Wisconsin","Minnesota","Iowa","Missouri","North Dakota","South Dakota","Nebraska","Kansas")
  out[ind] <- "Midwest"
  ind <- x %in% c("Alaska","Hawaii","Washington","Oregon","California","Nevada","Idaho","Utah","Arizona","New Mexico","Colorado","Wyoming","Montana")
  out[ind] <- "West"
  return(out)
}
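The out <- x[NA] line relies on a detail of R indexing: a logical NA index is recycled to the full length of x, so the result is an all-NA vector of the same type as x. A minimal sketch, assuming x is a character vector (the state names are just sample input):

```r
x <- c("Texas", "Ohio", "Oregon")

# Logical NA is recycled: one NA per element, same type as x
out <- x[NA]
print(typeof(out))  # "character"
print(length(out))  # 3

# An integer NA index is NOT recycled: it selects a single missing element
print(length(x[NA_integer_]))  # 1

# A more explicit equivalent for pre-filling the output
out2 <- rep(NA_character_, length(x))
print(identical(out, out2))  # TRUE

# Filling in one region, as regionconvert3 does
out[x %in% c("Texas", "Oklahoma")] <- "South"
```

If x could be a factor, x[NA] would return a factor, and assigning a region name that is not one of its levels gives NA with a warning; converting with as.character(x) first avoids that.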
Honestly, I don't much like this, since it is fairly hard-coded (and repeats the same code for each region), so an improved version looks something like this:
regionlist <- list(
  South = c("Texas","Oklahoma","Arkansas","Louisiana","Mississippi","Alabama","Georgia","Florida","Tennessee","Kentucky","West Virginia","Virginia","North Carolina","South Carolina","Maryland","Delaware"),
  Northeast = c("Maine","New Hampshire","Vermont","Massachusetts","Connecticut","Rhode Island","New York","New Jersey","Pennsylvania"),
  Midwest = c("Ohio","Michigan","Illinois","Indiana","Wisconsin","Minnesota","Iowa","Missouri","North Dakota","South Dakota","Nebraska","Kansas"),
  West = c("Alaska","Hawaii","Washington","Oregon","California","Nevada","Idaho","Utah","Arizona","New Mexico","Colorado","Wyoming","Montana")
)
regionconvert4 <- function(x, lookup) {
  out <- x[NA]  # start with all NA
  # For each region, overwrite the matching elements with the region's name
  for (nm in names(lookup)) {
    ind <- x %in% lookup[[nm]]
    out[ind] <- nm
  }
  return(out)
}
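The idea behind this version is that each list entry holds a vector of possible values (here, state names), and any value found in that vector is replaced with the name of the list entry. Adding or changing a region then only means editing regionlist, not the function.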
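A different technique for the same lookup, sketched here as an alternative rather than as part of the answer above: invert the list once into a named character vector (state -> region) and index it by name. The shortened lookup and the sample states below, including "Narnia", are hypothetical; the loop version is repeated so the block runs on its own.

```r
# Hypothetical, shortened lookup with the same shape as regionlist
lookup <- list(
  South   = c("Texas", "Florida"),
  Midwest = c("Ohio", "Iowa")
)

# Loop version, same logic as regionconvert4
regionconvert4 <- function(x, lookup) {
  out <- x[NA]
  for (nm in names(lookup)) {
    out[x %in% lookup[[nm]]] <- nm
  }
  out
}

# Invert the list: values become names, list names become values
invert_lookup <- function(lookup) {
  setNames(rep(names(lookup), lengths(lookup)),
           unlist(lookup, use.names = FALSE))
}

states <- c("Ohio", "Texas", "Narnia", NA)

loop_res  <- regionconvert4(states, lookup)
region_of <- invert_lookup(lookup)
vec_res   <- unname(region_of[states])  # unknown or NA names give NA

# Both give "Midwest", "South", NA, NA
print(loop_res)
print(vec_res)
```

One behavioral difference: if a value appeared in two regions, the loop would keep the last matching region, while name-based indexing returns the first match.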