In base R, we can use regmatches/gregexpr:
regmatches(x, gregexpr("#\\S+", x))
#[[1]]
#[1] "#Sun" "#Halo"
#[[2]]
#[1] "#YouthStrikeClimate" "#FridayForFuture" "#FridaysFuture" "#ClimateChange"
#[[3]]
#[1] "#storm"
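Because gregexpr returns one set of matches per element of x, the result above is a list. If a flat vector or a per-element count is more convenient, something like the following sketch may help. The names hashtags, all_tags and n_tags are illustrative and not part of the original answer; x is copied from the data section.

```r
# Sample input, copied from the data section of this answer
x <- c("The #Sun #Halo is out in full force today People need to look up once in",
       "inspired #YouthStrikeClimate #FridayForFuture #FridaysFuture #ClimateChange",
       "Multiple warnings in effect for snow and wind with the latest #storm Metro")

# One character vector of hashtags per element of x (a list of length 3)
hashtags <- regmatches(x, gregexpr("#\\S+", x))

# Flatten the list into a single character vector of all hashtags
all_tags <- unlist(hashtags)

# Number of hashtags found in each element of x
n_tags <- lengths(hashtags)
```

Here unlist() drops the grouping by element, so keep the hashtags list if you still need to know which string each tag came from.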
Or, using gsub, either
trimws(gsub("(?<!#)\\b\\S+\\s*", "", x, perl = TRUE))
or
trimws(gsub("(^| )[A-Za-z]+\\b", "", x))
will keep only the words starting with # and return them as one string per element, with the words separated by a space.
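As a quick sanity check on the sample data, the two gsub variants can be compared with the regmatches result collapsed into one string per element. This is only a sketch: the variable names via_lookbehind, via_letters and via_regmatches are made up for illustration, and x is copied from the data section.

```r
# Sample input, copied from the data section of this answer
x <- c("The #Sun #Halo is out in full force today People need to look up once in",
       "inspired #YouthStrikeClimate #FridayForFuture #FridaysFuture #ClimateChange",
       "Multiple warnings in effect for snow and wind with the latest #storm Metro")

# Variant 1: drop every word that is not directly preceded by '#'
via_lookbehind <- trimws(gsub("(?<!#)\\b\\S+\\s*", "", x, perl = TRUE))

# Variant 2: drop letter-only words at the start of the string or after a space
via_letters <- trimws(gsub("(^| )[A-Za-z]+\\b", "", x))

# Reference: extract the hashtags and join them with a single space
via_regmatches <- vapply(regmatches(x, gregexpr("#\\S+", x)),
                         paste, character(1), collapse = " ")

# Both comparisons are expected to hold for this sample input
identical(via_lookbehind, via_regmatches)
identical(via_letters, via_regmatches)
```

Note that the second pattern only removes words made of ASCII letters, so on other input, words containing digits or apostrophes may be left in place, wholly or in part.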
data
x <- c("The #Sun #Halo is out in full force today People need to look up once in",
"inspired #YouthStrikeClimate #FridayForFuture #FridaysFuture #ClimateChange",
"Multiple warnings in effect for snow and wind with the latest #storm Metro"
)