You need to perform multiple replacements. There is a base R version and an alternative that uses stringr.
Note that I had to escape the dollar sign to make this work (edited).
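Obviously, you still need to work on your regex patterns: in the output below, "I" is matched as a Roman numeral, lowercase "twenty" is missed, and "100,000" is split into two matches.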
library(stringr)
nums <- "(\\d+)|([\\$£]?\\d{1-3}(,\\d{3})+)|([\\$£]?(\\d+)?\\.\\d+)|Zero|One|Two|Three|Four|Five|Six|Seven|Eight|Nine|Ten|Eleven|Twelve|Thirteen|Fourteen|Fifteen|Sixteen|Seventeen|Eighteen|Nineteen|Twenty|Thirty|Fourty|Fifty|Sixty|Seventy|Eighty|Ninty|Hundred|Thousand|Million|Billion|Trillion|\\b(M{1,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})|M{0,4}(CM|C?D|D?C{1,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})|M{0,4}(CM|CD|D?C{0,3})(XC|X?L|L?X{1,3})(IX|IV|V?I{0,3})|M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|I?V|V?I{1,3}))\\b"
lines <- c("I am twenty years old.",
           "I am Twenty years old.",
           "I have $50.45 in my pocket.",
           "This tree is 100,000 years old.",
           "This is fine.", "Not sure if that is, two.")
linesNums <- grep(nums, lines, value = TRUE)
rms <- regmatches(linesNums, gregexpr(nums, linesNums))
rms <- unique(unlist(rms))
# alternative stringr function:
str_replace_all(linesNums,
                setNames(paste0("<<", rms, ">>"),
                         gsub("$", "\\$", rms, fixed = TRUE)))
#> [1] "<<I>> am twenty years old."
#> [2] "<<I>> am <<Twenty>> years old."
#> [3] "<<I>> have <<$50.45>> in my pocket."
#> [4] "This tree is <<100>>,<<000>> years old."
# base R function:
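multisub <- function(target, output, string) {
  # pair each pattern with its replacement
  replacement.list <- apply(cbind(target, output), 1, as.list)
  # apply a single pattern/replacement pair to the strings
  mygsub <- function(l, x) gsub(pattern = l[1], replacement = l[2], x, perl = TRUE)
  # fold over all pairs; right = TRUE makes Reduce call mygsub(pair, strings)
  Reduce(mygsub, replacement.list, init = string, right = TRUE)
}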
multisub(gsub("$", "\\$", rms, fixed = TRUE), paste0("<<", rms, ">>"), linesNums)
#> [1] "<<I>> am twenty years old."
#> [2] "<<I>> am <<Twenty>> years old."
#> [3] "<<I>> have <<$50.45>> in my pocket."
#> [4] "This tree is <<100>>,<<000>> years old."
Created on 2020-04-13 by the reprex package (v0.3.0)
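
To illustrate why the dollar sign had to be escaped, here is a minimal sketch on a single string. The names `amount`, `line`, `escaped` and `result` are made up for this illustration and are not part of the answer's code. I use `perl = TRUE` throughout, as `multisub` does above.

```r
# Hypothetical inputs, chosen only for illustration.
amount <- "$50.45"
line   <- "I have $50.45 in my pocket."

# Unescaped, "$" is the end-of-string anchor, so the pattern "$50.45"
# cannot match in the middle of the sentence.
grepl(amount, line, perl = TRUE)
#> [1] FALSE

# fixed = TRUE treats "$" as plain text while it is being replaced,
# which turns it into the escaped form "\$" for later use as a regex.
escaped <- gsub("$", "\\$", amount, fixed = TRUE)
escaped
#> [1] "\\$50.45"

# With the escaped pattern, the literal dollar amount is found and wrapped.
result <- gsub(escaped, paste0("<<", amount, ">>"), line, perl = TRUE)
result
#> [1] "I have <<$50.45>> in my pocket."
```

The same escaping step is what `gsub("$", "\\$", rms, fixed = TRUE)` does for every extracted match before it is reused as a pattern. Other regex metacharacters in the matches (such as the `.` in `50.45`) are left unescaped here, so they still act as wildcards.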