Here, three different regular expressions are used, one for each of the three new columns.
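library(dplyr)
library(stringr)

df[[1]] %>%
  mutate(`Tax on this income` = gsub(",", "", `Tax on this income`),  # drop thousands separators
         # base amount: digits after a leading "$"
         col1 = str_extract(`Tax on this income`, "(?<=^\\$)\\d+"),
         # rate: digits with an optional decimal part, before "c" or "cents"
         col2 = str_extract(`Tax on this income`, "\\d+(\\.\\d+)?(?=\\s*c)"),
         # threshold: digits after the final "$"
         col3 = str_extract(`Tax on this income`, "(?<=\\$)\\d+$"))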
#  Taxable income      Tax on this income                        col1   col2  col3
#1 0 – $18,200         Nil                                       <NA>   <NA>  <NA>
#2 $18,201 – $37,000   19c for each $1 over $18200               <NA>   19    18200
#3 $37,001 – $87,000   $3572 plus 32.5c for each $1 over $37000  3572   32.5  37000
#4 $87,001 – $180,000  $19822 plus 37c for each $1 over $87000   19822  37    87000
#5 $180,001 and over   $54232 plus 45c for each $1 over $180000  54232  45    180000
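To see what each of the three patterns captures, here is a minimal sketch on a single hand-written string. The names `x`, `base`, `rate` and `threshold` are made up for illustration, and the rate pattern is written here with an escaped decimal point.

```r
library(stringr)

# Hypothetical sample string, with the thousands separators already removed
x <- "$3572 plus 32.5c for each $1 over $37000"

# Base amount: digits that directly follow a "$" at the start of the string
base <- str_extract(x, "(?<=^\\$)\\d+")

# Rate: digits with an optional decimal part, followed by optional spaces and a "c"
rate <- str_extract(x, "\\d+(\\.\\d+)?(?=\\s*c)")

# Threshold: digits that follow a "$" and run to the end of the string
threshold <- str_extract(x, "(?<=\\$)\\d+$")

print(c(base = base, rate = rate, threshold = threshold))
```

The lookbehind and lookahead groups only assert context, so the "$" and the "c" are checked but not included in the extracted value.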
Because "cents" also starts with "c", the same expressions also work when the text says "cents" instead of "c".
df[[19]] %>%
  mutate(`Tax on this income` = gsub(",", "", `Tax on this income`),
         col1 = str_extract(`Tax on this income`, "(?<=^\\$)\\d+"),
         col2 = str_extract(`Tax on this income`, "\\d+(\\.\\d+)?(?=\\s*c)"),
         col3 = str_extract(`Tax on this income`, "(?<=\\$)\\d+$"))
#  Taxable income     Tax on this income                            col1   col2  col3
#1 $1 – $5,400        Nil                                           <NA>   <NA>  <NA>
#2 $5,401 – $20,700   20 cents for each $1 over $5400               <NA>   20    5400
#3 $20,701 – $38,000  $3060 plus 34 cents for each $1 over $20700   3060   34    20700
#4 $38,001 – $50,000  $8942 plus 43 cents for each $1 over $38000   8942   43    38000
#5 $50,001 and over   $14102 plus 47 cents for each $1 over $50000  14102  47    50000
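One detail worth checking with the "cents" wording is whether the rate comes back without the space that separates it from the word. The sketch below compares a rate pattern whose dot is unescaped with one whose dot is escaped; both patterns and the sample vector `wording` are written out here purely for illustration.

```r
library(stringr)

# Hypothetical sample strings, with the thousands separators already removed
wording <- c("19c for each $1 over $18200",
             "20 cents for each $1 over $5400")

# An unescaped "." matches any character, so it can swallow the space before "cents"
loose <- str_extract(wording, "\\d+.(\\d+)?(?=(\\s+)?c)")

# An escaped "\\." only matches a literal decimal point
strict <- str_extract(wording, "\\d+(\\.\\d+)?(?=\\s*c)")

# Comparing lengths makes any trailing whitespace visible
print(data.frame(wording, loose_nchar = nchar(loose), strict_nchar = nchar(strict)))
```

A trailing space is invisible in a printed data frame, but it matters if the column is later compared as text. Wrapping the result in `str_trim()` would be another way to guard against it.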
Because you have a list of data frames, you can use map to apply the same steps to each of them.
purrr::map(df, . %>%
  mutate(`Tax on this income` = gsub(",", "", `Tax on this income`),
         col1 = str_extract(`Tax on this income`, "(?<=^\\$)\\d+"),
         col2 = str_extract(`Tax on this income`, "\\d+(\\.\\d+)?(?=\\s*c)"),
         col3 = str_extract(`Tax on this income`, "(?<=\\$)\\d+$")))
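The extracted columns are still character vectors. If numbers are needed, one possible follow-up is to wrap the steps in a helper and convert afterwards, assuming every captured value parses cleanly. The list `tables` and the function `extract_rates` below are hypothetical stand-ins, not part of the original data.

```r
library(dplyr)
library(stringr)
library(purrr)

# Hypothetical stand-in for the list of data frames: two tiny tables
tables <- list(
  tibble(`Tax on this income` = c("Nil", "19c for each $1 over $18,200")),
  tibble(`Tax on this income` = c("Nil", "$3,060 plus 34 cents for each $1 over $20,700"))
)

# Hypothetical helper: same extraction steps, then conversion to numeric
extract_rates <- function(tbl) {
  tbl %>%
    mutate(
      `Tax on this income` = str_remove_all(`Tax on this income`, ","),
      col1 = str_extract(`Tax on this income`, "(?<=^\\$)\\d+"),
      col2 = str_extract(`Tax on this income`, "\\d+(\\.\\d+)?(?=\\s*c)"),
      col3 = str_extract(`Tax on this income`, "(?<=\\$)\\d+$")
    ) %>%
    # Rows with no match (for example "Nil") stay NA after conversion
    mutate(across(col1:col3, as.numeric))
}

result <- map(tables, extract_rates)
print(result)
```

Naming the helper keeps the `map()` call short and makes the extraction easy to test on a single table before applying it to the whole list.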