This is just a routine text-extraction job. There are many ways to do it, and I'm sure some are more elegant than this one, but this gets the job done:
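library(pdftools)
library(dplyr)

# Field names: the first line after each "Schema element:" marker
keywords <- pdf_text("mypdf.pdf") %>%
  strsplit("Schema element:") %>%
  lapply(function(x) x[-1]) %>%  # drop the text before the first marker on each page
  lapply(function(x) sapply(strsplit(x, "\r?\n"), `[`, 1)) %>%  # "\r?\n" matches Windows and Unix line endings
  unlist() %>%
  trimws()

# Guidance text: whatever follows each "Guidance on completion of schema element:" marker
text <- pdf_text("mypdf.pdf") %>%
  strsplit("Guidance on completion of schema element:") %>%
  lapply(function(x) x[-1]) %>%
  lapply(function(x) sapply(strsplit(x, ":"), `[`, 1)) %>%  # keep everything up to the next colon
  lapply(function(x) sapply(strsplit(x, "\r?\n"),
                            function(y) paste(y[-length(y)], collapse = ""))) %>%  # drop the last (label) line
  unlist() %>%
  {gsub("\\s+", " ", .)} %>%  # collapse runs of whitespace into a single space
  trimws() %>%
  strsplit("Guidance on contents") %>%
  sapply(`[`, 1)

df <- tibble(keywords, text)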
The result looks like this:
df
#> # A tibble: 15 x 2
#>    keywords                       text
#>    <chr>                          <chr>
#>  1 swExemption44Driver            "Required. Select from the enumeration list the driver~
#>  2 swExemption45Impact            "Required. Select from the enumeration list the impact~
#>  3 swExemption45Driver            "Required. Select from the enumeration list the driver~
#>  4 swDisproportionateCost         "Required. Indicate if disproportionate costs have bee~
#>  5 swDisproportionateCostScale    "Conditional. Select from the enumeration list the sc~
#>  6 swDisproportionateCostAnalysis "Conditional. Select from the enumeration list the an~
#>  7 swDisproportionateCostAlterna~ "Conditional. Select from the enumeration list the al~
#>  8 swDisproportionateCostOtherEU~ "Conditional. Indicate whether the costs of basic mea~
#>  9 swTechnicalInfeasibility       "Required. Report how ‘technical infeasibility’ has be~
#> 10 swNaturalConditions            "Required. Select from the enumeration list the eleme~
#> 11 swExemption46                  "Required. Select from the enumeration list the reason~
#> 12 swExemption47                  "Required. Select from the enumeration list the modif~
#> 13 swExemptionsTransboundary      "Required. Indicate whether the application of exempt~
#> 14 swExemptionsReference          "Required. Provide references or hyperlinks to the re~
#> 15 driversSWExemptionsReference   "Required. Provide references or hyperlinks to the re~
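
If you want to check the splitting logic without the PDF at hand, here is a minimal sketch in base R that applies the same idea (split on a marker, drop what precedes it, trim the rest) to an in-memory string. The `page` string and the `extract_after()` helper are made up for illustration; they are assumptions and only loosely mimic the layout of the real document, so the real file may still need the extra steps shown above.

```r
# Hypothetical stand-in for one page returned by pdf_text();
# the content is invented and not taken from the real PDF.
page <- paste(
  "Intro text",
  "Schema element: swFoo",
  "Guidance on completion of schema element: Required. Pick a",
  "value from the list.",
  "Guidance on contents: ignored",
  "Schema element: swBar",
  "Guidance on completion of schema element: Conditional. Say why.",
  "Guidance on contents: ignored",
  sep = "\n"
)

# Hypothetical helper: split one page at a literal marker and
# discard whatever comes before its first occurrence.
extract_after <- function(page, marker) {
  strsplit(page, marker, fixed = TRUE)[[1]][-1]
}

# Field names: first line of each chunk, with surrounding whitespace removed
keywords <- extract_after(page, "Schema element:")
keywords <- trimws(sapply(strsplit(keywords, "\r?\n"), `[`, 1))

# Guidance text: cut each chunk at the next section label,
# then collapse line breaks and repeated spaces into single spaces
text <- extract_after(page, "Guidance on completion of schema element:")
text <- sapply(strsplit(text, "Guidance on contents", fixed = TRUE), `[`, 1)
text <- trimws(gsub("\\s+", " ", text))

toy <- data.frame(keywords, text, stringsAsFactors = FALSE)
print(toy)
```

The markers are matched case-sensitively, which is why "Schema element:" does not also split on "schema element:" inside the guidance label.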