findAssocs returns <0 rows> (or 0-length row.names) for all of my input, and I can't figure out why. I've posted my code below to show my steps. Thanks in advance!
Read in files
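library(tm)        # Corpus, tm_map, TermDocumentMatrix, findAssocs
library(pdftools)  # pdf_text
# Three alternative ways to load the text; each overwrites `text`, so run only one.
text <- unlist(lapply(list.files(), readLines))  # every file in the working directory
text <- readLines(file.choose())                 # a single text file
text <- pdf_text(file.choose())                  # a PDF, one string per page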
Collapse
# Joins everything into one string, so the corpus built from it holds a single document.
text <- paste(unlist(text), collapse = "")
Convert to corpus
docs <- Corpus(VectorSource(text))
Clean
toSpace <- content_transformer(function(x, pattern) gsub(pattern, " ", x))
docs <- tm_map(docs, toSpace, "/")
docs <- tm_map(docs, toSpace, "@")
docs <- tm_map(docs, toSpace, "\\|")
docs <- tm_map(docs, content_transformer(tolower))
docs <- tm_map(docs, removeNumbers)
docs <- tm_map(docs, removeWords, stopwords("dutch"))
docs <- tm_map(docs, removeWords, stopwords("english"))
docs <- tm_map(docs, removePunctuation)
docs <- tm_map(docs, stripWhitespace)
DTM
dtm <- TermDocumentMatrix(docs)
m <- as.matrix(dtm)
v <- sort(rowSums(m),decreasing=TRUE)
d <- data.frame(word = names(v),freq=v)
Associated terms
as.data.frame(findAssocs(dtm, terms = "security", corlimit = 0.3))
Subset of the input text:
[976] "how to utilize public or hybrid clouds without"
[977] "compromising the overall security posture and at"
[978] "the same time leveraging cost and scaling benefits"
[979] "of the cloud."
[980] ""
[981] "If we take a holistic view of the non-technical challenges, it boils down to how the mid-to-large sized"
[982] "IT departments are organized. The typical organization is divided into for example a network team,"
[983] "a server team, a security team and an application"
[984] "team working perfectly within their respective silo"
[985] "but rarely frictionless between the silos."
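
A possible explanation for the empty result, offered as an assumption rather than a confirmed diagnosis: findAssocs scores terms by how their counts correlate across documents, and the collapse step above leaves the corpus with a single document, so every correlation is undefined. The base-R sketch below imitates that calculation on made-up counts. The names `assoc`, `tdm_many`, and `tdm_one` are hypothetical, and `assoc` is a simplified stand-in, not tm's actual implementation.

```r
# Hypothetical term counts: rows are terms, columns are documents.
tdm_many <- matrix(
  c(2, 0, 1, 0,   # security
    2, 0, 1, 0,   # posture
    0, 3, 0, 1),  # network
  nrow = 3, byrow = TRUE,
  dimnames = list(c("security", "posture", "network"),
                  paste0("doc", 1:4))
)

# The same counts after all text is collapsed into one document.
tdm_one <- matrix(rowSums(tdm_many), ncol = 1,
                  dimnames = list(rownames(tdm_many), "doc1"))

# Simplified stand-in for findAssocs (an assumption, not tm's code):
# correlate one term's counts with every other term's counts across documents.
assoc <- function(tdm, term) {
  x <- t(tdm)                              # documents in rows, terms in columns
  others <- setdiff(colnames(x), term)
  r <- suppressWarnings(cor(x[, term], x[, others, drop = FALSE]))
  r[1, ]                                   # named vector, one value per other term
}

many <- assoc(tdm_many, "security")
one  <- assoc(tdm_one, "security")

print(many[!is.na(many) & many >= 0.3])    # terms that pass a 0.3 limit
print(one[!is.na(one) & one >= 0.3])       # nothing passes: every value is NA
```

If this assumption holds, building the corpus from the uncollapsed lines or pages (one document each) would give findAssocs something to correlate.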