I have a data frame whose rows I want to split into one row per sentence.
df <- data.frame(text=c("Lately, I haven't been able to view my Online Payment Card. It's prompting me to have to upgrade my account whereas before it didn't. I have used the Card at various online stores before and have successfully used it. But now it's starting to get very frustrating that I have to said \"upgrade\" my account. Do fix this... **I noticed some users have the same issue..","I've been using this app for almost 2 years without any problems. Until, their system just blocked my virtual paying card without any notice. So, I was forced to apply for an upgrade and it was rejected thrice, despite providing all of my available IDs. This app has been a big disappointment."), id=c(1,2), stringsAsFactors = FALSE)
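I want to split the text column into sentences and get the following (the original rows are kept and tagged "DONE", and each sentence is appended as a new row with the same id):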
df <- data.frame (text = c("Lately, I haven't been able to view my Online Payment Card. It's prompting me to have to upgrade my account whereas before it didn't. I have used the Card at various online stores before and have successfully used it. But now it's starting to get very frustrating that I have to said \"upgrade\" my account. Do fix this... **I noticed some users have the same issue..",
"I've been using this app for almost 2 years without any problems. Until, their system just blocked my virtual paying card without any notice. So, I was forced to apply for an upgrade and it was rejected thrice, despite providing all of my available IDs. This app has been a big disappointment.",
"Lately, I haven't been able to view my Online Payment Card.",
"It's prompting me to have to upgrade my account whereas before it didn't.",
"I have used the Card at various online stores before and have successfully used it.",
"But now it's starting to get very frustrating that I have to said upgrade my account.",
"Do fix this|", "**I noticed some users have the same issue|",
"I've been using this app for almost 2 years without any problems.",
"Until, their system just blocked my virtual paying card without any notice.",
"So, I was forced to apply for an upgrade and it was rejected thrice, despite providing all of my available IDs.",
"This app has been a big disappointment."), id = c(1, 2, 1, 1,
1, 1, 1, 1, 2, 2, 2, 2), tag = c("DONE", "DONE", NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA), stringsAsFactors = FALSE)
I did this with the code below, but I think the for loop is too slow. I need to do this for 73,000 rows, so I need a faster approach. Attempt 1:
library("qdap")
df$tag <- NA
for (review_num in 1:nrow(df)) {
  x <- sent_detect(df$text[review_num])
  if (length(x) > 1) {
    for (sentence_num in 1:length(x)) {
      # append a copy of the review row, then overwrite its text with one sentence
      df <- rbind(df, df[review_num, ])
      df$text[nrow(df)] <- x[sentence_num]
    }
    df$tag[review_num] <- "DONE"
  }
}
Attempt 2: 73,000 rows, elapsed time 252 minutes (about 4 hours)
reviews_df1 <- data.frame(id = character(0), text = character(0))
for (review_num in seq_len(nrow(df))) {
  preprocess_sent <- sent_detect(df$text[review_num])
  if (length(preprocess_sent) > 0) {
    x <- data.frame(id = df$id[review_num],
                    text = preprocess_sent)
    # append this review's sentences to the accumulated result
    reviews_df1 <- rbind(reviews_df1, x)
  }
}
colnames(reviews_df1) <- c("id", "text")
Attempt 3: 29,000 rows, elapsed time 170 minutes (about 2.8 hours)
library(qdap)
library(dplyr)
library(tidyr)
df <- data.frame(text=c("Lately, I haven't been able to view my Online Payment Card. It's prompting me to have to upgrade my account whereas before it didn't. I have used the Card at various online stores before and have successfully used it. But now it's starting to get very frustrating that I have to said \"upgrade\" my account. Do fix this... **I noticed some users have the same issue..","I've been using this app for almost 2 years without any problems. Until, their system just blocked my virtual paying card without any notice. So, I was forced to apply for an upgrade and it was rejected thrice, despite providing all of my available IDs. This app has been a big disappointment."), id=c(1,2), stringsAsFactors = FALSE)
df %>%
group_by(text) %>%
mutate(sentences = list(sent_detect(text))) %>%  # use the group's own text, not the whole df$text column
unnest(cols=sentences) -> out.df
out.df
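One direction I am considering, to avoid growing the data frame row by row, is to split every text in a single vectorised call and repeat each id once per resulting sentence. The sketch below is only an illustration under assumptions: `split_sentences` is a hypothetical helper, not part of qdap, and it uses a crude base-R regex (split on whitespace that follows `.`, `!` or `?`) as a stand-in for `sent_detect`, so its sentence boundaries will not match qdap's in every case (abbreviations such as "e.g." would be split, for example).

```r
# Hypothetical helper (not part of qdap): splits each text on whitespace that
# follows ., ! or ? and returns one row per sentence with the original id.
# The regex is a rough stand-in for qdap::sent_detect(), not an equivalent.
split_sentences <- function(df) {
  # one strsplit() call handles the whole column; the result is a list of character vectors
  pieces <- strsplit(df$text, "(?<=[.!?])\\s+", perl = TRUE)
  data.frame(
    id = rep(df$id, lengths(pieces)),  # repeat each id once per sentence
    text = unlist(pieces),
    stringsAsFactors = FALSE
  )
}

# Small made-up input, only for illustration
df_small <- data.frame(
  text = c("First sentence. Second one! Third?", "Only one sentence."),
  id = c(1, 2),
  stringsAsFactors = FALSE
)
out <- split_sentences(df_small)
out
```

The point of the sketch is that the result is built once from `rep()` and `unlist()` rather than with a call to `rbind()` per review. Whether the same pattern pays off with `sent_detect` itself depends on how much of the runtime is spent inside that function, which I have not measured.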