How do I count the frequency of unique words in a column? - PullRequest
0 votes
/ April 28, 2019

I need to count the frequency of the unique words found in a column that contains a description in each row.

So far I have removed the stop words from the original column, extracted the unique words from that column, and put them into a list named unique_description.
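The preprocessing described above could look roughly like this in base R. This is a minimal sketch, assuming the words are split on whitespace (which is consistent with tokens such as "(Awarded" in the output below) and that stop words have already been removed; the two-row description vector is hypothetical sample data.

```r
# Hypothetical sample column standing in for the question's `description`.
description <- c("Come stay Vinh & Stuart",
                 "come stay with Stuart")

# Split every description on runs of whitespace and flatten into one vector.
words <- unlist(strsplit(description, "\\s+"))

# Keep each distinct token once, in order of first appearance.
unique_description <- unique(words)
print(unique_description)
```

Because the split is case-sensitive, "Come" and "come" end up as two separate entries.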

> description[1:5]
[1] "Come stay Vinh & Stuart (Awarded one Australia's top hosts Airbnb CEO Brian Chesky & key shareholder Ashton Kutcher. We're Sydney's #1 reviewed hosts ). Find out 've positively reviewed 500+ times. Message talk first BEFORE make reservation request - And please read listing end (hint hint). Everything need know . We're pretty relaxed hosts, fully appreciate staying someone , home home, -one. This business, hotel. We're casual Airbnb hosts, hoteliers. If 're looking alternative expensive hotel, 're . Here 'll treated same way treat family & friends stay. So... fluffy bathrobes... Please hello message *BEFORE* make reservation request... It'll help speed things up, smooth things out... Please read listing way end. It make getting confirmed reserv"                                   
[2] "Beautifully renovated, spacious quiet, 3 Bedroom, 3 Bathroom home 10 minute walk beaches Fairlight Forty Baskets, 30 minute walk Manly via coastal promenade, Express bus runs 20 mins door. Our home thirty minute walk along seashore promenade Manly, one Sydney's beautiful beaches, village restaurants, cafes, shopping. If prefer more variety, Manly ferry take Sydney CBD 15 minutes. The residence sited sought- family-friendly street short stroll nearby North Harbour reserve Forty Baskets cafe beach. It's short walk further express CBD buses, ferries, Manly entertainment. Or bus (#131 #132) around corner drops Manly 8 minutes. Our home features stainless steel galley kitchen, including Ilve oven gas cooktop. We two separate living areas ground floor. The front lounge enjoys P&O" 
[3] "Welcome sanctuary - bright, comfortable one bedroom apartment North Sydney. Free Wifi, heated pool/jacuzzi everything need make stay Sydney very comfortable. Enjoy fabulous Home away home, fantastic stay Sydney! The apartment within walking distance restaurants shops, Luna Park North Sydney business district. Access Sydney CBD easy bus, train, taxi ferry. It short bus ride famous Balmoral Beach Taronga Zoo. My apartment situated North Sydney 3 kms Sydney CBD. Here details apartment: You'll enjoy being centrally located couple blocks away train station go anywhere quickly Sydney. The apartment features several windows tons natural light. It comfortable fully stocked. Here's I here: LIVING ROOM: 50\" LCD TV DVD / blu ray player CD/Radio/Blue tooth syncing w"                        
[4] "Fully self-contained sunny studio apartment. 10mn walk Bondi beach. Bus city door. Private 13m swimming pool. Sunny, studio apartment . Private terrace. bus door Bondi Junction City Ground floor 1 bedroom double bed plus kitchenette & study desk. shower & toilet, share laundry, kitchen facilities Swimming pool 13m. Separate security private entrance Private entrance. Ground floor. Happy indicate best spots walking, dining, entertaining best sightseeing location Sydney. Upmarket area. Very nice quiet neighbourhood . Very safe place. Bus door city."                                                                                                                                                                                                                                             
[5] "Sunny warehouse/loft apartment heart one Sydney's best neighbourhoods. Located corner two iconic Darlinghurst streets famous laneway bars eateries, footsteps equally amazing Surry Hills Potts Point. Walk through beautiful parks city less 10 mins, opera house 20 access Bondi Beach easily 25 via bus stop directly front building. My apartment beautiful, simple, open plan / one bedroom loft soaring high ceilings hardwood floors hint 's previous life printing factory 1940s. It huge windows flood space glorious sunshine throughout day provide refreshing breeze during summer. A few key features: * Wireless harman/kardon aura stereo system stream music wirelessly bluetooth device * Internal laundry washer dryer * The kitchen equipped gas cooking, microwave, dishwasher basics preparing m"
> unique_description[1:10]
 [1] "Come"        "stay"        "Vinh"        "&"           "Stuart"      "(Awarded"   
 [7] "one"         "Australia's" "top"         "hosts"      

I'm not sure how to count how often each word in unique_description occurs in the description column. I tried using freq_terms from the qdap library, but qdap won't load for me, so I'm trying to find another way.
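Since qdap is unavailable, base R alone can produce word frequencies: split the column into tokens and tabulate them. This is a minimal sketch, assuming the same whitespace tokenization that was used to build unique_description; the sample description vector is hypothetical. table() counts tokens exactly, so "Come" and "come" are separate entries unless the text is lowercased first with tolower().

```r
# Hypothetical sample column standing in for the question's `description`.
description <- c("Come stay Vinh & Stuart", "come stay with Stuart")

# Tokenize on whitespace, the same way the unique word list was built.
words <- unlist(strsplit(description, "\\s+"))

# table() counts every distinct token across all rows; sort by frequency.
freq <- sort(table(words), decreasing = TRUE)
print(freq)

# Look up the frequencies of the words in a unique-word list by name.
unique_description <- unique(words)
print(freq[unique_description])
```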

1 Answer

0 votes
/ April 28, 2019

You can use the stringr package.

library(stringr)

x <- "Come stay Vinh & Stuart (Awarded one Australia's top hosts Airbnb CEO Brian Chesky & key shareholder Ashton Kutcher. We're Sydney's #1 reviewed hosts ). Find out 've positively reviewed 500+ times. Message talk first BEFORE make reservation request - And please read listing end (hint hint). Everything need know . We're pretty relaxed hosts, fully appreciate staying someone , home home, -one. This business, hotel. We're casual Airbnb hosts, hoteliers. If 're looking alternative expensive hotel, 're . Here 'll treated same way treat family & friends stay. So... fluffy bathrobes... Please hello message *BEFORE* make reservation request... It'll help speed things up, smooth things out... Please read listing way end. It make getting confirmed reserv"
y <- "come stay Stuart"

unique_desc <- c("come", "stay", "Stuart")
desc <- c(x, y)

# fixed() matches each word literally; as plain regex patterns, tokens such as
# "(Awarded" from unique_description would raise an error. Matching is case-sensitive.
result <- lapply(desc, FUN = str_count, pattern = fixed(unique_desc))
# result[[i]] holds the count of each word of unique_desc in desc[i]

lapply calls str_count on each element of desc. In this example, the i-th entry of result is a vector of counts for the i-th entry of desc, with one count per word in unique_desc.
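To get a single frequency per word over the whole column, the per-row count vectors can be added element-wise. This is a minimal sketch, assuming stringr is installed; the short desc and unique_desc vectors are stand-in sample data. Note that str_count still counts substrings, so "stay" is also counted inside "staying".

```r
library(stringr)

# Small stand-ins for the answer's `desc` and `unique_desc`.
desc <- c("Come stay Vinh & Stuart. Enjoy staying here", "come stay Stuart")
unique_desc <- c("come", "stay", "Stuart")

# Per-row counts, following the lapply/str_count approach; fixed() matches
# the words literally and case-sensitively, so "Come" does not count as "come".
result <- lapply(desc, str_count, pattern = fixed(unique_desc))

# Add the per-row vectors element-wise to get one total per word.
totals <- setNames(Reduce(`+`, result), unique_desc)
print(totals)
```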
