How about something like this?
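library(xml2)
library(rvest)
library(tidyverse)

url <- "http://pitchfork.com/reviews/albums/grimes-miss-anthropocene"
html <- read_html(url)

# Collect the text of every <p> element, one row per paragraph.
# xml_nodes() is deprecated in rvest; html_nodes() is the equivalent call.
review <- html %>%
    html_nodes("p") %>%
    html_text() %>%
    enframe("paragraph_no", "text")
review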
## A tibble: 14 x 2
# paragraph_no text
# <int> <chr>
# 1 1 Best new music
# 2 2 Grimes’ first project as a bona fide pop star is more morose th…
# 3 3 In 2011, Grimes was eager to say in an interview that she had “…
# 4 4 Miss Anthropocene is Grimes’ fifth album and her first as that …
# 5 5 The result is a record that’s more morose than her previous wor…
# 6 6 In November 2018, Grimes released “We Appreciate Power,” a coll…
# 7 7 When Grimes veers away from high concept toward examining intim…
# 8 8 Miss Anthropocene thrills when it reveals a refined, linear evo…
# 9 9 So much about the actual music of Miss Anthropocene succeeds th…
#10 10 And that’s the obstacle, the slimy mouthfeel, standing in the w…
#11 11 Correction: An earlier version of this review erroneously state…
#12 12 Listen to our Best New Music playlist on Spotify and Apple Musi…
#13 13 Buy: Rough Trade
#14 14 (Pitchfork may earn a commission from purchases made through af…
review is a tibble that contains the review split into paragraphs; some additional cleaning may be needed (for example, removing the first and last rows).
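As a minimal sketch of that cleaning step, the first and last rows can be dropped with dplyr::slice(). The tibble review_demo and its placeholder text are assumptions made so the snippet runs without network access; with real data you would apply the same pipeline to review. How many leading or trailing rows are actually boilerplate depends on the page, so check the output before trimming.

```r
library(tibble)
library(dplyr)

# Hypothetical stand-in for the scraped `review` tibble (placeholder text only)
review_demo <- tibble(
  paragraph_no = 1:4,
  text = c("Best new music",
           "First body paragraph",
           "Second body paragraph",
           "Affiliate notice")
)

# Drop the first and last rows, then renumber the remaining paragraphs
review_clean <- review_demo %>%
  slice(-c(1, n())) %>%
  mutate(paragraph_no = row_number())

review_clean
```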
For the score, we can use an attribute selector on the class:
score <- html %>% html_nodes("[class='score']") %>% html_text() %>% as.numeric()
score
#[1] 8.2
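One detail worth knowing about the selector: "[class='score']" matches only elements whose class attribute is exactly "score", while ".score" matches any element that has score among its classes. The sketch below illustrates the difference on a made-up HTML snippet; the markup (including the second class name bnm) is an assumption for illustration and is not Pitchfork's actual page structure.

```r
library(xml2)
library(rvest)

# Hypothetical markup: one element with class "score" only,
# one element with two classes
snippet <- read_html(
  "<div><span class='score'>8.2</span><span class='score bnm'>9.0</span></div>"
)

# Exact match on the whole class attribute
exact <- snippet %>%
  html_nodes("[class='score']") %>%
  html_text() %>%
  as.numeric()

# Match on any element carrying the class "score"
any_class <- snippet %>%
  html_nodes(".score") %>%
  html_text() %>%
  as.numeric()

exact
any_class
```

If the site ever adds a second class to the score element, the exact-match selector would return an empty result, so ".score" may be the more forgiving choice.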
Wrapping up (in a function)
Let's wrap everything in a function that returns a list with the review tibble and the numeric score.
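get_pitchfork_data <- function(url) {
    html <- read_html(url)
    list(
        # One row per <p> element, with surrounding whitespace trimmed
        review = html %>%
            html_nodes("p") %>%
            html_text() %>%
            trimws() %>%
            enframe("paragraph_no", "text"),
        # Text of the element whose class attribute is exactly "score"
        score = html %>%
            html_nodes("[class='score']") %>%
            html_text() %>%
            as.numeric())
}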
Test 1: Grimes - Miss Anthropocene
get_pitchfork_data("http://pitchfork.com/reviews/albums/grimes-miss-anthropocene")
#$review
## A tibble: 14 x 2
# paragraph_no text
# <int> <chr>
# 1 1 Best new music
# 2 2 Grimes’ first project as a bona fide pop star is more morose th…
# 3 3 In 2011, Grimes was eager to say in an interview that she had “…
# 4 4 Miss Anthropocene is Grimes’ fifth album and her first as that …
# 5 5 The result is a record that’s more morose than her previous wor…
# 6 6 In November 2018, Grimes released “We Appreciate Power,” a coll…
# 7 7 When Grimes veers away from high concept toward examining intim…
# 8 8 Miss Anthropocene thrills when it reveals a refined, linear evo…
# 9 9 So much about the actual music of Miss Anthropocene succeeds th…
#10 10 And that’s the obstacle, the slimy mouthfeel, standing in the w…
#11 11 Correction: An earlier version of this review erroneously state…
#12 12 Listen to our Best New Music playlist on Spotify and Apple Musi…
#13 13 Buy: Rough Trade
#14 14 (Pitchfork may earn a commission from purchases made through af…
#
#$score
#[1] 8.2
Test 2: Radiohead - OK Computer (reissue)
get_pitchfork_data("https://pitchfork.com/reviews/albums/radiohead-ok-computer-oknotok-1997-2017/")
#$review
## A tibble: 12 x 2
# paragraph_no text
# <int> <chr>
# 1 1 Best new reissue
# 2 2 Twenty years on, Radiohead revisit their 1997 masterpiece with …
# 3 3 As they regrouped to figure out what their third album might be…
# 4 4 It’s still funny to think, two decades later, that Thom Yorke’s…
# 5 5 It’s unclear what happened to that album. OK Computer obviously…
# 6 6 OKNOTOK is something a little more interesting than a remaster …
# 7 7 But “Lift’s” reputation for positivity might be a little confus…
# 8 8 The most fun to be had with OKNOTOK is in these line-blurring m…
# 9 9 This fondness for camp and schlock has always been latent in Ra…
#10 10 The ghost of Bond followed them once they decamped from their s…
#11 11 Radiohead have been at least as brilliant at packaging and posi…
#12 12 Now that they have arrived at an autumnal, valedictory stage in…
#
#$score
#[1] 10