I have a div containing two p tags. I need to get the text of the second p element.
<div class="fb-price-list">
<p class="fb-price">S/ 1,699 (Internet)</p>
<p class="fb-price">S/ 2,399 (Normal)</p>
</div>
Expected result:
S/ 2,399 (Normal)
This is what I have, but it doesn't work:
tvs_url <- read_html("https://www.falabella.com.pe/falabella-pe/category/cat210477/TV-Televisores?page=1")
product_price_actual <- tvs_url %>%
html_nodes('div.pod-group pod-group__large-pod div.pod-body div.fb-price-list p.fb-price:nth-child(2)') %>%
html_text()
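One thing worth checking in the selector above: pod-group__large-pod has no leading dot, so CSS treats it as an element name rather than a class, and no element with that name appears in the HTML pasted below. The following is a minimal sketch, not a verified fix for the live page. It parses only the small snippet from this question (the page_snippet variable is a stand-in introduced here) and assumes the real page serves the same markup.

```r
library(rvest)

# Stand-in for the real page: the snippet shown in the question, parsed from a string.
page_snippet <- read_html(
  '<div class="fb-price-list">
     <p class="fb-price">S/ 1,699 (Internet)</p>
     <p class="fb-price">S/ 2,399 (Normal)</p>
   </div>'
)

# Class names need a leading dot; :nth-child(2) picks the p that is the
# second child of its parent div.
second_price <- page_snippet %>%
  html_nodes("div.fb-price-list p.fb-price:nth-child(2)") %>%
  html_text()

print(second_price)
```

If both pod-group and pod-group__large-pod are classes on the same div (an assumption, since that wrapper is not in the pasted HTML), the first part of the original selector would be written as div.pod-group.pod-group__large-pod.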
HTML:
<div class="pod-item"><div class="fb-form__input--checkbox fb-pod__item__compare"><input id="fb-pod__item__input-16754140" class="fb-pod__item__compare__input" type="checkbox" name="fb-pod__item__input-16754140" value="16754140"><label for="fb-pod__item__input-16754140" class="fb-pod__item__compare__label">Comparar</label></div><div class="pod-head"><a class="pod-head__image" href="/falabella-pe/product/16754140/LED-UHD-4K-55-Smart-TV-UN55RU7100GXPE-SERIE-RU7100/16754140"><div class="content__image"><img src="//falabella.scene7.com/is/image/FalabellaPE/16754140?wid=544&hei=544&qlt=70&anchor=750,750&crop=0,0,0,0" alt="img" class="image"></div></a><a href="/falabella-pe/product/16754140/LED-UHD-4K-55-Smart-TV-UN55RU7100GXPE-SERIE-RU7100/16754140" class="pod-head__stickerslink"><div class="pod-head__stickers"><div class="fb-responsive-flag fb-responsive-stylised-caps fb-pod__flag fb-pod__flag--percentoff" data-discount-content="">29%</div></div></a></div><div class="pod-body"><a class="section__pod-top" href="/falabella-pe/product/16754140/LED-UHD-4K-55-Smart-TV-UN55RU7100GXPE-SERIE-RU7100/16754140"><div class="section__pod-top-brand">SAMSUNG</div><div class="section__pod-top-title"><div class="LinesEllipsis ">LED UHD 4K 55" Smart TV UN55RU7100GXPE SERIE RU7100<wbr></div></div></a><div class="section__pod-middle"><div class="section__pod-middle-content__stickers"><div class="fb-responsive-flag fb-responsive-stylised-caps fb-pod__flag fb-pod__flag--percentoff" data-discount-content="">29%</div></div><div class="section__information"><a class="section__information-link" href="/falabella-pe/product/16754140/LED-UHD-4K-55-Smart-TV-UN55RU7100GXPE-SERIE-RU7100/16754140"><div class="fb-price-list"><p class="fb-price">S/ 1,699 (Internet)</p><p class="fb-price">S/ 2,399 (Normal)</p></div></a></div><div class="section__pod-middle-content__button"><button class="btn-add-to-basket">AGREGAR A TU BOLSA</button></div></div><div class="section__pod-bottom"><div class="fb-pod__rating" 
style="visibility: hidden;"><a href="/falabella-pe/product/16754140/LED-UHD-4K-55-Smart-TV-UN55RU7100GXPE-SERIE-RU7100/16754140#comments"><div class="fb-rating-stars"><div class="fb-rating-stars__container"><div class="fb-rating-stars__holder"><span class=""><i class="icon-rating"></i></span></div><div class="fb-rating-stars__holder"><span class=""><i class="icon-rating"></i></span></div><div class="fb-rating-stars__holder"><span class=""><i class="icon-rating"></i></span></div><div class="fb-rating-stars__holder"><span class=""><i class="icon-rating"></i></span></div><div class="fb-rating-stars__holder"><span class=""><i class="icon-rating"></i></span></div><p class="fb-rating-stars__count">0 <span class="fb-rating-stars__count__max"> / 5</span></p></div></div></a></div><a class="section__pod-bottom-descriptionlink" href="/falabella-pe/product/16754140/LED-UHD-4K-55-Smart-TV-UN55RU7100GXPE-SERIE-RU7100/16754140"><ul class="section__pod-bottom-description"><li>Modelo: UN55RU7100GXPE</li><li>Tamaño de la pantalla: 55"</li><li>Resolución: 4K Ultra HD</li><li>Tecnología: Led</li><li>Conexión bluetooth: Sí</li></ul></a></div></div></div>
UPDATE 1:
Based on the accepted answer, I used ifelse to check the number of characters at a given position.
The position to check is the 4th: when there is no precio_antes (previous price), that position is occupied by a different element, so precio_antes has to be set to NA in those cases:
ifelse(nchar(sapply(splitted, "[", 4))>3, NA, sapply(splitted, "[", 6))
How I build the final df:
df <- data.frame(
brand = sapply(splitted, "[", 2), #We don't need the "comparar" text so we start from 2
product = sapply(splitted, "[", 3),
precio_antes = ifelse(nchar(sapply(splitted, "[", 4))>3, NA, sapply(splitted, "[", 6)),
precio_actual = ifelse(nchar(sapply(splitted, "[", 4))<=3, sapply(splitted, "[", 5), sapply(splitted, "[", 4))
)
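To make the position logic easier to follow, here is a small self-contained sketch. The splitted list below is made-up sample data (the real one comes from the accepted answer, which is not shown here), and the second product, its brand and its prices are invented purely to illustrate the case without a discount.

```r
# Hypothetical stand-in for `splitted`: one character vector per product.
splitted <- list(
  # With a discount flag: position 4 is short ("29%"), prices are at 5 and 6.
  c("Comparar", "SAMSUNG", "LED UHD 4K 55\" Smart TV", "29%",
    "S/ 1,699 (Internet)", "S/ 2,399 (Normal)"),
  # Without a discount flag: position 4 already holds the current price.
  c("Comparar", "LG", "LED 43\" Smart TV", "S/ 999 (Internet)",
    "AGREGAR A TU BOLSA")
)

fourth <- sapply(splitted, "[", 4)
has_discount <- nchar(fourth) <= 3  # same 3-character threshold as above

df <- data.frame(
  brand = sapply(splitted, "[", 2),
  product = sapply(splitted, "[", 3),
  precio_antes = ifelse(has_discount, sapply(splitted, "[", 6), NA),
  precio_actual = ifelse(has_discount, sapply(splitted, "[", 5), fourth),
  stringsAsFactors = FALSE
)

print(df)
```

The threshold relies on the discount flag never being longer than 3 characters; testing for a trailing percent sign with grepl("%$", fourth) would be one alternative if that assumption does not hold.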