I want to parse data from multiple ld+json tags on a website using Python - PullRequest
January 5, 2020

I'm looking for someone who can help me. I want to parse data from the ld+json script tags. This page has 3 different ld+json tags, and I want the data from the 1st one, but my code automatically parses the data from the 2nd tag, which I don't need. Here is the link: https://www.wayfair.com/home-improvement/pdp/orren-ellis-lina-36-single-bathroom-vanity-set-oris2048.html

Here is my code:

import requests
from bs4 import BeautifulSoup as bs
import json
import csv


# Download the product page and parse the HTML
source = requests.get("https://www.wayfair.com/home-improvement/pdp/orren-ellis-lina-36-single-bathroom-vanity-set-oris2048.html")
parse = bs(source.text, 'html.parser')
# find() returns only the FIRST <script type="application/ld+json"> tag on the page
js = json.loads(parse.find("script", type="application/ld+json").text)
print(js)

But it only gives me this output:

<script type="application/ld+json">{"@context":"http://schema.org","@type":"WebSite","name":"Wayfair","url":"https://www.wayfair.com"}</script>

But I want the data from this ld+json tag, not the other one:

<script type="application/ld+json">{"@context":"http://schema.org","@type":"Product","name":"Lina 36\" Single Bathroom Vanity Set","brand":"Orren Ellis","sku":"ORIS2048","url":"https://www.wayfair.com/home-improvement/pdp/orren-ellis-lina-36-single-bathroom-vanity-set-oris2048.html","image":"https://secure.img1-fg.wfcdn.com/im/68800231/compr-r85/3930/39308112/lina-36-single-bathroom-vanity-set.jpg","aggregateRating":{"@type":"AggregateRating","reviewCount":67,"ratingValue":4.5},"offers":{"@type":"Offer","availability":"http://schema.org/InStock","priceCurrency":"USD","price":476.99},"review":[{"@type":"Review","author":"Christine","datePublished":"2018-12-02","description":"Love this decision. Delivery was a breeze and it looks very expensive compared to the price. Quality is great. My contractor added the marble backsplash to protect the walls. I wish the drawers were feels lighter for my hair products to stand up, but that would have compromised the look. Really happy with the deidsion.","reviewRating":{"@type":"Rating","bestRating":5,"ratingValue":5,"worstRating":1}},{"@type":"Review","author":"Darryle","datePublished":"2019-04-22","description":"Very happy with this vanity, exactly what we wanted and were looking for. Have to do some more work on the bathroom, including the wall (so don't mind the mess)--so it can hang evenly, and then we'll fully attach the sink, that's why there looks like there is a gap between the cabinets and sink top, in case you notice. Wanted to include pictures because I appreciate them when I'm shopping. Would definitely recommend!","reviewRating":{"@type":"Rating","bestRating":5,"ratingValue":5,"worstRating":1}},{"@type":"Review","author":"Lisa","datePublished":"2019-07-12","description":"I love the wide vanity top when doing my makeup and hair.  
The drawers are deep and can hold plenty of toiletries and towels.","reviewRating":{"@type":"Rating","bestRating":5,"ratingValue":5,"worstRating":1}},{"@type":"Review","author":"Angela","datePublished":"2019-08-11","description":"Perfect for a small bathroom. However,,the color rosewood I ordered is a walnut color.","reviewRating":{"@type":"Rating","bestRating":5,"ratingValue":5,"worstRating":1}},{"@type":"Review","author":"Marianela","datePublished":"2019-05-04","description":"Oh so so beautiful and well made.","reviewRating":{"@type":"Rating","bestRating":5,"ratingValue":5,"worstRating":1}},{"@type":"Review","author":"Howard","datePublished":"2019-04-16","description":"Great vanity.  The finish is fantastic.   We are very pleased with our choice.","reviewRating":{"@type":"Rating","bestRating":5,"ratingValue":5,"worstRating":1}},{"@type":"Review","author":"Corinne","datePublished":"2018-10-17","description":"Great looking vanity! Sink has straight modern lines.","reviewRating":{"@type":"Rating","bestRating":5,"ratingValue":5,"worstRating":1}},{"@type":"Review","author":"Mara I.","datePublished":"2018-11-20","description":"Beautifull","reviewRating":{"@type":"Rating","bestRating":5,"ratingValue":5,"worstRating":1}},{"@type":"Review","author":"Argentina","datePublished":"2019-08-25","description":"It\u2019s perfect... i love it","reviewRating":{"@type":"Rating","bestRating":5,"ratingValue":5,"worstRating":1}},{"@type":"Review","author":"Benjamin","datePublished":"2019-08-24","description":"Clean, modern style.  Easy install","reviewRating":{"@type":"Rating","bestRating":5,"ratingValue":5,"worstRating":1}}]}</script>
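Since find() stops at the first matching tag, one way to reach the Product block is to collect every ld+json tag and pick the one whose "@type" is "Product". Below is a minimal sketch that uses only the standard library's html.parser, so it runs without BeautifulSoup. The names LdJsonCollector and find_ld_json are hypothetical, and it assumes each block is a JSON object (or a list of objects) with a plain string "@type".

```python
import json
from html.parser import HTMLParser


class LdJsonCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> tag."""

    def __init__(self):
        super().__init__()
        self.blocks = []          # raw JSON strings, in document order
        self._in_ld_json = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_ld_json = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_ld_json:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_ld_json:
            self.blocks.append("".join(self._buffer))
            self._in_ld_json = False


def find_ld_json(html, wanted_type):
    """Return the first top-level ld+json object whose "@type" equals wanted_type, or None."""
    collector = LdJsonCollector()
    collector.feed(html)
    collector.close()
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of crashing
        # JSON-LD allows a single object or a list of objects
        candidates = data if isinstance(data, list) else [data]
        for item in candidates:
            if isinstance(item, dict) and item.get("@type") == wanted_type:
                return item
    return None
```

For example, find_ld_json(source.text, "Product") would return the product dictionary if that block is present in the downloaded HTML; if it returns None, the Product block is most likely not in the response at all. With BeautifulSoup the same idea is a loop over parse.find_all("script", type="application/ld+json") that checks each decoded object's "@type".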

1 Answer

January 6, 2020

The data you need is only returned when the request includes cookies. Try the following:

from simplified_scrapy.simplified_doc import SimplifiedDoc
from simplified_scrapy.request import req
# First request: lets the site set its cookies
html = req.get('https://www.wayfair.com/home-improvement/pdp/orren-ellis-lina-36-single-bathroom-vanity-set-oris2048.html')
# Second request (sent with those cookies): get the target content
html = req.get('https://www.wayfair.com/home-improvement/pdp/orren-ellis-lina-36-single-bathroom-vanity-set-oris2048.html')
doc = SimplifiedDoc(html)
item1 = doc.getElement('script', attr='type', value='application/ld+json')
print(item1.html)
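The answer relies on simplified_scrapy's req keeping the cookies from the first response and sending them with the second request. The same two-request idea can be sketched with the standard library alone, swapping in urllib.request with an http.cookiejar.CookieJar (with requests, a requests.Session() likewise keeps cookies between its .get() calls). This assumes, as the answer states, that the site only serves the product data once its cookies are sent back; the function name fetch_with_cookies and the User-Agent value are hypothetical.

```python
import http.cookiejar
import urllib.request


def fetch_with_cookies(url, timeout=30):
    """Fetch url twice through one cookie-aware opener and return the second body.

    Assumption: the first response sets the cookies the site checks for,
    and the second request, which carries them, receives the full page.
    """
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # Assumption: a browser-like User-Agent; urllib's default may be rejected.
    opener.addheaders = [("User-Agent", "Mozilla/5.0")]
    with opener.open(url, timeout=timeout) as first:
        first.read()  # consume the body; Set-Cookie headers are stored in jar
    with opener.open(url, timeout=timeout) as second:  # jar's cookies are sent here
        charset = second.headers.get_content_charset() or "utf-8"
        return second.read().decode(charset)
```

Calling fetch_with_cookies with the product URL returns the HTML of the second response, which can then be searched for the ld+json tags.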