Web scraping with bs4: image URLs do not show up - PullRequest
1 vote
/ May 30, 2019

I am new to web scraping and wanted to scrape all the character portraits from the LoL site. When I inspected one of the images in the browser, it was in an <img src="url"> tag, and I want to get that URL so I can download the image. But when I call soup.select('img[src]') or soup.select('img'), it returns an empty list, and I don't know why.

Here is the code:

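import bs4
import requests

# URL taken from the answers below; the question itself did not show it.
website = 'https://na.leagueoflegends.com/en/game-info/champions/'

data = requests.get(website)
data.raise_for_status()

soup = bs4.BeautifulSoup(data.text, "lxml")
print(soup)
# prints the HTML of the page

elems = soup.select('img[src]')
print(elems)
# prints an empty list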

Answers [ 3 ]

3 votes
/ May 30, 2019

It may be possible to do this with requests alone, but it looks like your GET request is not receiving the full page source, most likely because the images are added by JavaScript after the page loads.
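A quick way to check this without a browser is to look at what the raw HTML actually contains. The snippet below is only a minimal sketch: RAW_HTML is a made-up stand-in for the response text (not the real page), and "html.parser" is used instead of "lxml" so the example needs nothing beyond bs4.

```python
import bs4

# Hypothetical stand-in for what requests.get(...).text might return:
# the portraits are not in the markup, only an empty container and
# the script that would fill it in a real browser.
RAW_HTML = """
<html>
  <body>
    <div id="champion-grid"></div>
    <script src="https://example.com/app.js"></script>
  </body>
</html>
"""

soup = bs4.BeautifulSoup(RAW_HTML, "html.parser")

# No <img> exists in the static markup, so the selector matches nothing.
imgs = soup.select("img[src]")
# The external scripts hint at where the content is built.
scripts = [tag["src"] for tag in soup.select("script[src]")]

print(imgs)     # []
print(scripts)  # ['https://example.com/app.js']
```

If the real response looks like this (scripts but no img tags), the selector is fine and the missing images are a rendering issue, not a parsing one.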

You can work around this by using Selenium to load the page in a real browser and grab the rendered content.

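from selenium import webdriver
import bs4

# Let a real browser load the page so its JavaScript runs.
driver = webdriver.Chrome()
driver.get('https://na.leagueoflegends.com/en/game-info/champions/')
page_source = driver.page_source
driver.quit()  # end the whole browser session, not just the window
soup = bs4.BeautifulSoup(page_source, "lxml")
print(soup)

# src=True skips any <img> tag that has no src attribute.
elems = soup.find_all('img', src=True)
for elem in elems:
    print(elem['src'])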

Output:

https://ddragon.leagueoflegends.com/cdn/9.11.1/img/champion/Aatrox.png
https://ddragon.leagueoflegends.com/cdn/9.11.1/img/champion/Ahri.png
https://ddragon.leagueoflegends.com/cdn/9.11.1/img/champion/Akali.png
https://ddragon.leagueoflegends.com/cdn/9.11.1/img/champion/Alistar.png
https://ddragon.leagueoflegends.com/cdn/9.11.1/img/champion/Amumu.png
https://ddragon.leagueoflegends.com/cdn/9.11.1/img/champion/Anivia.png
...
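Since the goal was to download the images, here is a rough sketch of that last step. The helper names filename_from_url and download_images and the "portraits" folder are my own choices for illustration, not something from the question.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse

import requests


def filename_from_url(url):
    """Return the last path component of a URL, e.g. 'Aatrox.png'."""
    return PurePosixPath(urlparse(url).path).name


def download_images(urls, target_dir="portraits"):
    """Download each URL into target_dir and return the saved paths."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    saved = []
    for url in urls:
        response = requests.get(url, timeout=10)
        response.raise_for_status()  # stop on HTTP errors
        path = target / filename_from_url(url)
        path.write_bytes(response.content)  # image data is binary
        saved.append(path)
    return saved
```

With the elems list from the code above, it could be called as download_images(elem['src'] for elem in elems).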
1 vote
/ May 30, 2019

Use the same endpoint the page itself uses. You can find it in the Network tab of the browser's developer tools.

import requests 

base = 'https://ddragon.leagueoflegends.com/cdn/9.11.1/img/champion/'
r = requests.get('https://ddragon.leagueoflegends.com/cdn/9.11.1/data/en_US/champion.json').json()
images = [base + r['data'][item]['image']['full'] for item in r['data']]
print(images)
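To make the list comprehension easier to follow, here is the same URL building run against a small hand-written dict. The sample is a hypothetical two-entry excerpt that only mirrors the nesting the code above indexes into (data, then champion key, then image, then full); build_image_urls is just an illustrative name.

```python
def build_image_urls(payload, base):
    """Join base with each champion's image file name."""
    return [base + champ["image"]["full"] for champ in payload["data"].values()]


# Hypothetical excerpt with the same nesting as the JSON used above.
sample = {
    "data": {
        "Aatrox": {"image": {"full": "Aatrox.png"}},
        "Ahri": {"image": {"full": "Ahri.png"}},
    }
}
base = "https://ddragon.leagueoflegends.com/cdn/9.11.1/img/champion/"

for url in build_image_urls(sample, base):
    print(url)
# https://ddragon.leagueoflegends.com/cdn/9.11.1/img/champion/Aatrox.png
# https://ddragon.leagueoflegends.com/cdn/9.11.1/img/champion/Ahri.png
```

Note that the version segment (9.11.1) is hardcoded in both URLs, so it has to be updated by hand when the data moves to a new version.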
0 votes
/ May 30, 2019

Here is your answer

import requests
from bs4 import BeautifulSoup

soup = BeautifulSoup(requests.get("https://na.leagueoflegends.com/en/game-info/champions/").text, 'lxml')
soup.find_all('a')    # any tag name works here, e.g. a, script, link

OUTPUT:
Out[21]: 
[<a href="/en/game-info/get-started/">Get Started</a>,
 <a href="/en/game-info/get-started/what-is-lol/">What is League of Legends?</a>,
 <a href="https://na.leagueoflegends.com/en/site/guide/index.html">New Player Guide</a>,
 <a href="/en/game-info/get-started/chat-commands/">Chat Commands</a>,
 <a href="/en/game-info/get-started/community-interaction/">Community Interaction</a>,
 <a href="/en/featured/summoners-code">The Summoner's Code</a>,
 <a href="/en/game-info/champions/">Champions</a>,
 <a href="/en/game-info/items/">Items</a>,
 <a href="/en/game-info/summoners/">Summoners</a>,
 <a href="/en/game-info/summoners/spells/">Summoner Spells</a>,
 <a href="/en/game-info/game-modes/">Game Modes</a>,
 <a href="/en/game-info/game-modes/summoners-rift/">Summoner's Rift</a>,
 <a href="/en/game-info/game-modes/the-twisted-treeline/">The Twisted Treeline</a>,
 <a href="/en/game-info/game-modes/howling-abyss/">Howling Abyss</a>,
 <a href="//na.leagueoflegends.com/en/">Home</a>,
 <a href="/en/game-info/">Game Info</a>]

soup = BeautifulSoup(requests.get("https://na.leagueoflegends.com/en/game-info/champions/").text, 'lxml')
soup.find_all('script')
Out[22]: 
[<script>(function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({'gtm.start':new Date().getTime(),event:'gtm.js'});var f=d.getElementsByTagName(s)[0],j=d.createElement(s),dl=l!='dataLayer'?'&l='+l:'';j.async=true;j.src='//www.googletagmanager.com/gtm.js?id='+i+dl;f.parentNode.insertBefore(j,f);})(window,document,'script','dataLayer','GTM-N98J');</script>,
 <script>window.ga = window.ga || function(){(ga.q=ga.q||[]).push(arguments)};ga.l = +new Date;</script>,
 <script src="https://lolstatic-a.akamaihd.net/lolkit/1.1.6/modernizr.js" type="text/javascript"></script>,
 <script src="//ajax.googleapis.com/ajax/libs/jquery/1.9.1/jquery.min.js"></script>,
 <script src="https://lolstatic-a.akamaihd.net/lolkit/1.1.6/riot-all.js" type="text/javascript"></script>,
 <script src="https://lolstatic-a.akamaihd.net/lolkit/1.1.6/riot-kit-all.js" type="text/javascript"></script>,
 <script type="text/javascript">rg_force_language = 'en_US';rg_force_manifest = 'https://ddragon.leagueoflegends.com/realms/na.js';rg_assets = 'https://lolstatic-a.akamaihd.net/game-info/1.1.9';</script>,
 <script type="text/javascript">window.riotBarConfig = {touchpoints: {activeTouchpoint: 'game'},locale: {landingUrlPattern : 'https://na.leagueoflegends.com//game-info/'},footer: {enabled: true,container: {renderFooterInto: '#footer'}}};</script>,
 <script async="" src="https://lolstatic-a.akamaihd.net/riotbar/prod/latest/en_US.js"></script>,
 <script src="https://ddragon.leagueoflegends.com/cdn/dragonhead.js" type="text/javascript"></script>,
 <script src="https://lolstatic-a.akamaihd.net/frontpage/apps/prod/LolGameInfo-Harbinger/en_US/0d258ed5be6806b967afaf2b4b9817406912a7ac/assets/assets/js/riot-dd-utils.js" type="text/javascript"></script>,
 <script src="https://lolstatic-a.akamaihd.net/frontpage/apps/prod/LolGameInfo-Harbinger/en_US/0d258ed5be6806b967afaf2b4b9817406912a7ac/assets/assets/js/riot-dd-i18n.js" type="text/javascript"></script>,
 <script src="https://lolstatic-a.akamaihd.net/frontpage/apps/prod/LolGameInfo-Harbinger/en_US/0d258ed5be6806b967afaf2b4b9817406912a7ac/assets/assets/js/external/jquery.lazy-load.js" type="text/javascript"></script>,
 <script src="https://lolstatic-a.akamaihd.net/frontpage/apps/prod/LolGameInfo-Harbinger/en_US/0d258ed5be6806b967afaf2b4b9817406912a7ac/assets/assets/js/DDFilterApp.js" type="text/javascript"></script>,
 <script src="https://lolstatic-a.akamaihd.net/frontpage/apps/prod/LolGameInfo-Harbinger/en_US/0d258ed5be6806b967afaf2b4b9817406912a7ac/assets/assets/js/DDMarkupItem.js" type="text/javascript"></script>,
 <script src="https://lolstatic-a.akamaihd.net/frontpage/apps/prod/LolGameInfo-Harbinger/en_US/0d258ed5be6806b967afaf2b4b9817406912a7ac/assets/assets/js/DDMarkupContainer.js" type="text/javascript"></script>,
 <script src="https://lolstatic-a.akamaihd.net/frontpage/apps/prod/LolGameInfo-Harbinger/en_US/0d258ed5be6806b967afaf2b4b9817406912a7ac/assets/assets/js/champions/ChampionsListGridItem.js" type="text/javascript"></script>,
 <script src="https://lolstatic-a.akamaihd.net/frontpage/apps/prod/LolGameInfo-Harbinger/en_US/0d258ed5be6806b967afaf2b4b9817406912a7ac/assets/assets/js/champions/ChampionsListGridView.js" type="text/javascript"></script>,
 <script src="https://lolstatic-a.akamaihd.net/frontpage/apps/prod/LolGameInfo-Harbinger/en_US/0d258ed5be6806b967afaf2b4b9817406912a7ac/assets/assets/js/champions/ChampionsListApp.js" type="text/javascript"></script>]

soup = BeautifulSoup(requests.get("https://na.leagueoflegends.com/en/game-info/champions/").text, 'lxml')
soup.find_all('link')
Out[23]: 
[<link href="https://lolstatic-a.akamaihd.net/lolkit/1.1.6/lol-kit.css" rel="stylesheet"/>,
 <link href="https://lolstatic-a.akamaihd.net/frontpage/apps/prod/LolGameInfo-Harbinger/en_US/0d258ed5be6806b967afaf2b4b9817406912a7ac/assets/assets/css/base-styles.css" rel="stylesheet"/>,
 <link href="https://lolstatic-a.akamaihd.net/lolkit/1.1.6/resources/images/favicon.ico" rel="SHORTCUT ICON"/>]
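The script list is probably the most useful part of this output, because it shows which hosts the page pulls its code and data from. As a small sketch, the snippet below collects the hosts of the external scripts. HTML is an abbreviated copy of three entries from the output above (the inline script is shortened), and "html.parser" is used so that only bs4 is needed.

```python
from urllib.parse import urlparse

from bs4 import BeautifulSoup

# Abbreviated excerpt of the script tags printed above.
HTML = """
<script src="//ajax.googleapis.com/ajax/libs/jquery/1.9.1/jquery.min.js"></script>
<script src="https://ddragon.leagueoflegends.com/cdn/dragonhead.js" type="text/javascript"></script>
<script>window.ga = window.ga || function(){};</script>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Inline scripts have no src attribute, so select only those that do.
hosts = sorted({urlparse(tag["src"]).netloc for tag in soup.select("script[src]")})
print(hosts)  # ['ajax.googleapis.com', 'ddragon.leagueoflegends.com']
```

A host like ddragon.leagueoflegends.com showing up here is a good hint for what to look for in the Network tab.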