How to use Python to click "Load More" and extract the name links - PullRequest
0 votes
/ September 24, 2019

I want to get the name links from all pages by clicking "Load More", and I need help with the pagination.

I already have the logic to print the name links, but I cannot work out how to page through the results.

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

# positions, HEADERS and site are assumed to be defined earlier (not shown)
for pos in positions:
    url = "https://247sports.com/Season/2021-Football/CompositeRecruitRankings/?InstitutionGroup=HighSchool"
    two = requests.get(url + pos, headers=HEADERS)
    bsObj = BeautifulSoup(two.content, 'lxml')
    # Extract the link leading to the page that lists everything available
    main_content = urljoin(url, bsObj.select(".data-js")[1]['href'])
    response = requests.get(main_content)
    obj = BeautifulSoup(response.content, 'lxml')
    names = obj.findAll("div", {"class": "recruit"})

for player_name in names:
    player_name.find('a', {'class': 'rankings-page__name-link'})  # result is unused
    for all_players in player_name.find_all('a', href=True):
        player_urls = site + all_players.get('href')
        # print(player_urls)

Expected output: https://247sports.com/Player/Jack-Sawyer-46049925/ (the links for all player names)

1 Answer

0 votes
/ September 24, 2019

You can simply iterate over the query parameters in the requests. Since the loop could otherwise keep iterating forever, I check for when the players start repeating (essentially, when the next iteration adds no new players). It appears to stop after 21 pages, which yields 960 players.
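To make "iterating over the parameters" concrete, here is a minimal sketch that only builds the URLs, using the standard library and making no network calls. The names `BASE_URL` and `page_url` are hypothetical helpers introduced for illustration, and the exact percent-encoding that `requests` produces may differ slightly from `urlencode`.

```python
from urllib.parse import urlencode

BASE_URL = 'https://247sports.com/Season/2021-Football/CompositeRecruitRankings/'


def page_url(page):
    """Return the URL for one page of the rankings (hypothetical helper)."""
    params = {
        'ViewPath': '~/Views/SkyNet/PlayerSportRanking/_SimpleSetForSeason.ascx',
        'InstitutionGroup': 'HighSchool',
        'Page': str(page),
    }
    # Only the Page value changes from one request to the next.
    return BASE_URL + '?' + urlencode(params)


for page in (1, 2, 3):
    print(page_url(page))
```

Each "Load More" click corresponds to the same request with the next `Page` value, so no browser automation is needed.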

import requests
from bs4 import BeautifulSoup

url = 'https://247sports.com/Season/2021-Football/CompositeRecruitRankings/'

headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36'}

player_links = []
prior_count = 0
for page in range(1, 101):  # 100 is only a safety cap; the loop breaks earlier
    # Query parameters for one "Load More" chunk; only Page changes
    payload = {
        'ViewPath': '~/Views/SkyNet/PlayerSportRanking/_SimpleSetForSeason.ascx',
        'InstitutionGroup': 'HighSchool',
        'Page': str(page),
    }

    response = requests.get(url, headers=headers, params=payload)
    soup = BeautifulSoup(response.text, 'html.parser')

    recruits = soup.find_all('div', {'class': 'recruit'})
    for recruit in recruits:
        link = 'https://247sports.com' + recruit.find('a')['href']
        print(link)
        player_links.append(link)

    # Stop once a page adds no new unique players
    current_count = len(set(player_links))
    if prior_count == current_count:
        print('No more players')
        break
    prior_count = current_count
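
The stop condition can also be separated from the scraping itself, which makes it testable without hitting the site. The sketch below is not the code above but a variation on it: `collect_until_no_new` and `fake_fetch_page` are hypothetical names, and the fake fetcher returns made-up paths in place of a real `requests.get` plus `BeautifulSoup` call.

```python
def collect_until_no_new(fetch_page, max_pages=100):
    """Call fetch_page(1), fetch_page(2), ... until a page adds no new links."""
    seen = set()
    links = []
    for page in range(1, max_pages + 1):
        added = 0
        for link in fetch_page(page):
            if link not in seen:
                seen.add(link)
                links.append(link)
                added += 1
        if added == 0:
            # Nothing new on this page: the site is repeating itself (or is empty).
            break
    return links


def fake_fetch_page(page):
    # Stand-in data: pages 1 and 2 are distinct, every later page repeats page 2.
    pages = {1: ['/Player/A-1', '/Player/B-2'], 2: ['/Player/C-3']}
    return pages.get(page, pages[2])


links = collect_until_no_new(fake_fetch_page)
print(links)  # ['/Player/A-1', '/Player/B-2', '/Player/C-3']
```

In a real run, `fetch_page` would wrap the request and parsing for one page and return that page's player links.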

Output:

print (player_links)
['https://247sports.com/Player/Korey-Foreman-46056100', 'https://247sports.com/Player/Jack-Sawyer-46049925', 'https://247sports.com/Player/Tommy-Brockermeyer-46040211', 'https://247sports.com/Player/James-Williams-46049981', 'https://247sports.com/Player/Payton-Page-46055295', 'https://247sports.com/Player/Camar-Wheaton-46050152', 'https://247sports.com/Player/Brock-Vandagriff-46050870', 'https://247sports.com/Player/JT-Tuimoloau-46048440', 'https://247sports.com/Player/Emeka-Egbuka-46048438', 'https://247sports.com/Player/Tony-Grimes-46048912', 'https://247sports.com/Player/Sam-Huard-46048437', 'https://247sports.com/Player/Amarius-Mims-46079928', 'https://247sports.com/Player/Savion-Byrd-46078964', 'https://247sports.com/Player/Jake-Garcia-46053996', 'https://247sports.com/Player/Agiye-Hall-46055274', 'https://247sports.com/Player/Caleb-Williams-46040610', 'https://247sports.com/Player/JJ-McCarthy-46042742', 'https://247sports.com/Player/Dylan-Brooks-46079585', 'https://247sports.com/Player/Nolan-Rucci-46058902', 'https://247sports.com/Player/GaQuincy-McKinstry-46052990', 'https://247sports.com/Player/Will-Shipley-46056925', 'https://247sports.com/Player/Maason-Smith-46057128', 'https://247sports.com/Player/Isaiah-Johnson-46050757', 'https://247sports.com/Player/Landon-Jackson-46049327', 'https://247sports.com/Player/Tunmise-Adeleye-46050288', 'https://247sports.com/Player/Terrence-Lewis-46058521', 'https://247sports.com/Player/Lee-Hunter-46058922', 'https://247sports.com/Player/Raesjon-Davis-46056065', 'https://247sports.com/Player/Kyle-McCord-46047962', 'https://247sports.com/Player/Beaux-Collins-46049126', 'https://247sports.com/Player/Landon-Tengwall-46048781', 'https://247sports.com/Player/Smael-Mondon-46058273', 'https://247sports.com/Player/Derrick-Davis-Jr-46049676', 'https://247sports.com/Player/Troy-Franklin-46048840', 'https://247sports.com/Player/Tywone-Malone-46081337', 'https://247sports.com/Player/Micah-Morris-46051663', 
'https://247sports.com/Player/Donte-Thornton-46056489', 'https://247sports.com/Player/Bryce-Langston-46050326', 'https://247sports.com/Player/Damon-Payne-46041148', 'https://247sports.com/Player/Rocco-Spindler-46049869', 'https://247sports.com/Player/David-Daniel-46076804', 'https://247sports.com/Player/Branden-Jennings-46049721', 'https://247sports.com/Player/JaTavion-Sanders-46058800', 'https://247sports.com/Player/Chris-Hilton-46055801', 'https://247sports.com/Player/Jason-Marshall-46051367', ... ]
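
Because the loop appends the links from the last, repeated page before it stops, `player_links` may contain duplicates. A minimal sketch of removing them while keeping the ranking order follows; the three-item list is a shortened stand-in with one repeat added for illustration, and `unique_links` is a hypothetical name.

```python
player_links = [
    'https://247sports.com/Player/Korey-Foreman-46056100',
    'https://247sports.com/Player/Jack-Sawyer-46049925',
    'https://247sports.com/Player/Korey-Foreman-46056100',  # repeat, added for illustration
]

# dict keys are unique and keep insertion order, so the first occurrence wins.
unique_links = list(dict.fromkeys(player_links))
print(len(unique_links))  # 2
```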