First, inspect the network requests in Dev Tools (press F12 in Chrome) and look at the payload. Your request is missing some of the form data.
The form data is missing because it is added by JavaScript when the user clicks a page number. Once the form data is set, the JavaScript runs the following:
xmlRequest.open("POST", action, true);
xmlRequest.setRequestHeader("Content-Type", "application/x-www-form-urlencoded; charset=utf-8");
xmlRequest.send(postData);
So all you need to do is emulate this in your Python script. It looks like the paging function only needs two extra values: __CALLBACKID and __CALLBACKPARAM.
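As a rough illustration of what the browser sends as postData, here is a minimal sketch of how form fields become an application/x-www-form-urlencoded body. The `__CALLBACKPARAM` value is a shortened placeholder, not the real one, and `fields` and `post_data` are names chosen only for this example.

```python
from urllib.parse import urlencode

# Illustrative field values only; the real __CALLBACKPARAM is much longer.
fields = {
    "__CALLBACKID": "ctl00$ContentPlaceHolder1$ASPxGridView_search",
    "__CALLBACKPARAM": "c0:example",  # placeholder value (assumption)
}

# urlencode percent-encodes reserved characters such as "$" and ":".
post_data = urlencode(fields)
print(post_data)
```

Passing a dict to the `data=` argument of `requests` produces this kind of encoding for you, so there is no need to build the string by hand in the scraper.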
In the following example, I scraped the first 4 pages (note: the first POST just returns the landing page):
import requests
from bs4 import BeautifulSoup

link = "http://surrogateweb.co.ocean.nj.us/BluestoneWeb/Default.aspx"

with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1; ) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
    r = s.get(link)
    r.raise_for_status()
    soup = BeautifulSoup(r.text, "lxml")

    # Start from every form field on the landing page (this includes the hidden ASP.NET state fields).
    payload = {i['name']: i.get('value', '') for i in soup.select('input[name],select')}

    # Fill in the search criteria.
    for k, v in payload.items():
        if k.endswith('ComboBox_case_type'):
            payload[k] = "Probate"
        elif k.endswith('ComboBox_case_type_VI'):
            payload[k] = "WILL"
        elif k.endswith('ComboBox_case_type$DDD$L'):
            payload[k] = "WILL"
        elif k.endswith('ComboBox_town$DDD$L'):
            payload[k] = "%"

    # TODO: This is a proof of concept and needs refactoring. Perhaps scrape the page IDs from the paging HTML.
    page_id_list = ['PN0', 'PN1', 'PN2', 'PN3']

    for page_id in page_id_list:
        # Add 2 POST items. They are required for the ASP.NET GridView AJAX postback event.
        # Note: no trailing comma here, otherwise the value becomes a tuple.
        payload['__CALLBACKID'] = 'ctl00$ContentPlaceHolder1$ASPxGridView_search'
        # TODO: you might want to examine "__CALLBACKPARAM" across multiple pages.
        # However, it looks like it works by swapping the page ID (e.g. PN1, PN2).
        payload['__CALLBACKPARAM'] = 'c0:KV|151;["5798534","5798533","5798532","5798531","5798529","5798519","5798518","5798517","5798515","5798514","5798512","5798503","5798501","5798496","5798495"];CR|2;{};GB|20;12|PAGERONCLICK3|' + page_id + ';'

        r = s.post(link, data=payload)
        r.raise_for_status()
        soup = BeautifulSoup(r.text, "lxml")

        # Print the detail-page link for every case on this page.
        for pk_id in soup.select("a.dxeHyperlink_Youthful[href*='Q_PK_ID']"):
            print(pk_id.get("href"))
Output:
WebPages/web_case_detail_ocean.aspx?Q_PK_ID=5798668
WebPages/web_case_detail_ocean.aspx?Q_PK_ID=5798588
WebPages/web_case_detail_ocean.aspx?Q_PK_ID=5798584
WebPages/web_case_detail_ocean.aspx?Q_PK_ID=5798573
WebPages/web_case_detail_ocean.aspx?Q_PK_ID=5798572
WebPages/web_case_detail_ocean.aspx?Q_PK_ID=5798570
WebPages/web_case_detail_ocean.aspx?Q_PK_ID=5798569
WebPages/web_case_detail_ocean.aspx?Q_PK_ID=5798568
WebPages/web_case_detail_ocean.aspx?Q_PK_ID=5798566
WebPages/web_case_detail_ocean.aspx?Q_PK_ID=5798564
WebPages/web_case_detail_ocean.aspx?Q_PK_ID=5798560
WebPages/web_case_detail_ocean.aspx?Q_PK_ID=5798552
WebPages/web_case_detail_ocean.aspx?Q_PK_ID=5798542
WebPages/web_case_detail_ocean.aspx?Q_PK_ID=5798541
WebPages/web_case_detail_ocean.aspx?Q_PK_ID=5798535
WebPages/web_case_detail_ocean.aspx?Q_PK_ID=5798534
WebPages/web_case_detail_ocean.aspx?Q_PK_ID=5798533
WebPages/web_case_detail_ocean.aspx?Q_PK_ID=5798532
WebPages/web_case_detail_ocean.aspx?Q_PK_ID=5798531
WebPages/web_case_detail_ocean.aspx?Q_PK_ID=5798529
WebPages/web_case_detail_ocean.aspx?Q_PK_ID=5798519
WebPages/web_case_detail_ocean.aspx?Q_PK_ID=5798518
WebPages/web_case_detail_ocean.aspx?Q_PK_ID=5798517
WebPages/web_case_detail_ocean.aspx?Q_PK_ID=5798515
WebPages/web_case_detail_ocean.aspx?Q_PK_ID=5798514
WebPages/web_case_detail_ocean.aspx?Q_PK_ID=5798512
WebPages/web_case_detail_ocean.aspx?Q_PK_ID=5798503
WebPages/web_case_detail_ocean.aspx?Q_PK_ID=5798501
WebPages/web_case_detail_ocean.aspx?Q_PK_ID=5798496
WebPages/web_case_detail_ocean.aspx?Q_PK_ID=5798495
WebPages/web_case_detail_ocean.aspx?Q_PK_ID=5798494
WebPages/web_case_detail_ocean.aspx?Q_PK_ID=5798492
WebPages/web_case_detail_ocean.aspx?Q_PK_ID=5798485
WebPages/web_case_detail_ocean.aspx?Q_PK_ID=5798480
WebPages/web_case_detail_ocean.aspx?Q_PK_ID=5798479
WebPages/web_case_detail_ocean.aspx?Q_PK_ID=5798476
WebPages/web_case_detail_ocean.aspx?Q_PK_ID=5798475
WebPages/web_case_detail_ocean.aspx?Q_PK_ID=5798474
WebPages/web_case_detail_ocean.aspx?Q_PK_ID=5798472
WebPages/web_case_detail_ocean.aspx?Q_PK_ID=5798471
WebPages/web_case_detail_ocean.aspx?Q_PK_ID=5798470
WebPages/web_case_detail_ocean.aspx?Q_PK_ID=5798469
WebPages/web_case_detail_ocean.aspx?Q_PK_ID=5798466
WebPages/web_case_detail_ocean.aspx?Q_PK_ID=5798463
WebPages/web_case_detail_ocean.aspx?Q_PK_ID=5798462
WebPages/web_case_detail_ocean.aspx?Q_PK_ID=5798460
WebPages/web_case_detail_ocean.aspx?Q_PK_ID=5798459
WebPages/web_case_detail_ocean.aspx?Q_PK_ID=5798458
WebPages/web_case_detail_ocean.aspx?Q_PK_ID=5798457
WebPages/web_case_detail_ocean.aspx?Q_PK_ID=5798455
WebPages/web_case_detail_ocean.aspx?Q_PK_ID=5798454
WebPages/web_case_detail_ocean.aspx?Q_PK_ID=5798453
WebPages/web_case_detail_ocean.aspx?Q_PK_ID=5798452
WebPages/web_case_detail_ocean.aspx?Q_PK_ID=5798449
WebPages/web_case_detail_ocean.aspx?Q_PK_ID=5798448
WebPages/web_case_detail_ocean.aspx?Q_PK_ID=5798447
WebPages/web_case_detail_ocean.aspx?Q_PK_ID=5798446
WebPages/web_case_detail_ocean.aspx?Q_PK_ID=5798445
WebPages/web_case_detail_ocean.aspx?Q_PK_ID=5798444
WebPages/web_case_detail_ocean.aspx?Q_PK_ID=5798443
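If you need the case IDs themselves rather than the relative links, one possible approach is to parse the query string of each href. This is a minimal sketch using only the standard library; `extract_pk_id` is a hypothetical helper name, and the two sample hrefs are copied from the output above.

```python
from urllib.parse import urlsplit, parse_qs

def extract_pk_id(href):
    """Return the Q_PK_ID query value from a (relative) link, or None if absent."""
    query = urlsplit(href).query
    values = parse_qs(query).get("Q_PK_ID", [])
    return values[0] if values else None

# Sample hrefs taken from the printed output.
hrefs = [
    "WebPages/web_case_detail_ocean.aspx?Q_PK_ID=5798668",
    "WebPages/web_case_detail_ocean.aspx?Q_PK_ID=5798588",
]
ids = [extract_pk_id(h) for h in hrefs]
print(ids)
```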
Although this can be solved with requests, it can be temperamental. Selenium is usually the better choice.
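One likely source of that fragility is the hard-coded __CALLBACKPARAM string. Judging only from the single captured value in the script, the numbers in it appear to be character counts of the text that follows them (151 for the key list, 2 for `{}`, 20 for the pager command, 12 for `PAGERONCLICK`, 3 for `PN1`). That is an inference from one sample, not documented behaviour, so treat the sketch below as an assumption to verify in Dev Tools. The helper names `length_prefixed` and `build_callback_param` are hypothetical.

```python
import json

def length_prefixed(value):
    """Return 'LEN|VALUE', the pattern seen inside the pager command (assumed)."""
    return f"{len(value)}|{value}"

def build_callback_param(row_keys, page_id):
    """Rebuild the captured __CALLBACKPARAM layout with recomputed lengths.

    Assumption: each section has the form NAME|LEN;VALUE; as in the sample.
    """
    keys_json = json.dumps(row_keys, separators=(",", ":"))  # compact JSON, no spaces
    command = length_prefixed("PAGERONCLICK") + length_prefixed(page_id)
    sections = [("KV", keys_json), ("CR", "{}"), ("GB", command)]
    return "c0:" + "".join(f"{name}|{len(value)};{value};" for name, value in sections)

# Row keys copied from the captured request in the script.
row_keys = ["5798534", "5798533", "5798532", "5798531", "5798529",
            "5798519", "5798518", "5798517", "5798515", "5798514",
            "5798512", "5798503", "5798501", "5798496", "5798495"]
param = build_callback_param(row_keys, "PN1")
print(param)
```

If the assumption holds, this matters once the page ID has two digits (for example PN10), because simply swapping the ID in the literal string would leave the length prefixes unchanged.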