After selecting an option from each dropdown and filling in the captcha, I could not find the __EVENTTARGET for "Open PDF" in the browser's Network tab.
https://ceo.maharashtra.gov.in/SearchList/
Here is my Scrapy spider code:
import os

import scrapy
from scrapy.shell import inspect_response


class MahaSpider(scrapy.Spider):
    name = 'maha'
    allowed_domains = ['ceo.maharashtra.gov.in']
    start_urls = ['https://ceo.maharashtra.gov.in/SearchList/']
    search_action_url = "https://ceo.maharashtra.gov.in/SearchList/"

    def parse(self, response):
        # Step 1: select the district and post back.
        formdata = dict()
        for field in response.xpath("//form[@id='Form']//input"):
            name = field.xpath('./@name').get()
            print(name)
            value = field.xpath('./@value').get()
            formdata[name] = str(value) if value else ''
        formdata['ctl00$Content$DistrictList'] = '26'
        formdata['__EVENTTARGET'] = 'ctl00$Content$DistrictList'
        return scrapy.FormRequest.from_response(response, method='POST', dont_click=True, formdata=formdata, callback=self.parse_dist)

    def parse_dist(self, response):
        # Step 2: select the assembly constituency and post back.
        formdata = dict()
        for field in response.xpath("//form[@id='Form']//input"):
            name = field.xpath('./@name').get()
            print(name)
            value = field.xpath('./@value').get()
            formdata[name] = str(value) if value else ''
        formdata['ctl00$Content$AssemblyList'] = '216'
        formdata['__EVENTTARGET'] = 'ctl00$Content$AssemblyList'
        return scrapy.FormRequest.from_response(response, method='POST', dont_click=True, formdata=formdata, callback=self.parse_asse)

    def parse_asse(self, response):
        # Step 3: select the part and post back.
        formdata = dict()
        for field in response.xpath("//form[@id='Form']//input"):
            name = field.xpath('./@name').get()
            print(name)
            value = field.xpath('./@value').get()
            formdata[name] = str(value) if value else ''
        formdata['ctl00$Content$PartList'] = '1'
        formdata['__EVENTTARGET'] = 'ctl00$Content$PartList'
        return scrapy.FormRequest.from_response(response, method='POST', dont_click=True, formdata=formdata, callback=self.parse_search)

    def parse_search(self, response):
        # Step 4: fetch the captcha image with wget and type it in by hand.
        os.system("wget https://ceo.maharashtra.gov.in/SearchList/Captcha.aspx --no-check-certificate")
        captcha = input("please enter the captcha")
        formdata = dict()
        for field in response.xpath("//form[@id='Form']//input"):
            name = field.xpath('./@name').get()
            print(name)
            value = field.xpath('./@value').get()
            formdata[name] = str(value) if value else ''
        formdata['__EVENTTARGET'] = 'ctl00$Content$txtcaptcha'
        formdata['ctl00$Content$txtcaptcha'] = captcha
        return scrapy.FormRequest.from_response(response, method='POST', dont_click=True, formdata=formdata, callback=self.parse_search1)

    def parse_search1(self, response):
        # Step 5: try to trigger the "Open PDF" button.
        formdata = dict()
        for field in response.xpath("//form[@id='Form']//input"):
            name = field.xpath('./@name').get()
            print(name)
            value = field.xpath('./@value').get()
            formdata[name] = str(value) if value else ''
        formdata['__EVENTTARGET'] = 'ctl00$Content$OpenButton'
        print(response.url)
        return scrapy.FormRequest.from_response(response, method='POST', dont_click=False, formdata=formdata, callback=self.parse_search2)

    def parse_search2(self, response):
        # Drop into the Scrapy shell to inspect whatever came back.
        inspect_response(response, self)
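
The callbacks above differ only in which control triggers the postback. In ASP.NET WebForms, a control wired to `__doPostBack` (such as an auto-postback dropdown) reports its name in `__EVENTTARGET`, whereas a plain submit button leaves `__EVENTTARGET` empty and sends its own name and value instead. That may be why no `__EVENTTARGET` shows up for the PDF button. The sketch below only illustrates the two payload shapes with the standard library; `build_postback` is a hypothetical helper, the field values are made up, and whether `ctl00$Content$OpenButton` really is a plain submit button on this page is an assumption.

```python
def build_postback(fields, event_target=None, button=None):
    """Return form data for one WebForms postback.

    fields: dict of the form's current input names and values
    event_target: name of a control that posts back via __doPostBack
    button: (name, value) of a plain submit button, if one is clicked
    """
    data = dict(fields)  # copy, so the caller's dict is left untouched
    # A __doPostBack control is identified through __EVENTTARGET;
    # a submit button leaves that field empty.
    data['__EVENTTARGET'] = event_target or ''
    data.setdefault('__EVENTARGUMENT', '')
    if button is not None:
        name, value = button
        # A submit button identifies itself by its own name=value pair.
        data[name] = value
    return data


# Hypothetical field values, for illustration only.
fields = {'__VIEWSTATE': 'abc', 'ctl00$Content$DistrictList': '26'}

# Dropdown postback: the control name goes into __EVENTTARGET.
dropdown = build_postback(fields, event_target='ctl00$Content$DistrictList')

# Button click (assumed): 'Open PDF' is a placeholder for the button's
# real value attribute, which would have to be read from the page.
click = build_postback(fields, button=('ctl00$Content$OpenButton', 'Open PDF'))
```

In Scrapy, `FormRequest.from_response` accepts a `clickdata` argument (for example `clickdata={'name': 'ctl00$Content$OpenButton'}`) that makes it include the clicked button's name and value in the same way.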
I need to download the PDF but cannot, for two reasons:
1. No __EVENTTARGET for "Get PDF" appears in the Network tab.
2. Clicking "Get PDF" opens a new tab.
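
On the second point, a button that opens a new tab usually still results in an ordinary HTTP response, so a callback could write the response body to disk once it has confirmed that the body is a PDF. A minimal sketch, assuming the body is available as bytes (in Scrapy, `response.body`); `save_pdf` and the file names are hypothetical, and the bytes below are stand-ins rather than real server output.

```python
import os
import tempfile


def save_pdf(body, path):
    """Write body to path if it looks like a PDF; return True on success."""
    # PDF files start with the magic bytes "%PDF-"; anything else
    # (for example an HTML error page) is rejected and not written.
    if not body.startswith(b'%PDF-'):
        return False
    with open(path, 'wb') as f:
        f.write(body)
    return True


# Illustration with stand-in bytes instead of a real response body.
out_path = os.path.join(tempfile.mkdtemp(), 'part.pdf')
saved = save_pdf(b'%PDF-1.4\n...', out_path)
rejected = save_pdf(b'<html>captcha error</html>', out_path + '.bad')
```

Checking the leading bytes first makes it easy to tell a real PDF apart from an HTML page returned after a failed captcha or postback.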