I can't scrape this URL: http://ceee.com.br/consulta, because it requires authentication.
With the requests library I was able to log in as follows:
import requests
url = "http://ceee.com.br/pportal/ceee/Component/ExecLogin.aspx"
# Form fields expected by the login endpoint
payload = {
    'inpGroupID': 'delegacia',
    'inpNC': '',
    'inpDsUsernameLogin': '<USER>',
    'inpDsPasswordLogin': '<PASSWORD>'
}
headers = {
    'Host': 'ceee.com.br',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:73.0) Gecko/20100101 Firefox/73.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8'
}
# data=payload sends the fields form-encoded; the 302 redirect is followed automatically
response = requests.post(url, headers=headers, data=payload)
print(response.text.encode('utf8'))
Which gives:
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): ceee.com.br:80
DEBUG:urllib3.connectionpool:http://ceee.com.br:80 "POST /pportal/ceee/Component/ExecLogin.aspx HTTP/1.1" 302 164
DEBUG:urllib3.connectionpool:http://ceee.com.br:80 "GET /pportal/ceee/Component/Controller.aspx?CC=2767 HTTP/1.1" 200 2434
But I had no luck with Scrapy:
...
payload = {
    'inpGroupID': 'delegacia',
    'inpNC': '',
    'inpDsUsernameLogin': '<USER>',
    'inpDsPasswordLogin': '<PASSWORD>'
}

def start_requests(self):
    yield Request(
        self.url, callback=self.parse, method='POST',
        body=json.dumps(self.payload)
    )

def parse(self, response: Response):
    if 'Controller' in response.url:
        self.log(':)')
    else:
        self.log(':(')
...
2020-03-09 20:57:44 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.logstats.LogStats']
2020-03-09 20:57:50 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2020-03-09 20:57:50 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2020-03-09 20:57:50 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2020-03-09 20:57:50 [scrapy.core.engine] INFO: Spider opened
2020-03-09 20:57:50 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2020-03-09 20:57:50 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2020-03-09 20:57:51 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <POST http://ceee.com.br/pportal/ceee/Component/ExecLogin.aspx> (failed 1 times): 500 Internal Server Error
2020-03-09 20:57:51 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <POST http://ceee.com.br/pportal/ceee/Component/ExecLogin.aspx> (failed 2 times): 500 Internal Server Error
2020-03-09 20:57:51 [scrapy.downloadermiddlewares.retry] DEBUG: Gave up retrying <POST http://ceee.com.br/pportal/ceee/Component/ExecLogin.aspx> (failed 3 times): 500 Internal Server Error
2020-03-09 20:57:51 [scrapy.core.engine] DEBUG: Crawled (500) <POST http://ceee.com.br/pportal/ceee/Component/ExecLogin.aspx> (referer: None)
2020-03-09 20:57:51 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <500 http://ceee.com.br/pportal/ceee/Component/ExecLogin.aspx>: HTTP status code is not handled or not allowed
2020-03-09 20:57:51 [scrapy.core.engine] INFO: Closing spider (finished)
...
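One difference between the two attempts is the request body: requests form-encodes a dict passed as `data=`, while the Scrapy code above sends a JSON string. A minimal sketch using only the standard library to compare the two encodings of the same payload (whether this mismatch is what triggers the 500 is an assumption, not something the logs confirm):

```python
import json
from urllib.parse import urlencode

# Same fields as in the login attempts above
payload = {
    'inpGroupID': 'delegacia',
    'inpNC': '',
    'inpDsUsernameLogin': '<USER>',
    'inpDsPasswordLogin': '<PASSWORD>',
}

# Roughly what requests puts in the body for data=payload
# (application/x-www-form-urlencoded)
form_body = urlencode(payload)

# What the Scrapy spider above puts in the body
json_body = json.dumps(payload)

print(form_body)
print(json_body)
```

A server-side form handler that expects `key=value&key=value` pairs would not find any of the login fields in the JSON version.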
Using the URL that the form posts to (http://ceee.com.br/pportal/ceee/Component/DEFormService_COnsulta.aspx) returns HTTP 200, but the login still fails:
...
2020-03-09 21:01:20 [scrapy.core.engine] INFO: Spider opened
2020-03-09 21:01:20 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2020-03-09 21:01:20 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2020-03-09 21:01:21 [scrapy.core.engine] DEBUG: Crawled (200) <POST http://ceee.com.br/pportal/ceee/Component/DEForm_LoginUser.aspx> (referer: None)
2020-03-09 21:01:22 [spider] DEBUG: :(
2020-03-09 21:01:22 [scrapy.core.engine] INFO: Closing spider (finished)
2020-03-09 21:01:22 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
...
I also tried loginform, as shown here, and FormRequest, from here.
I can't figure this problem out; I've never worked with sessions in Scrapy :(
Any help is appreciated.