Performing login authentication with Scrapy against an ASP.NET application
10 March 2020

I can't crawl this URL: http://ceee.com.br/consulta because it requires authentication.

With the requests library I was able to log in as follows:

import requests

url = "http://ceee.com.br/pportal/ceee/Component/ExecLogin.aspx"
payload = {
    'inpGroupID': 'delegacia',
    'inpNC': '',
    'inpDsUsernameLogin': '<USER>',
    'inpDsPasswordLogin': '<PASSWORD>'
}
headers = {
  'Host': 'ceee.com.br',
  'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:73.0) Gecko/20100101 Firefox/73.0',
  'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8'
}
# A dict passed as `data` is sent form-encoded; the 302 redirect is followed automatically.
response = requests.request("POST", url, headers=headers, data=payload)
print(response.text.encode('utf8'))

This gives:

DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): ceee.com.br:80
DEBUG:urllib3.connectionpool:http://ceee.com.br:80 "POST /pportal/ceee/Component/ExecLogin.aspx HTTP/1.1" 302 164
DEBUG:urllib3.connectionpool:http://ceee.com.br:80 "GET /pportal/ceee/Component/Controller.aspx?CC=2767 HTTP/1.1" 200 2434

But in Scrapy I had no luck:

...
    payload = {
        'inpGroupID': 'delegacia',
        'inpNC': '',
        'inpDsUsernameLogin': '<USER>',
        'inpDsPasswordLogin': '<PASSWORD>'
    }
    def start_requests(self):
        yield Request(
            self.url, callback=self.parse, method='POST',
            body=json.dumps(self.payload)
        )

    def parse(self, response: Response):
        if 'Controller' in response.url:
            self.log(':)')
        else:
            self.log(':(')
...
2020-03-09 20:57:44 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.logstats.LogStats']
2020-03-09 20:57:50 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2020-03-09 20:57:50 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2020-03-09 20:57:50 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2020-03-09 20:57:50 [scrapy.core.engine] INFO: Spider opened
2020-03-09 20:57:50 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2020-03-09 20:57:50 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2020-03-09 20:57:51 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <POST http://ceee.com.br/pportal/ceee/Component/ExecLogin.aspx> (failed 1 times): 500 Internal Server Error
2020-03-09 20:57:51 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <POST http://ceee.com.br/pportal/ceee/Component/ExecLogin.aspx> (failed 2 times): 500 Internal Server Error
2020-03-09 20:57:51 [scrapy.downloadermiddlewares.retry] DEBUG: Gave up retrying <POST http://ceee.com.br/pportal/ceee/Component/ExecLogin.aspx> (failed 3 times): 500 Internal Server Error
2020-03-09 20:57:51 [scrapy.core.engine] DEBUG: Crawled (500) <POST http://ceee.com.br/pportal/ceee/Component/ExecLogin.aspx> (referer: None)
2020-03-09 20:57:51 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <500 http://ceee.com.br/pportal/ceee/Component/ExecLogin.aspx>: HTTP status code is not handled or not allowed
2020-03-09 20:57:51 [scrapy.core.engine] INFO: Closing spider (finished)
...
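One difference between the two attempts may be worth checking, though it is not confirmed against this server: requests form-encodes a dict passed as data=, while the Scrapy request above sends json.dumps(self.payload) as a raw body with no form Content-Type. A minimal standard-library sketch of the two bodies, using the same placeholder credentials:

```python
import json
from urllib.parse import urlencode

# Same fields as the login payload above (credentials are placeholders).
payload = {
    "inpGroupID": "delegacia",
    "inpNC": "",
    "inpDsUsernameLogin": "<USER>",
    "inpDsPasswordLogin": "<PASSWORD>",
}

# Roughly what requests sends for data=payload:
# an application/x-www-form-urlencoded body.
form_body = urlencode(payload)

# What the Scrapy attempt sends: a JSON document as the raw request body.
json_body = json.dumps(payload)

print(form_body)
print(json_body)
```

If the login endpoint only parses form fields, the JSON body would leave all four inputs empty on the server side, which could plausibly produce the 500.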

Using the URL that the form calls (http://ceee.com.br/pportal/ceee/Component/DEFormService_COnsulta.aspx), I get HTTP code 200, but the login still fails:

...
2020-03-09 21:01:20 [scrapy.core.engine] INFO: Spider opened
2020-03-09 21:01:20 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2020-03-09 21:01:20 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2020-03-09 21:01:21 [scrapy.core.engine] DEBUG: Crawled (200) <POST http://ceee.com.br/pportal/ceee/Component/DEForm_LoginUser.aspx> (referer: None)
2020-03-09 21:01:22 [spider] DEBUG: :(
2020-03-09 21:01:22 [scrapy.core.engine] INFO: Closing spider (finished)
2020-03-09 21:01:22 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
...

I also tried loginform, as shown here, and FormRequest, as shown here.

I can't figure out this problem; I have never worked with sessions in Scrapy :(

Any help is appreciated.

...