I'm having trouble issuing requests with Scrapy. I'm scraping a site whose listing pages link to articles; the last character of the listing URL is the page number. But when I add "callback=self.parse_article", it works only once. Why?
def parse(self, response):
    # start_urls[0] looks like "www.xxxxpage=1"
    for page in range(1, 10):
        print(page)
        print(self.start_urls)
        # Replace the last character of the start URL with the page number
        url = Join(separator='')([self.start_urls[0][:-1], str(page)])
        yield Request(url, headers={'User-Agent': self.ua.random}, callback=self.parse_article)
        # Here the callback seems to work only once!
    # Extract the article links from the page that produced this response
    article_selector = response.xpath("//*[@class='box-result clearfix']/h2/a/@href")
    for url in article_selector.extract():
        yield Request(url)

def parse_article(self, response):
    print(response.url)
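The page URLs in the loop above are built by dropping the last character of the start URL and appending the page number. A minimal standalone sketch of just that step, without Scrapy, might look like the following. The helper name build_page_urls and the example.com URL are hypothetical and stand in for the elided "www.xxxxpage=1".

```python
def build_page_urls(start_url, pages):
    """Replace the final character of start_url with each page number."""
    # Same slicing as in parse(): drop the last character, append the page.
    # This only works while the page number in start_url is a single digit.
    base = start_url[:-1]
    return [base + str(page) for page in pages]


# Hypothetical start URL, standing in for the elided "www.xxxxpage=1".
urls = build_page_urls("http://example.com/list?page=1", range(1, 10))
print(urls[0])
print(urls[-1])
```

Because the slice removes exactly one character, this approach assumes the start URL ends in a one-digit page number.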
Running the spider, I get nine printed entries, each showing page 1, and the article links come only from the URL "xxxxpage=1":
1
xxxxpage=1
2
xxxxpage=1
······
9
xxxxpage=1
# These are article links of start url(page=1)
<Selector xpath="//*[@class='box-result clearfix']/h2/a/@href" data='https://finance.sina.com.cn/stock/uss...'>, ···]
When I remove the first "callback=self.parse_article", it works, but not quite correctly:
1
xxxxpage=1
2
xxxxpage=2
······
9
xxxxpage=9
# These are links of page1
<Selector xpath="//*[@class='box-result clearfix']/h2/a/@href" data='https://finance.sina.com.cn/stock/uss...'>, ···]
<Selector xpath="//*[@class='box-result clearfix']/h2/a/@href" data='https://finance.sina.com.cn/stock/uss...'>, ···]
<Selector xpath="//*[@class='box-result clearfix']/h2/a/@href" data='https://finance.sina.com.cn/stock/uss...'>, ···]
<Selector xpath="//*[@class='box-result clearfix']/h2/a/@href" data='https://finance.sina.com.cn/stock/uss...'>, ···]
1
xxxxpage=1
2
xxxxpage=2
······
9
xxxxpage=9
# These are links of page2
<Selector xpath="//*[@class='box-result clearfix']/h2/a/@href" data='https://finance.sina.com.cn/stock/uss...'>, ···]
<Selector xpath="//*[@class='box-result clearfix']/h2/a/@href" data='https://finance.sina.com.cn/stock/uss...'>, ···]
<Selector xpath="//*[@class='box-result clearfix']/h2/a/@href" data='https://finance.sina.com.cn/stock/uss...'>, ···]
<Selector xpath="//*[@class='box-result clearfix']/h2/a/@href" data='https://finance.sina.com.cn/stock/uss...'>, ···]
······(page2 - page8 and links)
1
xxxxpage=1
2
xxxxpage=2
······
9
xxxxpage=9
# These are links of page9
<Selector xpath="//*[@class='box-result clearfix']/h2/a/@href" data='https://finance.sina.com.cn/stock/uss...'>, ···]
<Selector xpath="//*[@class='box-result clearfix']/h2/a/@href" data='https://finance.sina.com.cn/stock/uss...'>, ···]
<Selector xpath="//*[@class='box-result clearfix']/h2/a/@href" data='https://finance.sina.com.cn/stock/uss...'>, ···]
<Selector xpath="//*[@class='box-result clearfix']/h2/a/@href" data='https://finance.sina.com.cn/stock/uss...'>, ···]
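The repeated 1 to 9 blocks in this second output are consistent with two documented Scrapy behaviours: a Request without a callback is handled by the spider's parse method, and requests for already-seen URLs are dropped by the duplicate filter by default. The toy model below is not Scrapy. It is a plain-Python sketch under those two assumptions, plus the simplification that the start URL counts as already seen; crawl, the example.com URL, and the tuple format are all hypothetical.

```python
from collections import deque


def crawl(start_url, parse, max_responses=50):
    """Toy request loop: default callback plus a seen-URL filter."""
    seen = {start_url}                 # assumption: start URL counts as seen
    queue = deque([(start_url, parse)])
    handled = []                       # (callback name, url) in call order
    while queue and len(handled) < max_responses:
        url, callback = queue.popleft()
        handled.append((callback.__name__, url))
        for next_url, next_callback in callback(url):
            if next_url in seen:       # duplicate request is dropped
                continue
            seen.add(next_url)
            # No callback given: fall back to parse, as Scrapy does.
            queue.append((next_url, next_callback or parse))
    return handled


def parse(url):
    # Mirrors the loop in the question, with the callback left out.
    base = url[:-1]
    for page in range(1, 10):
        yield base + str(page), None


handled = crawl("http://example.com/list?page=1", parse)
print(len(handled))
print(handled[1])
```

In this model parse runs once per listing page, and every run re-yields all nine page URLs, which would explain why the 1 to 9 loop output appears before each page's links.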