This script grabs all the form-profile-xxx URLs from https://gg.co.uk/racing/16-jun-2020/thirsk-1300, then fetches the row belonging to this race from each profile page and saves it to a CSV:
import csv
import requests
from bs4 import BeautifulSoup

url = 'https://gg.co.uk/racing/16-jun-2020/thirsk-1300'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

all_data = []
# Collect every link to a horse's form-profile page on the race page
for a in soup.select('a[href^="/racing/form-profile-"]'):
    u = 'https://gg.co.uk' + a['href']
    s = BeautifulSoup(requests.get(u).content, 'html.parser')
    # On the profile page, find the table row that links back to this race
    row = s.select_one('tr:has(a[href="{}"])'.format(url.replace('https://gg.co.uk', '')))
    if not row:
        continue
    tds = [td.get_text(strip=True, separator='\n') for td in row.select('td')]
    print(tds)
    all_data.append(tds)

with open('data.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
    for row in all_data:
        writer.writerow(row)
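The key step is the `tr:has(...)` CSS selector, which matches a table row by what it contains (it needs the soupsieve package, which is installed alongside bs4 by default). Here is a minimal offline illustration with made-up HTML, so you can see what it matches without hitting the site:

```python
from bs4 import BeautifulSoup

# Hypothetical markup mimicking a profile page's results table
html = """
<table>
  <tr><td><a href="/racing/16-jun-2020/thirsk-1300">1:00 Thirsk</a></td><td>Won</td></tr>
  <tr><td><a href="/racing/other-race">2:00 York</a></td><td>Lost</td></tr>
</table>
"""
soup = BeautifulSoup(html, 'html.parser')

# Select the <tr> that contains a link back to our race URL path
row = soup.select_one('tr:has(a[href="/racing/16-jun-2020/thirsk-1300"])')
print([td.get_text(strip=True) for td in row.select('td')])  # ['1:00 Thirsk', 'Won']
```

Note that the script strips `https://gg.co.uk` from the race URL before building the selector, because the profile pages use relative hrefs.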
Prints:
['1st\n3\n5', '16th Jun 2020\nGood to Soft\n7f\nClass 5', '1:00 Thirsk\nHigh Peak\n9\n5\nF Norton\nM Johnston', '5/6\nWon']
['1st\n3\n5', '16th Jun 2020\nGood to Soft\n7f\nClass 5', '1:00 Thirsk\nHigh Peak\n9st 5lb\nF Norton\nM Johnston', '5/6\nWon']
['1st\n3\n5', '16th Jun 2020\nGood to Soft\n7f\nClass 5', '1:00 Thirsk\nHigh Peak\n9st 5lb\nF Norton\nM Johnston', '5/6\nWon']
['2nd\n2\n6', '16th Jun 2020\nGood to Soft\n7f\nClass 5', '1:00 Thirsk\nDeputy\n9\n5\nS Donohoe\nC Fellowes', '5/2']
['2nd\n2\n6', '16th Jun 2020\nGood to Soft\n7f\nClass 5', '1:00 Thirsk\nDeputy\n9st 5lb\nS Donohoe\nC Fellowes', '5/2']
['2nd\n2\n6', '16th Jun 2020\nGood to Soft\n7f\nClass 5', '1:00 Thirsk\nDeputy\n9st 5lb\nS Donohoe\nC Fellowes', '5/2']
['3rd\n4\n2', '16th Jun 2020\nGood to Soft\n7f\nClass 5', '1:00 Thirsk\nInfant Hercules\n9\n5\nKevin Stott\nK A Ryan', '12/1\n2']
['3rd\n4\n2', '16th Jun 2020\nGood to Soft\n7f\nClass 5', '1:00 Thirsk\nInfant Hercules\n9st 5lb\nKevin Stott\nK A Ryan', '12/1\n2']
['3rd\n4\n2', '16th Jun 2020\nGood to Soft\n7f\nClass 5', '1:00 Thirsk\nInfant Hercules\n9st 5lb\nKevin Stott\nK A Ryan', '12/1\n2']
['4th\n8\n3', '16th Jun 2020\nGood to Soft\n7f\nClass 5', '1:00 Thirsk\nChilli Leaves\n9\n0\nCallum Rodriguez\nK Dalgleish', '12/1\n2.5']
['4th\n8\n3', '16th Jun 2020\nGood to Soft\n7f\nClass 5', '1:00 Thirsk\nChilli Leaves\n9st\nCallum Rodriguez\nK Dalgleish', '12/1\n2.5']
['4th\n8\n3', '16th Jun 2020\nGood to Soft\n7f\nClass 5', '1:00 Thirsk\nChilli Leaves\n9st\nCallum Rodriguez\nK Dalgleish', '12/1\n2.5']
['5th\n6\n4', '16th Jun 2020\nGood to Soft\n7f\nClass 5', '1:00 Thirsk\nMy Best Friend\n9\n5\nD Nolan\nD OʼMeara', '15/2\n4.25']
['5th\n6\n4', '16th Jun 2020\nGood to Soft\n7f\nClass 5', '1:00 Thirsk\nMy Best Friend\n9st 5lb\nD Nolan\nD OʼMeara', '15/2\n4.25']
['6th\n7\n8', '16th Jun 2020\nGood to Soft\n7f\nClass 5', '1:00 Thirsk\nTopper Bill\n9\n5\nBarry McHugh\nAdrian Nicholls', '25/1\n6.25']
['6th\n7\n8', '16th Jun 2020\nGood to Soft\n7f\nClass 5', '1:00 Thirsk\nTopper Bill\n9st 5lb\nBarry McHugh\nAdrian Nicholls', '25/1\n6.25']
['6th\n7\n8', '16th Jun 2020\nGood to Soft\n7f\nClass 5', '1:00 Thirsk\nTopper Bill\n9st 5lb\nBarry McHugh\nAdrian Nicholls', '25/1\n6.25']
['7th\n1\n1', '16th Jun 2020\nGood to Soft\n7f\nClass 5', '1:00 Thirsk\nDandini\n9\n5\nBen Robinson\nOllie Pears', '40/1\n7']
['7th\n1\n1', '16th Jun 2020\nGood to Soft\n7f\nClass 5', '1:00 Thirsk\nDandini\n9st 5lb\nBen Robinson\nOllie Pears', '40/1\n7']
['7th\n1\n1', '16th Jun 2020\nGood to Soft\n7f\nClass 5', '1:00 Thirsk\nDandini\n9st 5lb\nBen Robinson\nOllie Pears', '40/1\n7']
['8th\n5\n7', '16th Jun 2020\nGood to Soft\n7f\nClass 5', '1:00 Thirsk\nMarsellus\n9\n5\nD Allan\nT D Easterby', '33/1\n27']
['8th\n5\n7', '16th Jun 2020\nGood to Soft\n7f\nClass 5', '1:00 Thirsk\nMarsellus\n9st 5lb\nD Allan\nT D Easterby', '33/1\n27']
['8th\n5\n7', '16th Jun 2020\nGood to Soft\n7f\nClass 5', '1:00 Thirsk\nMarsellus\n9st 5lb\nD Allan\nT D Easterby', '33/1\n27']
And saves data.csv (screenshot from LibreOffice):
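You can see in the output above that some rows are printed two or three times, presumably because the race page links the same form-profile page more than once. If you want each row only once in the CSV, a minimal order-preserving dedup sketch (a hypothetical `dedupe` helper, applied to `all_data` before writing) could look like this:

```python
def dedupe(rows):
    """Drop exact duplicate rows while keeping first-seen order."""
    seen = set()
    out = []
    for r in rows:
        key = tuple(r)          # lists aren't hashable, tuples are
        if key not in seen:
            seen.add(key)
            out.append(r)
    return out

# Toy data standing in for all_data
all_data = [['a', 'b'], ['a', 'b'], ['c', 'd']]
print(dedupe(all_data))  # [['a', 'b'], ['c', 'd']]
```

This only removes byte-for-byte duplicates; rows that differ slightly (e.g. `9\n5` vs `9st 5lb` in the weight column above) are kept.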