Я предпочитаю использовать теги <table>
, чтобы использовать pandas 'read_html()
, так как он использует beautifulsoup под капотом и будет выполнять работу по разбору таблиц:
import requests
import pandas as pd
from bs4 import BeautifulSoup
base_url = "http://www.harness.org.au"
webpage_response = requests.get('http://www.harness.org.au/racing/tracks/', "html.parser")
soup = BeautifulSoup(webpage_response.content, "html.parser")
# only finding one track
# soup.table to find all links for days racing
#harness_table = soup.row
# scraps a href that is an incomplete URL that im trying to get to
tracks = soup.find(class_="col-lg-10 col-md-10 col-sm-10 col-xs-10 content")
lists = []
links = tracks.find_all('a')
#Gets each track
for a in links:
lists.append(base_url+a["href"])
# track1 = []
#purpose - just to get track name before going over other data
for link in lists:
webpage = requests.get(link)
track = BeautifulSoup(webpage.content, "html.parser")
trackname = track.find(class_="pageTitle")
if 'itemprop' in links[lists.index(link)].attrs.keys():
continue
try:
track1 = trackname.get_text()
except:
print ('No class="pageTitle" found.')
track1 = ''
barrierTable = track.find('div',{'class':'barrierStats'})
barrierTable = pd.read_html(str(barrierTable))[0]
if barrierTable.empty == True:
barrierTable = 'No Barrier Data...'
print('%s\n%s\n' %(track1, barrierTable) + '#'*70 + '\n')
Выход:
Albany
Barrier 1 2 3 4 5 6 7 8 9 10
0 Starts 74 74 74 74 72 61 71 60 41 34
1 Wins 11 11 9 13 6 9 7 3 4 1
######################################################################
Albion Park
Barrier 1 2 3 4 5 ... 8 9 10 11 12 13
0 Starts 1250 1250 1250 1250 1249 ... 1125 959 790 467 277 6
1 Wins 197 137 186 153 115 ... 100 73 77 39 17 0
[2 rows x 14 columns]
######################################################################
Albury
Barrier 1 2 3 4 5 6 7 8 9 10 11
0 Starts 63 63 63 63 58 35 60 55 39 25 5
1 Wins 11 8 6 11 2 5 5 10 3 2 0
######################################################################
Ararat
Barrier 1 2 3 4 5 6 7 8 9 10 11
0 Starts 34 34 34 34 32 31 23 24 16 11 6
1 Wins 2 3 3 9 3 5 3 3 1 1 1
######################################################################
Armidale
Barrier 1 2 3 4 5 6 7 8 9 10
0 Starts 15 15 15 15 14 11 15 15 10 5
1 Wins 3 0 1 0 0 1 3 4 2 1
######################################################################
....