I have a sample of HTML text, as shown below:
..........
<a href="d?racename=&country=1000&startmonth=1&endmonth=10&startdate=2018&enddate=2019&maxdist=unlimitied&class=any&x=1&order=winner&z=Px_8iD">Winner</a>
</th>
<th background="b8.gif" width="30" title="Winning time - click on this header to sort results by this column">
<a href="d?racename=&country=1000&startmonth=1&endmonth=10&startdate=2018&enddate=2019&maxdist=unlimitied&class=any&x=1&order=wintime&z=Px_8iD">Wintime</a>
</th>
<th background="b8.gif" title="races with icon have video available for download">Film</th>
</tr>\n<tr>
<td><a href="d?r=4552510&z=Px_8iD">OAKS AT LOGAN PARK (1-2 WINS)</a></td>
<td>Warragul</td>
<td>18;OCT;2019</td>
<td>7</td>
<td>GR;Tier</td>
<td>460;503</td>
<td><a href="d?i=2390975">Madalia Ken</a></td>
<td>26.00</td>
<td></td>
</tr>\n<tr bgcolor="#cccccc">
<td><a href="d?r=4552511&z=Px_8iD">AUSTRALIAN QUALITY PET FOODS</a></td>
<td>Warragul</td>
<td>18;OCT;2019</td>
<td>8</td>
<td>GR;Grad</td>
<td>460;503</td>
<td><a href="d?i=2304665">Midnight Storm</a></td>
<td>26.24</td>
<td></td>
</tr>\n<tr>
<td><a href="d?r=4552512&z=Px_8iD">EAST IVANHOE GROCERS</a></td>
<td>Warragul</td>
<td>18;OCT;2019</td>
<td>9</td>
<td>GR;Grad</td>
<td>400;437</td>
<td><a href="d?i=2362422">Early Promise</a></td>
<td>23.15</td>
<td></td>
</tr>
I need to extract the data from each row into columns, as shown below:
row 1
\n<tr ><td><a href="d?r=4552510&z=Px_8iD"> column name = "r_ID" , value = 4552510
OAKS AT LOGAN PARK (1-2 WINS)</a></td> column name = "r_name" , value = OAKS AT LOGAN PARK (1-2 WINS)
<td>Warragul</td> column name = "s_name" , value = Warragul
<td>18;OCT;2019</td> column name = "date" , value = 18;OCT;2019
<td>7</td> column name = "h" , value = 7
<td>GR;Tier</td> column name = "g" , value = GR;Tier
<td>460;503</td> column name = "d" , value = 460;503
<td><a href="d?i=2390975"> column name = "w_ID" , value = 2390975
Madalia Ken</a></td> column name = "w_name" , value = Madalia Ken
<td>26.00</td> column name = "wt" , value = 26.00
<td></td></tr> column name = "f" , value = ''
row 2
\n<tr bgcolor="#cccccc" ><td><a href="d?r=4552511&z=Px_8iD"> column name = "r_ID" , value = 4552511
AUSTRALIAN QUALITY PET FOODS</a></td> column name = "r_name" , value = AUSTRALIAN QUALITY PET FOODS
<td>Warragul</td> column name = "s_name" , value = Warragul
<td>18;OCT;2019</td> column name = "date" , value = 18;OCT;2019
<td>8</td> column name = "h" , value = 8
<td>GR;Grad</td> column name = "g" , value = GR;Grad
<td>460;503</td> column name = "d" , value = 460;503
<td><a href="d?i=2304665"> column name = "w_ID" , value = 2304665
Midnight Storm</a></td> column name = "w_name" , value = Midnight Storm
<td>26.24</td> column name = "wt" , value = 26.24
<td></td></tr> column name = "f" , value = ''
row 3
\n<tr ><td><a href="d?r=4552512&z=Px_8iD"> column name = "r_ID" , value = 4552512
EAST IVANHOE GROCERS</a></td> column name = "r_name" , value = EAST IVANHOE GROCERS
<td>Warragul</td> column name = "s_name" , value = Warragul
<td>18;OCT;2019</td> column name = "date" , value = 18;OCT;2019
<td>9</td> column name = "h" , value = 9
<td>GR;Grad</td> column name = "g" , value = GR;Grad
<td>400;437</td> column name = "d" , value = 400;437
<td><a href="d?i=2362422"> column name = "w_ID" , value = 2362422
Early Promise</a></td> column name = "w_name" , value = Early Promise
<td>23.15</td> column name = "wt" , value = 23.15
<td></td></tr> column name = "f" , value = ''
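To make the target mapping above concrete, here is a minimal sketch of it using only Python's standard-library `html.parser`. The class name `RaceTableParser`, the `COLUMNS` list, and the shortened `sample` string are hypothetical and exist only for illustration. The sketch assumes that every data row has exactly nine `<td>` cells and that the IDs always appear in the links as `r=<digits>` and `i=<digits>`.

```python
import re
from html.parser import HTMLParser

# Text columns in cell order, named as in the mapping above (assumed layout).
COLUMNS = ["r_name", "s_name", "date", "h", "g", "d", "w_name", "wt", "f"]


class RaceTableParser(HTMLParser):
    """Collect one dict per <tr> that holds exactly len(COLUMNS) <td> cells."""

    def __init__(self):
        super().__init__()
        self.rows = []       # finished rows
        self._cells = None   # cell texts of the <tr> being parsed
        self._ids = {}       # r_ID / w_ID found in the current <tr>
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells, self._ids = [], {}
        elif tag == "td" and self._cells is not None:
            self._in_td = True
            self._cells.append("")
        elif tag == "a" and self._in_td:
            href = dict(attrs).get("href") or ""
            # "d?r=<digits>" is the race link, "d?i=<digits>" the winner link.
            match = re.search(r"[?&]([ri])=(\d+)", href)
            if match:
                key = "r_ID" if match.group(1) == "r" else "w_ID"
                self._ids[key] = match.group(2)

    def handle_data(self, data):
        if self._in_td:
            self._cells[-1] += data

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False
        elif tag == "tr" and self._cells is not None:
            # Header rows use <th>, so they have no <td> cells and are skipped.
            if len(self._cells) == len(COLUMNS):
                row = dict(zip(COLUMNS, (c.strip() for c in self._cells)))
                row.update(self._ids)
                self.rows.append(row)
            self._cells = None
            self._in_td = False


# Shortened sample: one header row and the first data row from above.
sample = (
    '<tr><th>Film</th></tr>'
    '<tr><td><a href="d?r=4552510&z=Px_8iD">OAKS AT LOGAN PARK (1-2 WINS)</a></td>'
    '<td>Warragul</td><td>18;OCT;2019</td><td>7</td><td>GR;Tier</td>'
    '<td>460;503</td><td><a href="d?i=2390975">Madalia Ken</a></td>'
    '<td>26.00</td><td></td></tr>'
)

parser = RaceTableParser()
parser.feed(sample)
parser.close()
print(parser.rows[0])
```

Each entry of `parser.rows` is a plain dict keyed by the column names, with the values kept as strings, so a list of them could be written out with `csv.DictWriter` or loaded into a DataFrame.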
I tried BeautifulSoup, but it does not work, because:
1) part of the data (the IDs) is inside the tag itself, in the href attribute;
2) when I use soup=getPage(url).find("table"), part of the tag markup became &gt;, for example: <a href="d?i=2383236">Porsche Monelli&lt;/a&gt;&lt;/td&gt;&lt;td&gt;22.88&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;
Any help? Thanks.