If you are trying to parse <table> tags, go with pandas' .read_html(). It does most of the heavy lifting for you. It returns a list of DataFrames, one per table found on the page. The table you are referring to is the third table (hence index position 2):
import pandas as pd
url="http://www.elections.in/"
tables = pd.read_html(url)
Output:
print (tables[2].to_string())
State Party Number of Seats
0 Andaman & Nicobar Islands Indian National Congress 1
1 Andhra Pradesh Yuvajana Sramika Rythu Congress Party 22
2 Andhra Pradesh Telugu Desam 3
3 Arunachal Pradesh Bharatiya Janata Party 2
4 Assam Bharatiya Janata Party 9
5 Assam Indian National Congress 3
6 Assam All India United Democratic Front 1
7 Assam Independent 1
8 Bihar Bharatiya Janata Party 17
9 Bihar Janata Dal (United) 16
10 Bihar Lok Jan Shakti Party 6
11 Bihar Indian National Congress 1
12 Chandigarh Bharatiya Janata Party 1
13 Chhattisgarh Bharatiya Janata Party 9
14 Chhattisgarh Indian National Congress 2
15 Dadra & Nagar Haveli Independent 1
16 Daman & Diu Bharatiya Janata Party 1
17 Goa Bharatiya Janata Party 1
18 Goa Indian National Congress 1
19 Gujarat Bharatiya Janata Party 26
20 Haryana Bharatiya Janata Party 10
21 Himachal Pradesh Bharatiya Janata Party 4
22 Jammu & Kashmir Bharatiya Janata Party 3
23 Jammu & Kashmir Jammu & Kashmir National Conference 3
24 Jharkhand Bharatiya Janata Party 11
25 Jharkhand Ajsu Party 1
26 Jharkhand Indian National Congress 1
27 Jharkhand Jharkhand Mukti Morcha 1
28 Karnataka Bharatiya Janata Party 25
29 Karnataka Independent 1
30 Karnataka Indian National Congress 1
31 Karnataka Janata Dal (Secular) 1
32 Kerala Indian National Congress 15
33 Kerala Indian Union Muslim League 2
34 Kerala Communist Party Of India (Marxist) 1
35 Kerala Kerala Congress (M) 1
36 Kerala Revolutionary Socialist Party 1
37 Lakshadweep Nationalist Congress Party 1
38 Madhya Pradesh Bharatiya Janata Party 28
39 Madhya Pradesh Indian National Congress 1
40 Maharashtra Bharatiya Janata Party 23
41 Maharashtra Shivsena 18
42 Maharashtra Nationalist Congress Party 4
43 Maharashtra All India Majlis-E-Ittehadul Muslimeen 1
44 Maharashtra Independent 1
45 Maharashtra Indian National Congress 1
46 Manipur Bharatiya Janata Party 1
47 Manipur Naga Peoples Front 1
48 Meghalaya Indian National Congress 1
49 Meghalaya National People'S Party 1
50 Mizoram Mizo National Front 1
51 Nagaland Nationalist Democratic Progressive Party 1
52 NCT OF Delhi Bharatiya Janata Party 7
53 Odisha Biju Janata Dal 12
54 Odisha Bharatiya Janata Party 8
55 Odisha Indian National Congress 1
56 Puducherry Indian National Congress 1
57 Punjab Indian National Congress 8
58 Punjab Bharatiya Janata Party 2
59 Punjab Shiromani Akali Dal 2
60 Punjab Aam Aadmi Party 1
61 Rajasthan Bharatiya Janata Party 24
62 Rajasthan Rashtriya Loktantrik Party 1
63 Sikkim Sikkim Krantikari Morcha 1
64 Tamil Nadu Dravida Munnetra Kazhagam 23
65 Tamil Nadu Indian National Congress 8
66 Tamil Nadu Communist Party Of India 2
67 Tamil Nadu Communist Party Of India (Marxist) 2
68 Tamil Nadu All India Anna Dravida Munnetra Kazhagam 1
69 Tamil Nadu Indian Union Muslim League 1
70 Tamil Nadu Viduthalai Chiruthaigal Katchi 1
71 Telangana Telangana Rashtra Samithi 9
72 Telangana Bharatiya Janata Party 4
73 Telangana Indian National Congress 3
74 Telangana All India Majlis-E-Ittehadul Muslimeen 1
75 Tripura Bharatiya Janata Party 2
76 Uttar Pradesh Bharatiya Janata Party 62
77 Uttar Pradesh Bahujan Samaj Party 10
78 Uttar Pradesh Samajwadi Party 5
79 Uttar Pradesh Apna Dal (Soneylal) 2
80 Uttar Pradesh Indian National Congress 1
81 Uttarakhand Bharatiya Janata Party 5
82 West Bengal All India Trinamool Congress 22
83 West Bengal Bharatiya Janata Party 18
84 West Bengal Indian National Congress 2
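Picking the table by position is brittle if the page layout changes. As a minimal sketch, .read_html() also accepts a match argument that keeps only the tables whose text contains a given string. The HTML below is a made-up stand-in for the real page (not its actual markup), and the example assumes an HTML parser such as lxml is installed:

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the page: two tables, only one holds seat counts.
html = """
<table>
  <tr><th>Menu</th></tr>
  <tr><td>Home</td></tr>
</table>
<table>
  <tr><th>State</th><th>Party</th><th>Number of Seats</th></tr>
  <tr><td>Goa</td><td>Bharatiya Janata Party</td><td>1</td></tr>
  <tr><td>Goa</td><td>Indian National Congress</td><td>1</td></tr>
</table>
"""

# match keeps only tables whose text contains this string (a regex also works).
tables = pd.read_html(StringIO(html), match="Number of Seats")

# With the navigation table filtered out, the seats table is the only result.
seats = tables[0]
print(seats.to_string())
```

With match in place, the result no longer depends on how many other tables come before the one you want.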
To do this with BeautifulSoup, you need to iterate over each row (<tr> tag), then over each data cell tag in that row (<td>), and then append the cell text to a list, a DataFrame, or whatever you want to store it in.
So something like this:
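import requests
from bs4 import BeautifulSoup

url = "http://www.elections.in/"

# Download the page and decode the raw bytes as UTF-8
r = requests.get(url).content
htmlDoc = r.decode("utf-8")
soup = BeautifulSoup(htmlDoc, 'html.parser')

# The table of interest is the third <table> on the page (index 2)
table = soup.find_all('table')[2]
rows = table.find_all('tr')

# Column names come from the <th> cells
headers = table.find_all('th')
headers = [each.text for each in headers]

# Collect the text of every <td> cell, one list per row
list_of_rows = []
for row in rows:
    data = row.find_all('td')
    if data:  # skip rows with no <td> cells, such as the header row
        data = [each.text for each in data]
        list_of_rows.append(data)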
Output:
print (headers)
['State', 'Party', 'Number of Seats']
print (list_of_rows)
[['Andaman & Nicobar Islands', 'Indian National Congress', '1'], ['Andhra Pradesh', 'Yuvajana Sramika Rythu Congress Party', '22'], ['Andhra Pradesh', 'Telugu Desam', '3'], ['Arunachal Pradesh', 'Bharatiya Janata Party', '2'], ['Assam', 'Bharatiya Janata Party', '9'], ['Assam', 'Indian National Congress', '3'], ['Assam', 'All India United Democratic Front', '1'], ['Assam', 'Independent', '1'], ['Bihar', 'Bharatiya Janata Party', '17'], ['Bihar', 'Janata Dal (United)', '16'], ['Bihar', 'Lok Jan Shakti Party', '6'], ['Bihar', 'Indian National Congress', '1'], ['Chandigarh', 'Bharatiya Janata Party', '1'], ['Chhattisgarh', 'Bharatiya Janata Party', '9'], ['Chhattisgarh', 'Indian National Congress', '2'], ['Dadra & Nagar Haveli', 'Independent', '1'], ['Daman & Diu', 'Bharatiya Janata Party', '1'], ['Goa', 'Bharatiya Janata Party', '1'], ['Goa', 'Indian National Congress', '1'], ['Gujarat', 'Bharatiya Janata Party', '26'], ['Haryana', 'Bharatiya Janata Party', '10'], ['Himachal Pradesh', 'Bharatiya Janata Party', '4'], ['Jammu & Kashmir', 'Bharatiya Janata Party', '3'], ['Jammu & Kashmir', 'Jammu & Kashmir National Conference', '3'], ['Jharkhand', 'Bharatiya Janata Party', '11'], ['Jharkhand', 'Ajsu Party', '1'], ['Jharkhand', 'Indian National Congress', '1'], ['Jharkhand', 'Jharkhand Mukti Morcha', '1'], ['Karnataka', 'Bharatiya Janata Party', '25'], ['Karnataka', 'Independent', '1'], ['Karnataka', 'Indian National Congress', '1'], ['Karnataka', 'Janata Dal (Secular)', '1'], ['Kerala', 'Indian National Congress', '15'], ['Kerala', 'Indian Union Muslim League', '2'], ['Kerala', 'Communist Party Of India (Marxist)', '1'], ['Kerala', 'Kerala Congress (M)', '1'], ['Kerala', 'Revolutionary Socialist Party', '1'], ['Lakshadweep', 'Nationalist Congress Party', '1'], ['Madhya Pradesh', 'Bharatiya Janata Party', '28'], ['Madhya Pradesh', 'Indian National Congress', '1'], ['Maharashtra', 'Bharatiya Janata Party', '23'], ['Maharashtra', 'Shivsena', '18'], ['Maharashtra', 
'Nationalist Congress Party', '4'], ['Maharashtra', 'All India Majlis-E-Ittehadul Muslimeen', '1'], ['Maharashtra', 'Independent', '1'], ['Maharashtra', 'Indian National Congress', '1'], ['Manipur', 'Bharatiya Janata Party', '1'], ['Manipur', 'Naga Peoples Front', '1'], ['Meghalaya', 'Indian National Congress', '1'], ['Meghalaya', "National People'S Party", '1'], ['Mizoram', 'Mizo National Front', '1'], ['Nagaland', 'Nationalist Democratic Progressive Party', '1'], ['NCT OF Delhi', 'Bharatiya Janata Party', '7'], ['Odisha', 'Biju Janata Dal', '12'], ['Odisha', 'Bharatiya Janata Party', '8'], ['Odisha', 'Indian National Congress', '1'], ['Puducherry', 'Indian National Congress', '1'], ['Punjab', 'Indian National Congress', '8'], ['Punjab', 'Bharatiya Janata Party', '2'], ['Punjab', 'Shiromani Akali Dal', '2'], ['Punjab', 'Aam Aadmi Party', '1'], ['Rajasthan', 'Bharatiya Janata Party', '24'], ['Rajasthan', 'Rashtriya Loktantrik Party', '1'], ['Sikkim', 'Sikkim Krantikari Morcha', '1'], ['Tamil Nadu', 'Dravida Munnetra Kazhagam', '23'], ['Tamil Nadu', 'Indian National Congress', '8'], ['Tamil Nadu', 'Communist Party Of India', '2'], ['Tamil Nadu', 'Communist Party Of India (Marxist)', '2'], ['Tamil Nadu', 'All India Anna Dravida Munnetra Kazhagam', '1'], ['Tamil Nadu', 'Indian Union Muslim League', '1'], ['Tamil Nadu', 'Viduthalai Chiruthaigal Katchi', '1'], ['Telangana', 'Telangana Rashtra Samithi', '9'], ['Telangana', 'Bharatiya Janata Party', '4'], ['Telangana', 'Indian National Congress', '3'], ['Telangana', 'All India Majlis-E-Ittehadul Muslimeen', '1'], ['Tripura', 'Bharatiya Janata Party', '2'], ['Uttar Pradesh', 'Bharatiya Janata Party', '62'], ['Uttar Pradesh', 'Bahujan Samaj Party', '10'], ['Uttar Pradesh', 'Samajwadi Party', '5'], ['Uttar Pradesh', 'Apna Dal (Soneylal)', '2'], ['Uttar Pradesh', 'Indian National Congress', '1'], ['Uttarakhand', 'Bharatiya Janata Party', '5'], ['West Bengal', 'All India Trinamool Congress', '22'],
['West Bengal', 'Bharatiya Janata Party', '18'], ['West Bengal', 'Indian National Congress', '2']]
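If you want a DataFrame rather than a list of lists, the header names and rows can be passed straight to pandas. This is only a sketch: it uses the first three rows from the output above instead of the full list, and the df name is introduced here for illustration.

```python
import pandas as pd

# First few rows copied from the output above; the full list works the same way.
headers = ['State', 'Party', 'Number of Seats']
list_of_rows = [
    ['Andaman & Nicobar Islands', 'Indian National Congress', '1'],
    ['Andhra Pradesh', 'Yuvajana Sramika Rythu Congress Party', '22'],
    ['Andhra Pradesh', 'Telugu Desam', '3'],
]

# Each inner list becomes one row; headers supply the column names.
df = pd.DataFrame(list_of_rows, columns=headers)

# Scraped cell text is always a string, so convert the seat counts to integers.
df['Number of Seats'] = df['Number of Seats'].astype(int)

print(df.dtypes)
print(df['Number of Seats'].sum())
```

The explicit integer conversion is needed because BeautifulSoup hands back raw cell text and does no type inference.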
But, as I said, pandas will do all of this for you with .read_html().