I suspect the problem is in get_player_suffix (although I don't know how that function works on your end). When looking up a player's URL, for MOST players the search simply redirects to the correct page. For some players, however, the suffix does not follow the usual format of the first 5 letters of the last name + the first 2 letters of the first name, followed by the number 01 (e.g. bryanko01). Anthony Davis, for example, is davisan02.
DeMarcus Cousins still follows that format, but searching for his name returns a search results page instead of redirecting to his player page, as it does for Kobe.
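To make the naming convention concrete, here is a rough sketch of how a suffix would be guessed from a name alone. The helper names naive_player_suffix and naive_player_path are made up for illustration and are not part of any library; the number has to be supplied by hand, which is exactly why guessing breaks for players like Anthony Davis.

```python
def naive_player_suffix(name, number=1):
    """Guess the conventional ID: first 5 letters of the last name,
    first 2 letters of the first name, then a two-digit number."""
    parts = name.lower().split()
    first, last = parts[0], parts[-1]
    # Keep letters only, so punctuation in a name does not leak into the ID.
    first = ''.join(ch for ch in first if ch.isalpha())
    last = ''.join(ch for ch in last if ch.isalpha())
    return f'{last[:5]}{first[:2]}{number:02d}'


def naive_player_path(name, number=1):
    """Path without '.html', the same shape get_player_suffix returns."""
    player_id = naive_player_suffix(name, number)
    return f'/players/{player_id[0]}/{player_id}'


print(naive_player_suffix('Kobe Bryant'))        # bryanko01
print(naive_player_suffix('Anthony Davis', 2))   # davisan02
print(naive_player_path('DeMarcus Cousins'))     # /players/c/couside01
```

The guess is only right when you already know the trailing number, so asking the site's search is the more reliable route.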
So I rewrote the get_player_suffix function: if the search returns a results page, you choose the player you want and it carries on from there. Otherwise, if it redirects straight to the player page, it just carries on:
from io import StringIO

from bs4 import BeautifulSoup
from requests import get
import pandas as pd


def get_player_suffix(name):
    # The site search either redirects to the player page or returns
    # a "Search Results" page listing several candidates.
    searchUrl = 'https://www.basketball-reference.com/search/search.fcgi'
    payload = {'hint': name, 'search': name}
    response = get(searchUrl, params=payload)
    soup = BeautifulSoup(response.text, 'html.parser')

    if soup.find('h1').text == 'Search Results':
        players = soup.find('h1').find_next('div').find_all('a', href=True)
        # Keep only links under /players/<first letter of the last name>/
        prefix = 'players/%s/' % name.split()[-1][0].lower()
        playerChoices = {}
        for each in players:
            if prefix in each['href']:
                playerChoices[each.text] = each['href']

        choices = list(playerChoices.keys())
        if len(choices) > 1:
            # Several candidates: let the user pick one by number.
            print('Make a choice:')
            for idx, player in enumerate(choices):
                print(' %s: %s' % (idx, player))
            picked = choices[int(input('\nEnter number: '))]
        else:
            picked = choices[0]
        playerUrl = playerChoices[picked].replace('.html', '')
    else:
        # Redirected straight to the player page.
        playerUrl = response.url.split('.com')[-1].replace('.html', '')
    return playerUrl


def get_game_logs(name, start_date, end_date, playoffs=False):
    suffix = get_player_suffix(name).replace('/', '%2F')
    start_date = pd.to_datetime(start_date)
    end_date = pd.to_datetime(end_date)
    # Whole seasons are fetched; rows are not filtered by the exact dates.
    years = list(range(start_date.year, end_date.year + 2))
    selector = 'div_pgl_basic_playoffs' if playoffs else 'div_pgl_basic'

    frames = []
    for year in years:
        print(year)
        url = f'https://widgets.sports-reference.com/wg.fcgi?css=1&site=bbr&url={suffix}%2Fgamelog%2F{year}&div={selector}'
        r = get(url)
        if r.status_code == 200:
            soup = BeautifulSoup(r.content, 'html.parser')
            table = soup.find('table')
            if table:
                df = pd.read_html(StringIO(str(table)))[0]
                # Drop repeated header rows and games the player missed:
                # in those rows 'GS' holds text longer than one character.
                active_df = df[df['GS'].astype(str).str.len() <= 1]
                frames.append(active_df)
    # DataFrame.append was removed in pandas 2.0, so concatenate once here.
    return pd.concat(frames) if frames else None
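The widget URL in get_game_logs is assembled by hand-escaping every '/' as %2F. As a small sketch of the same idea, the standard library can do the escaping. The helper name build_gamelog_url is made up for illustration, and the query parameters are simply the ones used in the code above, not a documented API.

```python
from urllib.parse import quote

WIDGET_URL = 'https://widgets.sports-reference.com/wg.fcgi'


def build_gamelog_url(player_path, year, playoffs=False):
    """Build the game log widget URL for one season.

    player_path is what get_player_suffix returns,
    e.g. '/players/d/davisan02' (no '.html').
    """
    selector = 'div_pgl_basic_playoffs' if playoffs else 'div_pgl_basic'
    # safe='' makes quote() escape '/' as %2F as well.
    page = quote(f'{player_path}/gamelog/{year}', safe='')
    return f'{WIDGET_URL}?css=1&site=bbr&url={page}&div={selector}'


print(build_gamelog_url('/players/d/davisan02', 2019))
```

This produces the same string as the f-string in get_game_logs, just without the manual .replace('/', '%2F') step.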
Output:
df2 = get_game_logs('Anthony Davis', '2017-08-01', '2019-02-02')
Make a choice:
0: Anthony Davis (2013-2020)
1: Mark Davis (1996-2000)
Enter number:
Then enter 0
print (df2)
Rk G Date Age Tm Unnamed: 5 ... BLK TOV PF PTS GmSc +/-
0 1 1 2016-10-26 23-229 NOP NaN ... 4 3 4 50 44.2 -3
1 2 2 2016-10-28 23-231 NOP NaN ... 2 1 2 45 39.7 -3
2 3 3 2016-10-29 23-232 NOP @ ... 3 1 1 18 11.3 -17
3 4 4 2016-11-01 23-235 NOP NaN ... 3 1 2 35 33.9 +3
4 5 5 2016-11-02 23-236 NOP @ ... 3 5 5 10 1.6 -23
5 6 6 2016-11-04 23-238 NOP NaN ... 4 1 1 22 22.7 -6
6 7 7 2016-11-07 23-241 NOP @ ... 0 3 1 33 25.2 -6
7 8 8 2016-11-08 23-242 NOP @ ... 4 3 2 34 28.3 -4
8 9 9 2016-11-10 23-244 NOP @ ... 4 2 3 30 24.2 +7
9 10 10 2016-11-12 23-246 NOP NaN ... 2 1 0 34 24.4 -10
10 11 11 2016-11-14 23-248 NOP NaN ... 2 7 2 25 14.4 0
12 13 12 2016-11-18 23-252 NOP NaN ... 4 2 2 38 36.3 +16
13 14 13 2016-11-19 23-253 NOP NaN ... 3 2 3 38 31.2 +7
14 15 14 2016-11-22 23-256 NOP @ ... 3 2 1 13 10.1 +13
15 16 15 2016-11-23 23-257 NOP NaN ... 1 4 1 45 36.3 +33
16 17 16 2016-11-25 23-259 NOP @ ... 5 1 1 31 28.9 -16
17 18 17 2016-11-27 23-261 NOP @ ... 1 2 3 36 30.3 -19
18 19 18 2016-11-29 23-263 NOP NaN ... 2 1 4 41 39.0 +19
19 20 19 2016-12-02 23-266 NOP NaN ... 1 2 2 21 15.9 -7
21 21 20 2016-12-04 23-268 NOP @ ... 4 3 3 37 27.9 -10
22 22 21 2016-12-05 23-269 NOP NaN ... 4 1 1 28 21.0 -5
23 23 22 2016-12-08 23-272 NOP NaN ... 2 3 1 26 16.8 -10
25 25 23 2016-12-11 23-275 NOP @ ... 2 3 4 14 9.1 -7
26 26 24 2016-12-13 23-277 NOP NaN ... 5 6 2 28 25.0 -8
27 27 25 2016-12-15 23-279 NOP NaN ... 5 4 2 35 25.3 +14
28 28 26 2016-12-16 23-280 NOP @ ... 0 1 0 19 12.9 -15
29 29 27 2016-12-18 23-282 NOP @ ... 1 3 4 12 3.6 -23
30 30 28 2016-12-20 23-284 NOP @ ... 2 1 2 31 21.4 +11
31 31 29 2016-12-21 23-285 NOP NaN ... 2 2 5 34 25.4 +6
32 32 30 2016-12-23 23-287 NOP NaN ... 4 3 3 28 28.6 +4
.. .. .. ... ... ... ... ... .. .. .. .. ... ...
27 27 26 2019-12-15 26-279 LAL @ ... 2 5 5 27 16.7 -3
29 29 27 2019-12-19 26-283 LAL @ ... 3 1 4 36 29.9 +6
30 30 28 2019-12-22 26-286 LAL NaN ... 4 5 4 32 22.9 -13
31 31 29 2019-12-25 26-289 LAL NaN ... 2 3 4 24 18.0 -10
32 32 30 2019-12-28 26-292 LAL @ ... 0 2 3 20 16.6 +6
33 33 31 2019-12-29 26-293 LAL NaN ... 1 4 2 23 20.8 +16
34 34 32 2020-01-01 26-296 LAL NaN ... 1 3 1 26 20.4 +16
35 35 33 2020-01-03 26-298 LAL NaN ... 1 3 3 46 42.2 +26
36 36 34 2020-01-05 26-300 LAL NaN ... 8 0 3 24 27.7 -11
37 37 35 2020-01-07 26-302 LAL NaN ... 2 3 2 5 5.7 +10
44 43 36 2020-01-20 26-315 LAL @ ... 2 2 5 9 6.7 -24
45 44 37 2020-01-22 26-317 LAL @ ... 2 1 0 28 27.8 +8
46 45 38 2020-01-23 26-318 LAL @ ... 1 2 3 16 14.5 +6
47 46 39 2020-01-25 26-320 LAL @ ... 1 5 3 31 20.4 -10
48 47 40 2020-01-31 26-326 LAL NaN ... 5 4 1 37 37.1 -13
49 48 41 2020-02-01 26-327 LAL @ ... 1 2 1 21 22.8 +20
50 49 42 2020-02-04 26-330 LAL NaN ... 0 4 1 18 12.3 +7
51 50 43 2020-02-06 26-332 LAL NaN ... 3 0 3 32 32.8 -4
52 51 44 2020-02-08 26-334 LAL @ ... 1 2 2 27 26.7 +4
53 52 45 2020-02-10 26-336 LAL NaN ... 1 3 2 25 21.9 +9
54 53 46 2020-02-12 26-338 LAL @ ... 2 2 2 33 25.9 +5
55 54 47 2020-02-21 26-347 LAL NaN ... 7 4 2 28 27.1 +11
56 55 48 2020-02-23 26-349 LAL NaN ... 2 6 3 32 19.9 -3
57 56 49 2020-02-25 26-351 LAL NaN ... 6 2 3 21 19.0 +11
58 57 50 2020-02-27 26-353 LAL @ ... 2 3 2 23 18.0 +17
59 58 51 2020-02-29 26-355 LAL @ ... 2 1 2 15 14.9 -10
61 60 52 2020-03-03 26-358 LAL NaN ... 2 2 1 37 38.0 +30
63 61 53 2020-03-06 26-361 LAL NaN ... 2 4 4 30 16.9 +8
64 62 54 2020-03-08 26-363 LAL @ ... 1 1 5 30 23.9 +12
65 63 55 2020-03-10 26-365 LAL NaN ... 1 1 4 26 18.6 -5
[261 rows x 30 columns]
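Note that the rows above start in October 2016 even though the requested start date was 2017-08-01: get_game_logs fetches whole seasons and never filters on the dates. If you only want games inside the requested range, one possible approach is to trim the result afterwards. The helper filter_by_date below is a hypothetical name, and the sample frame reuses four Date/PTS pairs from the output above.

```python
import pandas as pd


def filter_by_date(df, start_date, end_date):
    """Keep only rows whose 'Date' falls inside [start_date, end_date]."""
    dates = pd.to_datetime(df['Date'])
    mask = (dates >= pd.to_datetime(start_date)) & (dates <= pd.to_datetime(end_date))
    return df[mask].reset_index(drop=True)


# Four rows taken from the printed output, reduced to two columns.
sample = pd.DataFrame({
    'Date': ['2016-10-26', '2016-12-23', '2019-12-15', '2020-01-01'],
    'PTS': [50, 28, 27, 26],
})

trimmed = filter_by_date(sample, '2016-11-01', '2019-12-31')
print(trimmed)
```

With the real data you would call it as filter_by_date(df2, '2017-08-01', '2019-02-02').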
After running df3 = get_game_logs('DeMarcus Cousins', '2013-08-01', '2014-02-02')
you get this output:
print (df3)
Rk G Date Age Tm Unnamed: 5 ... BLK TOV PF PTS GmSc +/-
0 1 1 2012-10-31 22-079 SAC @ ... 2 7 4 14 6.8 -12
1 2 2 2012-11-02 22-081 SAC @ ... 0 1 5 11 2.1 -5
2 3 3 2012-11-03 22-082 SAC @ ... 1 1 4 21 14.0 -8
3 4 4 2012-11-05 22-084 SAC NaN ... 2 3 2 23 21.8 +1
4 5 5 2012-11-07 22-086 SAC NaN ... 0 1 2 21 16.2 -8
5 6 6 2012-11-09 22-088 SAC NaN ... 0 2 5 14 8.8 -7
8 9 7 2012-11-16 22-095 SAC NaN ... 0 2 3 9 10.2 -19
9 10 8 2012-11-18 22-097 SAC NaN ... 1 4 3 29 20.8 -2
10 11 9 2012-11-21 22-100 SAC NaN ... 0 3 6 7 0.7 +10
11 12 10 2012-11-23 22-102 SAC @ ... 2 4 4 14 8.6 -1
12 13 11 2012-11-24 22-103 SAC NaN ... 0 2 3 14 11.1 +2
13 14 12 2012-11-27 22-106 SAC NaN ... 0 2 4 20 11.1 -1
14 15 13 2012-11-30 22-109 SAC NaN ... 0 2 1 19 15.8 -6
15 16 14 2012-12-01 22-110 SAC @ ... 0 2 2 8 2.6 -19
16 17 15 2012-12-05 22-114 SAC NaN ... 1 4 3 25 17.7 +10
17 18 16 2012-12-07 22-116 SAC NaN ... 2 4 2 17 16.1 +6
18 19 17 2012-12-08 22-117 SAC @ ... 0 0 2 19 15.1 +13
19 20 18 2012-12-10 22-119 SAC @ ... 0 4 2 25 17.4 -12
22 22 19 2012-12-14 22-123 SAC @ ... 1 2 0 10 6.8 -6
23 23 20 2012-12-16 22-125 SAC NaN ... 0 1 3 19 14.6 -22
24 24 21 2012-12-17 22-126 SAC @ ... 0 3 4 9 3.0 -15
25 25 22 2012-12-19 22-128 SAC NaN ... 1 5 5 24 18.5 +6
26 26 23 2012-12-21 22-130 SAC @ ... 1 2 0 9 8.2 -9
29 29 24 2012-12-28 22-137 SAC NaN ... 1 2 5 15 14.2 +6
30 30 25 2012-12-30 22-139 SAC NaN ... 0 2 5 12 13.0 +21
31 31 26 2013-01-01 22-141 SAC @ ... 0 2 5 21 17.7 -7
32 32 27 2013-01-02 22-142 SAC @ ... 1 1 3 18 21.2 +13
33 33 28 2013-01-04 22-144 SAC @ ... 0 4 4 31 29.2 +18
34 34 29 2013-01-05 22-145 SAC @ ... 1 0 3 28 26.3 -14
35 35 30 2013-01-07 22-147 SAC NaN ... 0 4 3 10 4.2 -21
.. .. .. ... ... ... ... ... .. .. .. .. ... ...
43 42 30 2015-01-21 24-161 SAC NaN ... 0 2 3 28 21.5 +11
44 43 31 2015-01-23 24-163 SAC @ ... 0 5 4 28 19.8 -9
45 44 32 2015-01-28 24-168 SAC @ ... 2 9 4 13 4.8 -4
46 45 33 2015-01-30 24-170 SAC @ ... 1 3 2 21 20.9 -15
47 46 34 2015-01-31 24-171 SAC @ ... 0 4 6 20 12.0 +7
48 47 35 2015-02-03 24-174 SAC NaN ... 4 3 3 26 23.8 -14
49 48 36 2015-02-05 24-176 SAC NaN ... 2 6 3 23 15.6 -22
50 49 37 2015-02-07 24-178 SAC @ ... 2 4 6 27 12.8 -11
51 50 38 2015-02-08 24-179 SAC NaN ... 2 6 3 28 14.9 +2
52 51 39 2015-02-10 24-181 SAC @ ... 1 2 4 15 10.0 -16
53 52 40 2015-02-11 24-182 SAC @ ... 4 6 4 28 23.6 0
54 53 41 2015-02-20 24-191 SAC NaN ... 1 9 3 31 22.7 -10
55 54 42 2015-02-21 24-192 SAC @ ... 2 5 4 21 8.7 -12
56 55 43 2015-02-25 24-196 SAC NaN ... 1 2 6 16 13.5 +9
59 58 44 2015-03-03 24-202 SAC @ ... 2 3 3 22 21.3 +19
60 59 45 2015-03-04 24-203 SAC @ ... 2 5 4 14 5.5 -7
61 60 46 2015-03-06 24-205 SAC @ ... 3 1 2 29 25.7 -8
63 61 47 2015-03-07 24-206 SAC @ ... 1 5 6 27 23.6 -6
64 62 48 2015-03-09 24-208 SAC @ ... 0 1 2 12 16.1 -22
65 63 49 2015-03-11 24-210 SAC @ ... 0 4 4 20 12.2 -3
66 64 50 2015-03-13 24-212 SAC @ ... 1 3 2 39 35.4 +5
67 65 51 2015-03-14 24-213 SAC @ ... 1 3 6 30 21.5 -7
68 66 52 2015-03-16 24-215 SAC NaN ... 3 5 5 20 14.0 -4
71 69 53 2015-03-22 24-221 SAC NaN ... 0 5 5 20 16.3 +5
72 70 54 2015-03-24 24-223 SAC NaN ... 4 3 5 33 29.5 -2
73 71 55 2015-03-25 24-224 SAC @ ... 1 5 2 24 17.4 +8
74 72 56 2015-03-27 24-226 SAC @ ... 3 3 3 39 33.3 -1
76 74 57 2015-04-01 24-231 SAC @ ... 6 5 4 24 25.6 +1
77 75 58 2015-04-03 24-233 SAC NaN ... 3 5 5 24 21.8 +7
78 76 59 2015-04-05 24-235 SAC NaN ... 1 8 2 26 13.9 -1
[205 rows x 30 columns]