Python, BeautifulSoup4 TypeError: find() takes no keyword arguments - PullRequest
0 votes
/ January 8, 2020

Hi everyone, I want to parse HTML with beautifulsoup4 and wrote this code:

from selenium import webdriver
from django.core.management.base import BaseCommand
import datetime
from bs4 import BeautifulSoup as bs


url = "https://www.basketball-reference.com/leagues/NBA_2020.html"
main_url = "https://www.basketball-reference.com"
browser = webdriver.Chrome()
browser.set_window_size(1920, 1080)
browser.minimize_window()
browser.get(url)
soup = bs(browser.page_source, 'lxml')
team_urls = []
try:
    tables = soup.find('table', id='team-stats-per_game')
    for tr in tables.tbody:
        team_name = tr.find('a')
        try:
           if type(team_name) != int:
                if type(team_name) != 'NoneType':
                    team_url = team_name.get('href')
                    team_urls.append(team_url)
        except:
            pass
except Exception as e:
    print(e)
for team in team_urls:
    browser2 = webdriver.Chrome()
    browser2.minimize_window()
    browser2.get(main_url + team)
    team_soup = bs(browser2.page_source, 'lxml')
    team_op_stats = team_soup.find('table', id='team_and_opponent').find_all('tbody')
    for t1_stats in team_op_stats[0]:
        if t1_stats.find('th', attrs={'class', 'left'}):
            print(t1_stats)
        print("##" * 50)
    browser2.quit()
    break
browser.quit()

This code produces:

File "C:\Users\ysfnm\PycharmProjects\denemee\denemee\apps\result\management\commands\nba.py", line 46, in handle
    if t1_stats.find('th', attrs={'class', 'left'}):
TypeError: find() takes no keyword arguments

From my research, the answer given to other people who hit the same error was:

You are not calling BeautifulSoup's .find(); you are calling it on a plain string object (the .text attribute of your BeautifulSoup object).

But:

            for t1_stats in team_op_stats[0]:
                print(t1_stats)
                print("##" * 50)

this code prints:

<tr>
<th class="left" data-stat="player" scope="row">Team/G</th>
<td class="center iz" data-stat="g"></td>
<td class="center" data-stat="mp_per_g">240.7</td>
<td class="center" data-stat="fg_per_g">43.8</td>
<td class="center" data-stat="fga_per_g">91.0</td>
<td class="center" data-stat="fg_pct">.481</td>
<td class="center" data-stat="fg3_per_g">14.0</td>
<td class="center" data-stat="fg3a_per_g">39.1</td>
<td class="center" data-stat="fg3_pct">.359</td>
<td class="center" data-stat="fg2_per_g">29.7</td>
<td class="center" data-stat="fg2a_per_g">51.9</td>
<td class="center" data-stat="fg2_pct">.573</td>
<td class="center" data-stat="ft_per_g">17.7</td>
<td class="center" data-stat="fta_per_g">24.3</td>
<td class="center" data-stat="ft_pct">.727</td>
<td class="center" data-stat="orb_per_g">10.0</td>
<td class="center" data-stat="drb_per_g">41.5</td>
<td class="center" data-stat="trb_per_g">51.5</td>
<td class="center" data-stat="ast_per_g">26.0</td>
<td class="center" data-stat="stl_per_g">7.7</td>
<td class="center" data-stat="blk_per_g">6.5</td>
<td class="center" data-stat="tov_per_g">14.7</td>
<td class="center" data-stat="pf_per_g">19.3</td>
<td class="center" data-stat="pts_per_g">119.2</td>
</tr>

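Where is my mistake?

1 Answer

1 vote
/ January 8, 2020
  • change: if t1_stats.find('th', attrs={'class', 'left'}):
  • to: if t1_stats.find('th', attrs={'class': 'left'}):

{'class', 'left'} is a set literal; attrs is meant to be a dict that maps attribute names to values.

Then

  • change: for t1_stats in team_op_stats[0]:
  • to: for t1_stats in team_op_stats:

The loop over team_op_stats[0] also yields the newline strings between the rows, and .find() on a string is str.find, hence the TypeError.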
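A minimal sketch of why the loop over team_op_stats[0] fails, using a trimmed copy of the row from the question as inline HTML. The html string and the variable names are illustrative assumptions, not the live page. Iterating over a tbody returns its direct children, and the whitespace between rows comes back as NavigableString objects, which subclass str. If you do want to loop over the rows themselves, an isinstance check skips those text nodes:

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

# Trimmed copy of the row printed in the question. The line breaks between
# tags matter: the parser keeps them as text nodes.
html = """
<table id="team_and_opponent">
<tbody>
<tr>
<th class="left" data-stat="player" scope="row">Team/G</th>
<td class="center" data-stat="pts_per_g">119.2</td>
</tr>
</tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
tbody = soup.find("table", id="team_and_opponent").find("tbody")

# Iterating over a Tag yields text nodes and tags alike.
child_types = [type(child).__name__ for child in tbody]
print(child_types)

# A text node is a str subclass, so .find() here is str.find,
# which rejects keyword arguments.
first_text_node = next(c for c in tbody if isinstance(c, NavigableString))
try:
    first_text_node.find("th", attrs={"class": "left"})
    raised = False
except TypeError:
    raised = True
print(raised)

# Guarded loop: only Tag children support the BeautifulSoup .find().
labels = []
for child in tbody:
    if not isinstance(child, Tag):
        continue
    header = child.find("th", attrs={"class": "left"})
    if header is not None:
        labels.append(header.get_text())
print(labels)
```

Another option under the same assumptions is tbody.find_all('tr'), which returns only tags and needs no type check.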

HOWEVER

Selenium is slow. The tables on these pages sit inside HTML comments. You can fetch the page with requests, use BeautifulSoup to pull out the comments, and then read the tables from them with pandas. That will be much faster.

I'm not entirely sure which table you want, but from what you showed above it looks like the team stats:

Code:

import requests
from bs4 import BeautifulSoup
from bs4 import Comment
import pandas as pd


url = "https://www.basketball-reference.com/leagues/NBA_2020.html"
main_url = "https://www.basketball-reference.com"

response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')


team_urls = []
teams = soup.find_all('div', {'class':'division'})
for each in teams:
    links = each.find_all('a', href=True)
    for link in links:
        team_urls.append(main_url + link['href'])





for team in team_urls:
    response = requests.get(team)
    soup = BeautifulSoup(response.text, 'html.parser')

    seas = soup.find('h1').find_all('span')[0].text
    teamName = soup.find('h1').find_all('span')[1].text

    comments = soup.find_all(string=lambda text: isinstance(text, Comment))

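    # Collect the first table from every comment that looks like it contains one.
    tables = []
    for each in comments:
        if 'table' in each:
            try:
                tables.append(pd.read_html(each)[0])
            except Exception:
                # Not every comment that mentions "table" parses as one; skip it.
                continue
    print('%s %s' % (seas, teamName))
    print(tables[2].to_string())
    print("##" * 50)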

Sample output:

2019-20 Toronto Raptors
   Unnamed: 0     G     MP     FG    FGA     FG%     3P    3PA    3P%      2P     2PA     2P%     FT    FTA    FT%    ORB   DRB   TRB    AST     STL    BLK    TOV     PF    PTS
0        Team  37.0   8955   1459   3270   0.446    489   1330  0.368     970    1940   0.500    680    850  0.800    381  1335  1716    915     305    198    549    775   4087
1      Team/G   NaN  242.0   39.4   88.4   0.446   13.2   35.9  0.368    26.2    52.4   0.500   18.4   23.0  0.800   10.3  36.1  46.4   24.7     8.2    5.4   14.8   20.9  110.5
2     Lg Rank   NaN     11     21     19  22.000      5      7  6.000      27      24  25.000     10     16  4.000     15     8     9     11       7      9     14     15     15
3   Year/Year   NaN  -0.2%  -6.5%  -0.8%  -0.027   6.8%   6.4%  0.001  -12.1%   -5.2%  -0.039   4.0%   4.5% -0.004   7.4%  1.3%  2.6%  -2.7%   -0.6%   0.4%   5.8%  -0.4%  -3.5%
4    Opponent  37.0   8955   1402   3313   0.423    470   1416  0.332     932    1897   0.491    615    811  0.758    432  1310  1742    921     249    208    610    744   3889
5  Opponent/G   NaN  242.0   37.9   89.5   0.423   12.7   38.3  0.332    25.2    51.3   0.491   16.6   21.9  0.758   11.7  35.4  47.1   24.9     6.7    5.6   16.5   20.1  105.1
6     Lg Rank   NaN     11      2     17   2.000     26     29  3.000       2       4   3.000     10     11  7.000     29    17    26     22       2     26      2     19      4
7   Year/Year   NaN  -0.2%  -5.9%  -0.1%  -0.026  18.1%  22.6% -0.013  -14.6%  -12.3%  -0.014  -2.6%  -1.7% -0.007  10.4%  3.5%  5.2%   1.4%  -11.3%  25.3%  10.4%  -2.0%  -3.0%
####################################################################################################
2019-20 Boston Celtics
   Unnamed: 0     G     MP     FG    FGA     FG%     3P    3PA     3P%     2P    2PA     2P%     FT    FTA    FT%    ORB    DRB    TRB     AST    STL    BLK   TOV    PF    PTS
0        Team  34.0   8185   1384   3036   0.456    403   1151   0.350    981   1885   0.520    598    749  0.798    372   1200   1572     784    277    210   474   720   3769
1      Team/G   NaN  240.7   40.7   89.3   0.456   11.9   33.9   0.350   28.9   55.4   0.520   17.6   22.0  0.798   10.9   35.3   46.2    23.1    8.1    6.2  13.9  21.2  110.9
2     Lg Rank   NaN     30     15     16  17.000     16     13  20.000     13     15  11.000     14     22  6.000      8     14     10      21      9      6     8    16     14
3   Year/Year   NaN  -0.2%  -3.3%  -1.4%  -0.009  -5.8%  -1.9%  -0.015  -2.2%  -1.0%  -0.006  12.5%  13.0% -0.004  11.6%   1.6%   3.8%  -12.3%  -5.4%  16.4%  8.7%  4.0%  -1.4%
4    Opponent  34.0   8185   1281   2932   0.437    397   1156   0.343    884   1776   0.498    561    761  0.737    346   1156   1502     775    233    187   539   711   3520
5  Opponent/G   NaN  240.7   37.7   86.2   0.437   11.7   34.0   0.343   26.0   52.2   0.498   16.5   22.4  0.737   10.2   34.0   44.2    22.8    6.9    5.5  15.9  20.9  103.5
6     Lg Rank   NaN     30      1      6   4.000     13     18   9.000      4      6   8.000      8     16  1.000     16      8      9       5      3     22     6    14      1
7   Year/Year   NaN  -0.2%  -4.6%  -2.1%  -0.012   1.4%   1.5%  -0.000  -7.1%  -4.3%  -0.015  -5.4%  -2.0% -0.027  -2.1%  -4.3%  -3.8%   -3.7%   1.1%  42.3%  4.7%  7.0%  -4.1%
####################################################################################################
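The index in tables[2] depends on the order of the comments on the page. As a sketch of an alternative, assuming the commented-out table keeps the id team_and_opponent seen in the question's code, the comment can be re-parsed with BeautifulSoup and the table looked up by id. The inline page string and the helper name find_commented_table are hypothetical, used here so the example runs without a network request:

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical stand-in for a team page: the table is wrapped in an HTML
# comment, which is how the answer describes the real pages.
page = """
<div>
<!--
<table id="team_and_opponent">
<tbody>
<tr><th class="left" data-stat="player" scope="row">Team/G</th>
<td class="center" data-stat="pts_per_g">119.2</td></tr>
</tbody>
</table>
-->
</div>
"""


def find_commented_table(html, table_id):
    """Return the <table> with the given id from inside an HTML comment, or None."""
    soup = BeautifulSoup(html, "html.parser")
    for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
        if table_id not in comment:
            continue
        # The comment body is plain text to the outer parser, so parse it again.
        table = BeautifulSoup(comment, "html.parser").find("table", id=table_id)
        if table is not None:
            return table
    return None


table = find_commented_table(page, "team_and_opponent")
row = table.find("th", attrs={"class": "left"}).parent
stats = {td["data-stat"]: td.get_text() for td in row.find_all("td")}
print(stats)
```

Selecting by id is slower to write than an index, but it should keep working if the site adds or reorders commented-out tables.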

If you only need the per-game team stats, you can get them in one request from https://www.basketball-reference.com/leagues/NBA_2019.html#all_team-stats-base. Interestingly, the stats differ (I don't know why) between that single page and each team's individual page.

import requests
from bs4 import BeautifulSoup
from bs4 import Comment
import pandas as pd



url = 'https://www.basketball-reference.com/leagues/NBA_2020.html#all_team-stats-base'
response = requests.get(url)

soup = BeautifulSoup(response.text, 'html.parser')

comments = soup.find_all(string=lambda text: isinstance(text, Comment))

tables = []
for each in comments:
    if 'table' in each:
        try:
            tables.append(pd.read_html(each)[0])
        except Exception:
            # Not every comment that mentions "table" parses as one; skip it.
            continue


print(tables[1])

Output:

print (tables[1].to_string())
      Rk                    Team   G     MP    FG   FGA    FG%    3P   3PA    3P%    2P   2PA    2P%    FT   FTA    FT%   ORB   DRB   TRB   AST  STL  BLK   TOV    PF    PTS
0    1.0          Boston Celtics  34  240.7  37.7  86.2  0.437  11.7  34.0  0.343  26.0  52.2  0.498  16.5  22.4  0.737  10.2  34.0  44.2  22.8  6.9  5.5  15.9  20.9  103.5
1    2.0          Denver Nuggets  36  242.1  39.0  86.1  0.453  10.7  32.6  0.327  28.3  53.5  0.529  16.6  21.9  0.757  10.1  33.4  43.5  24.0  7.1  4.7  14.4  20.6  105.2
2    3.0               Utah Jazz  36  240.0  39.2  89.3  0.439  10.8  31.8  0.341  28.4  57.6  0.493  16.7  21.7  0.769   9.8  34.3  44.0  20.5  7.9  5.0  12.6  21.2  105.9
3    4.0           Orlando Magic  37  240.0  38.8  86.4  0.449  11.9  33.6  0.354  26.9  52.8  0.509  14.3  19.0  0.754   9.8  36.8  46.5  23.1  6.9  4.3  15.1  19.2  103.8
4    5.0              Miami Heat  36  244.2  38.3  86.9  0.441  12.2  37.3  0.327  26.1  49.7  0.526  18.4  24.1  0.764   9.4  32.3  41.7  24.0  8.1  4.2  14.3  22.1  107.3
5    6.0      Los Angeles Lakers  37  240.7  38.1  87.1  0.437  11.0  32.6  0.337  27.1  54.5  0.498  17.7  22.2  0.797   9.7  32.3  42.0  22.9  8.2  4.1  16.0  21.3  104.9
6    7.0         Toronto Raptors  37  242.0  37.9  89.5  0.423  12.7  38.3  0.332  25.2  51.3  0.491  16.6  21.9  0.758  11.7  35.4  47.1  24.9  6.7  5.6  16.5  20.1  105.1
7    8.0          Indiana Pacers  37  242.0  39.2  88.6  0.442  10.8  32.3  0.336  28.3  56.3  0.503  17.0  21.8  0.780  10.4  34.6  45.1  23.4  6.5  4.8  14.3  18.9  106.2
8    9.0        Dallas Mavericks  36  242.1  40.6  90.9  0.447  11.4  33.8  0.337  29.2  57.1  0.511  16.7  21.6  0.773  11.1  34.5  45.6  23.3  7.2  3.9  12.7  21.4  109.3
9   10.0           Chicago Bulls  37  241.4  38.2  83.7  0.457  10.9  32.5  0.336  27.3  51.2  0.534  19.8  26.1  0.761  10.5  36.6  47.2  24.1  8.3  6.5  18.3  19.9  107.2
10  11.0   Oklahoma City Thunder  37  242.7  40.8  89.9  0.454  10.8  31.3  0.343  30.1  58.6  0.513  15.0  18.7  0.802  10.6  34.4  45.1  22.7  6.9  4.2  14.3  23.0  107.4
11  12.0         Houston Rockets  35  241.4  42.3  92.2  0.459  12.5  35.7  0.351  29.8  56.6  0.527  16.7  22.3  0.751  10.8  35.1  45.9  26.1  7.9  4.7  15.4  21.6  113.9
12  13.0       San Antonio Spurs  35  244.3  42.6  92.0  0.463  12.5  34.5  0.361  30.2  57.5  0.525  17.1  22.3  0.766   9.7  36.1  45.8  25.1  7.2  4.7  12.6  19.8  114.8
13  14.0           Brooklyn Nets  36  243.5  40.7  93.9  0.433  12.2  34.4  0.354  28.5  59.4  0.479  18.1  23.4  0.771  11.5  35.9  47.4  21.2  7.8  5.6  13.5  21.2  111.7
14  15.0      Philadelphia 76ers  38  241.3  39.1  85.6  0.457   9.8  27.6  0.355  29.3  57.9  0.505  18.0  24.3  0.738   8.2  32.5  40.7  21.9  7.4  4.0  14.2  20.9  105.9
15  16.0         Milwaukee Bucks  38  240.7  38.6  93.4  0.414  14.2  38.4  0.370  24.4  55.0  0.444  15.8  20.6  0.769   9.8  36.3  46.0  23.9  7.2  4.6  14.5  21.4  107.3
16  17.0  Minnesota Timberwolves  36  244.9  41.8  91.6  0.457  11.2  31.4  0.356  30.6  60.1  0.509  19.6  24.8  0.788  11.1  37.4  48.5  23.3  7.4  5.5  15.7  22.3  114.4
17  18.0        Sacramento Kings  38  242.6  39.5  84.9  0.465  11.7  33.4  0.349  27.8  51.5  0.540  17.9  22.5  0.796   9.3  33.8  43.1  24.3  8.1  4.3  15.3  19.0  108.5
18  19.0         New York Knicks  37  240.7  39.5  85.9  0.460  13.6  35.1  0.386  25.9  50.8  0.511  19.3  26.2  0.739  10.2  36.2  46.3  23.9  7.0  4.9  14.2  19.9  111.9
19  20.0    Los Angeles Clippers  38  240.7  39.4  89.7  0.439  11.9  34.4  0.346  27.5  55.3  0.497  19.0  24.7  0.768  10.9  34.4  45.3  22.8  8.4  5.0  15.3  23.5  109.8
20  21.0     Cleveland Cavaliers  37  240.7  43.4  89.7  0.484  12.7  33.7  0.377  30.7  56.0  0.549  14.1  18.3  0.770   9.9  33.5  43.5  25.9  8.8  6.6  12.9  19.6  113.6
21  22.0         Detroit Pistons  38  240.0  41.7  88.1  0.474  11.4  30.4  0.377  30.3  57.7  0.525  16.2  20.9  0.774  10.2  33.1  43.3  25.1  8.2  5.8  14.1  20.1  111.1
22  23.0            Phoenix Suns  37  242.0  41.8  87.5  0.477  11.9  32.0  0.373  29.8  55.4  0.538  19.7  25.6  0.771   9.0  36.0  45.1  23.8  7.6  5.6  16.2  23.4  115.2
23  24.0   Golden State Warriors  38  242.0  41.6  88.4  0.471  13.5  34.8  0.387  28.1  53.6  0.525  16.2  21.0  0.774  10.4  35.8  46.3  25.2  8.1  5.4  16.3  20.4  112.9
24  25.0  Portland Trail Blazers  38  240.7  40.8  91.9  0.444  12.4  34.3  0.361  28.4  57.5  0.494  19.6  25.6  0.767  11.8  36.2  47.9  23.6  7.2  5.3  13.0  19.9  113.6
25  26.0      Washington Wizards  36  240.7  43.4  89.1  0.487  12.3  33.3  0.369  31.1  55.8  0.557  21.1  26.9  0.786  10.6  35.9  46.5  25.6  7.0  5.5  15.6  21.3  120.1
26  27.0    New Orleans Pelicans  37  242.0  41.9  89.7  0.468  12.6  34.1  0.369  29.4  55.6  0.528  20.4  25.6  0.797   9.8  36.5  46.3  24.5  7.7  4.4  15.1  19.9  116.9
27  28.0       Charlotte Hornets  39  241.9  42.2  88.3  0.479  12.4  34.9  0.355  29.8  53.4  0.559  14.2  18.3  0.773  10.8  35.3  46.1  27.0  8.3  4.8  14.9  21.2  111.0
28  29.0           Atlanta Hawks  37  242.0  42.8  90.0  0.476  11.5  32.1  0.359  31.3  57.8  0.541  20.1  26.0  0.774  11.2  35.5  46.8  24.6  8.9  6.6  15.8  20.5  117.3
29  30.0       Memphis Grizzlies  38  240.7  41.8  90.1  0.464  12.3  33.8  0.365  29.5  56.3  0.524  20.3  25.7  0.790   9.9  35.1  44.9  25.1  7.9  5.4  14.6  19.8  116.3
30   NaN          League Average  37  241.7  40.4  88.9  0.455  11.9  33.6  0.355  28.5  55.3  0.516  17.6  22.9  0.771  10.3  35.0  45.3  24.0  7.6  5.1  14.8  20.8  110.4
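One detail about the URL in the second snippet: the part after # is a fragment, which HTTP clients do not send to the server, so the request fetches the whole league page and the table still has to be picked out of the parsed comments. A small sketch with the standard library shows the split (the variable names are illustrative):

```python
from urllib.parse import urldefrag

url = "https://www.basketball-reference.com/leagues/NBA_2020.html#all_team-stats-base"

# urldefrag separates the fragment from the rest of the URL.
page_url, fragment = urldefrag(url)
print(page_url)
print(fragment)
```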