How can I correctly identify an element with Inspect and then scrape it?
April 3, 2020

I need to scrape two key pieces of information from the NASDAQ website: Institutional Ownership and Total Shares Outstanding (millions).

Here is an example for XYLEM INC.: https://www.nasdaq.com/market-activity/stocks/xyl/institutional-holdings

The problem is that I can't find the data! When I search for "div" elements, very little shows up.

Here is my code:

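import pandas as pd
from bs4 import BeautifulSoup
import urllib.request as ur

# Ticker symbol of the stock to look up
index = 'XYL'
url = 'https://www.nasdaq.com/market-activity/stocks/' + index + '/institutional-holdings'
print(url)

# Download the page and parse the HTML
read = ur.urlopen(url).read()
soup_is = BeautifulSoup(read, 'lxml')
soup_is

# Collect the text of every <div>; .string is None when a div has more than one child
ls = []
for l in soup_is.find_all("div"):
    ls.append(l.string)
# Drop None and empty strings
ls2 = list(filter(None, ls))
ls2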

Can anyone help me with this? Many thanks.

P.S. The output is as follows:

['\n',
 '\n',
 'Data is currently not available',
 'Data is currently not available',
 '\n',
 '\n',
 'Data is currently not available',
 'Data is currently not available',
 '\n',
 'Data is currently not available',
 '\n',
 'Data is currently not available',
 '\n',
 'Data is currently not available',
 '\n',
 'Institutional Holdings information is filed by major institutions on form 13-F with the Securities and Exchange Commission. Major institutions are defined as firms or individuals that exercise investment discretion, over the assets of others, in excess of $100 Million. Major institutions include financial holdings companies, banks, insurance companies, mutual fund managers, portfolio managers, self managed pension and endowment funds. The report is limited to equity securities, including common and equivalents, convertible preferred and convertible bonds. The report does not include fixed income, real estate, or cash equivalents. Reports are filed within 45 days after calendar quarter end with the vast majority of updates occurring near the 45th day of the quarter.',
 'Data is currently not available',
 'Data is currently not available',
 'Data is currently not available']
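
One way to read the output above is to tally what the `div` strings actually contain. The sketch below is only a diagnostic under stated assumptions: `summarize_div_strings` and the `sample` list are hypothetical names introduced here for illustration, and the sample mirrors the shape of the output rather than reproducing it in full. If nearly everything is whitespace or the placeholder text, a plausible (but unconfirmed) explanation is that the page fills in the real values with JavaScript after it loads, so the HTML returned by `urllib` never contains them.

```python
from collections import Counter

# Placeholder text copied from the output above.
PLACEHOLDER = "Data is currently not available"


def summarize_div_strings(strings):
    """Count whitespace-only, placeholder, and other values among div.string results."""
    counts = Counter()
    for s in strings:
        if s is None:
            # BeautifulSoup returns None for .string when a tag has several children.
            continue
        text = s.strip()
        if not text:
            counts["whitespace"] += 1
        elif text == PLACEHOLDER:
            counts["placeholder"] += 1
        else:
            counts["other"] += 1
    return counts


# Hypothetical sample shaped like the output above (not the full list).
sample = [
    "\n",
    PLACEHOLDER,
    PLACEHOLDER,
    "\n",
    "Institutional Holdings information is filed by major institutions ...",
    None,
]
summary = summarize_div_strings(sample)
print(dict(summary))
```

In the notebook from the question, the same function could be called as `summarize_div_strings(ls)` on the unfiltered list. A large "placeholder" count next to a small "other" count would point to the data living outside the static HTML.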
...