Прекрасный Суп Экстракт первого ТД - PullRequest
0 голосов
/ 03 апреля 2020

Я пытаюсь извлечь данные из таблицы HTML, используя красивый суп.

import requests
import urllib.request
import time
from bs4 import BeautifulSoup

import webbrowser
import httplib2

import pyodbc 
from datetime import datetime
from pprint import pprint

quote_page = 'https://ph.investing.com/economic-calendar/'

table = soup.find_all('table', attrs={'id': 'economicCalendarData'})

req = urllib.request.Request(quote_page,headers={'User-Agent':"Magic Browser"})
resp = urllib.request.urlopen(req)
data = resp.read()
html = data.decode('ISO-8859-1')
#print(html)
soup = BeautifulSoup(html, 'html5lib')
print (soup.prettify())

table = soup.find_all('table', attrs={'id': 'economicCalendarData'})
print(table)

res = []
for tr in table:
td = tr.find_all('td') 
if row: 
    res.append(row)
print (res)

Но для первого TD, в котором таблица содержит дату.

https://ph.investing.com/economic-calendar/

enter image description here

Я хочу сохранить эту дату в переменной, а затем перенести остальные данные в таблицу

import pandas as pd
df = pd.DataFrame(res)
df

Заранее спасибо.

Ответы [ 2 ]

0 голосов
/ 03 апреля 2020
import requests
import pandas as pd

headers = {'User-Agent': 'Mozila'}

r = requests.get(
    "https://ph.investing.com/economic-calendar/", headers=headers)

df = pd.read_html(r.content, attrs={'id': 'economicCalendarData'})[0]

date = df.iloc[0][0]

print(date)

df.to_csv("data.csv", index=False)

Вывод:

Friday, April 3, 2020

data.csv просмотр в режиме онлайн

enter image description here

0 голосов
/ 03 апреля 2020

Используйте следующий селектор css, чтобы получить значение даты в первом столбце таблицы.

import requests
from bs4 import BeautifulSoup
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.109 Safari/537.36'}
r=requests.get("https://ph.investing.com/economic-calendar/",headers=headers)
soup=BeautifulSoup(r.text,"html.parser")
table=soup.find('table',attrs={"id":"economicCalendarData"})
print(table.select_one('tbody>tr>td.theDay').text)

Или вы можете использовать.

 print(soup.select_one('table#economicCalendarData>tbody>tr>td.theDay').text)

ИЛИ

  print(soup.select_one('table#economicCalendarData td.theDay').text)

Чтобы напечатать всю таблицу во фрейме данных и импортировать в CSV.

import requests
from bs4 import BeautifulSoup
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.109 Safari/537.36'}
r=requests.get("https://ph.investing.com/economic-calendar/",headers=headers)
soup=BeautifulSoup(r.text,"html.parser")
print(soup.select_one('table#economicCalendarData td.theDay').text)
table=soup.find('table',attrs={"id":"economicCalendarData"})
df=pd.read_html(str(table))[0]
df1=df.iloc[1:,:7]
print(df1)
df1.to_csv("index.csv", index=False)

**Output**:



   Friday, April 3, 2020
     Time Cur. Imp.  ...   Actual Forecast Previous
     Time Cur. Imp.  ...   Actual Forecast Previous
1   05:00  KRW  NaN  ...  400.21B      NaN  409.17B
2   05:30  AUD  NaN  ...     37.9      NaN     42.7
3   06:00  AUD  NaN  ...     38.5     39.8     49.0
4   08:01  EUR  NaN  ...     32.5      NaN     59.9
5   08:30  AUD  NaN  ...     0.5%     0.4%    -0.3%
6   08:30  JPY  NaN  ...     33.8     32.7     46.8
7   08:30  HKD  NaN  ...     34.9      NaN     33.1
8   09:45  CNY  NaN  ...     43.0      NaN     26.5
9   13:00  SGD  NaN  ...    -8.6%      NaN    -5.3%
10  13:00  SGD  NaN  ...    -8.9%      NaN     0.2%
11  14:30  SEK  NaN  ...     46.9      NaN     56.4
12  14:45  EUR  NaN  ...   -35.2B      NaN   -20.0B
13  15:00  EUR  NaN  ...    -1.3%     2.1%    -2.2%
14  15:15  EUR  NaN  ...     23.0     25.5     52.1
15  15:15  ZAR  NaN  ...     44.5      NaN     48.4
16  15:30  THB  NaN  ...    34.4B      NaN    35.1B
17  15:30  THB  NaN  ...   227.2B      NaN   219.9B
18  15:45  EUR  NaN  ...     20.2      NaN     50.7
19  15:45  EUR  NaN  ...     17.4     22.0     52.1
20  15:50  EUR  NaN  ...     28.9     30.2     52.0
21  15:50  EUR  NaN  ...     27.4     29.0     52.5
22  15:55  EUR  NaN  ...     35.0     36.8     50.7
23  15:55  EUR  NaN  ...     31.7     34.3     52.5
24  16:00  EUR  NaN  ...    -2.4%      NaN     2.2%
25  16:00  NOK  NaN  ...   10.70%   13.50%    2.30%
26  16:00  EUR  NaN  ...     29.7     31.4     51.6
27  16:00  EUR  NaN  ...     26.4     28.4     52.6
28  16:30  GBP  NaN  ...     36.0     36.2     53.0
29  16:30  GBP  NaN  ...     34.5     34.8     53.2
30  17:00  NOK  NaN  ...    1.50%      NaN    3.60%
31  17:00  EUR  NaN  ...     0.9%     0.1%     0.7%
32  17:00  EUR  NaN  ...     3.0%     1.7%     2.2%
33  19:30  INR  NaN  ...  475.56B      NaN  475.56B
34  20:30  USD  NaN  ...     3.1%     3.0%     3.0%
35  20:30  USD  NaN  ...     0.4%     0.2%     0.3%
36  20:30  USD  NaN  ...     34.2     34.1     34.4
37  20:30  USD  NaN  ...    12.0K      NaN    33.0K
38  20:30  USD  NaN  ...     -18K     -20K      13K
39  20:30  USD  NaN  ...    -701K    -100K     275K
40  20:30  USD  NaN  ...    62.7%    63.3%    63.4%
41  20:30  USD  NaN  ...    -713K    -163K     242K
42  20:30  USD  NaN  ...     8.7%      NaN     7.0%
43  20:30  USD  NaN  ...     4.4%     3.8%     3.5%
44  21:00  BRL  NaN  ...     37.6      NaN     50.9
45  21:00  BRL  NaN  ...     34.5      NaN     50.4
46  21:00  SGD  NaN  ...     45.4      NaN     48.7
47  21:45  USD  NaN  ...      NaN     40.5     49.6
48  21:45  USD  NaN  ...      NaN     39.1     49.4
49  22:00  USD  NaN  ...      NaN     45.0     57.8
50  22:00  USD  NaN  ...      NaN      NaN     55.6
51  22:00  USD  NaN  ...      NaN      NaN     63.1
52  22:00  USD  NaN  ...      NaN     44.0     57.3
53  22:00  USD  NaN  ...      NaN      NaN     50.8
Добро пожаловать на сайт PullRequest, где вы можете задавать вопросы и получать ответы от других членов сообщества.
...