Python Selenium cannot get tag contents from the mpob site - PullRequest
0 votes
/ December 12, 2018

I'm trying to scrape information from this site, http://bepi.mpob.gov.my/index.php/en/summary-2/893-summary-2018.html, to do some modelling. However, Selenium doesn't seem to be able to get the table, so I can't reach any of the tr / td tags inside it.

Here is my code:

from selenium import webdriver
from pandas import DataFrame
from bs4 import BeautifulSoup
import pandas as pd
import re

options = webdriver.ChromeOptions()
options.add_argument('headless')
path = '/Users/Applications/chromedriver'
driver = webdriver.Chrome(chrome_options= options, executable_path=path)

url = 'http://bepi.mpob.gov.my/index.php/en/summary-2.html'

driver.get(url)
summary_2018 = driver.find_element_by_link_text('Summary 2018')
summary_2018.click()

soup = BeautifulSoup(driver.page_source, 'lxml')
print(soup)

Answers [ 3 ]

0 votes
/ December 12, 2018

To extract the tag contents, you need to induce WebDriverWait for the element with the text Summary 2018 to become clickable. You can use the following solution:

  • Code block:

    from selenium import webdriver
    from bs4 import BeautifulSoup
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    
    options = webdriver.ChromeOptions()
    # options.add_argument('headless') # older webdriver versions
    options.set_headless(True) # newer webdriver versions
    driver = webdriver.Chrome(chrome_options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
    url = 'http://bepi.mpob.gov.my/index.php/en/summary-2.html'
    driver.get(url)
    WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.LINK_TEXT, "Summary 2018"))).click()
    soup = BeautifulSoup(driver.page_source, "html.parser")
    print(soup)
    

But if you want to get the contents of the table, you need to switch to the frame first, again by inducing WebDriverWait. You can use the following solution:

  • Code block:

    from selenium import webdriver
    from bs4 import BeautifulSoup
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    
    options = webdriver.ChromeOptions()
    # options.add_argument('headless') # older webdriver versions
    options.set_headless(True) # newer webdriver versions
    driver = webdriver.Chrome(chrome_options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
    url = 'http://bepi.mpob.gov.my/index.php/en/summary-2.html'
    driver.get(url)
    WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.LINK_TEXT, "Summary 2018"))).click()
    WebDriverWait(driver, 10).until(EC.frame_to_be_available_and_switch_to_it((By.ID,"blockrandom")))
    WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, "//font[contains(.,'SUMMARY OF THE MALAYSIAN OIL PALM INDUSTRY  2018')]"))).click()
    soup = BeautifulSoup(driver.page_source, "html.parser")
    print(soup)
    
  • Console output:

    <html xmlns="http://www.w3.org/1999/xhtml"><head></head><body><p>
    </p><p>
    </p><div align="center"><img src="./Images/mpob.png"/></div><div align="right"><img alt="Print this page" onclick="window.print() " src="./Images/printer.png" title="Print this page"/></div>
    <title>SUMMARY OF THE MALAYSIAN OIL PALM INDUSTRY  2018</title>
    <link href="./Themes/Style.css" rel="stylesheet" type="text/css"/>
    <style type="text/css">
        p.pagebreak {page-break-before: always}
        </style>
    <p align="center"><b><font color="#000000" face="Arial" size="3">SUMMARY OF THE MALAYSIAN OIL PALM INDUSTRY  2018</font></b></p>
    <table border="1" cellpadding="2" cellspacing="0" width="100%">
    <tbody><tr>
    <td align="center" class="PerfomanceTitleDataTD" width="15%"> </td>
    <td align="center" class="PerfomanceTitleDataTD" width="5%">Dec 17</td>
    <td align="center" class="PerfomanceTitleDataTD" width="5%">Jan 18</td>
    <td align="center" class="PerfomanceTitleDataTD" width="5%">Feb</td>
    <td align="center" class="PerfomanceTitleDataTD" width="5%">Mar</td>
    <td align="center" class="PerfomanceTitleDataTD" width="5%">Apr</td>
    <td align="center" class="PerfomanceTitleDataTD" width="5%">May</td>
    <td align="center" class="PerfomanceTitleDataTD" width="5%">Jun</td>
    <td align="center" class="PerfomanceTitleDataTD" width="5%">Jul</td>
    <td align="center" class="PerfomanceTitleDataTD" width="5%">Aug</td>
    <td align="center" class="PerfomanceTitleDataTD" width="5%">Sep</td>
    <td align="center" class="PerfomanceTitleDataTD" width="5%">Oct (r)</td>
    <td align="center" class="PerfomanceTitleDataTD" width="5%">Nov (p)</td>
    <td align="center" class="PerfomanceTitleDataTD" width="5%">Dec</td>
    </tr>
    <tr><td align="left" class="PerfomanceTitle1DataTD" colspan="14">PRODUCTION (TONNES)</td></tr><tr>
    <td align="left" class="PerfomanceContent1DataTD">Crude Palm Oil</td>
    <td align="right" class="PerfomanceContent1DataTD">1,834,165</td>
    <td align="right" class="PerfomanceContent1DataTD">1,586,653</td>
    <td align="right" class="PerfomanceContent1DataTD">1,342,805</td>
    <td align="right" class="PerfomanceContent1DataTD">1,574,079</td>
    <td align="right" class="PerfomanceContent1DataTD">1,558,769</td>
    <td align="right" class="PerfomanceContent1DataTD">1,525,490</td>
    <td align="right" class="PerfomanceContent1DataTD">1,332,704</td>
    <td align="right" class="PerfomanceContent1DataTD">1,503,220</td>
    <td align="right" class="PerfomanceContent1DataTD">1,620,605</td>
    <td align="right" class="PerfomanceContent1DataTD">1,853,602</td>
    <td align="right" class="PerfomanceContent1DataTD">1,964,954</td>
    <td align="right" class="PerfomanceContent1DataTD">1,845,219</td>
    <td align="right" class="PerfomanceContent1DataTD"> </td>
    </tr>
    <tr>
    <td align="left" class="PerfomanceContent1DataTD">Palm Kernel</td>
    <td align="right" class="PerfomanceContent1DataTD">465,062</td>
    <td align="right" class="PerfomanceContent1DataTD">418,424</td>
    <td align="right" class="PerfomanceContent1DataTD">340,708</td>
    <td align="right" class="PerfomanceContent1DataTD">404,924</td>
    <td align="right" class="PerfomanceContent1DataTD">393,003</td>
    <td align="right" class="PerfomanceContent1DataTD">380,987</td>
    <td align="right" class="PerfomanceContent1DataTD">315,731</td>
    <td align="right" class="PerfomanceContent1DataTD">361,220</td>
    <td align="right" class="PerfomanceContent1DataTD">397,683</td>
    <td align="right" class="PerfomanceContent1DataTD">462,795</td>
    <td align="right" class="PerfomanceContent1DataTD">485,666</td>
    <td align="right" class="PerfomanceContent1DataTD">451,725</td>
    <td align="right" class="PerfomanceContent1DataTD"> </td>
    </tr>
    <tr>
    <td align="left" class="PerfomanceContent1DataTD">Palm Kernel Oil</td>
    <td align="right" class="PerfomanceContent1DataTD">223,523</td>
    <td align="right" class="PerfomanceContent1DataTD">206,597</td>
    <td align="right" class="PerfomanceContent1DataTD">161,124</td>
    <td align="right" class="PerfomanceContent1DataTD">195,294</td>
    <td align="right" class="PerfomanceContent1DataTD">188,701</td>
    <td align="right" class="PerfomanceContent1DataTD">179,463</td>
    <td align="right" class="PerfomanceContent1DataTD">156,387</td>
    <td align="right" class="PerfomanceContent1DataTD">175,456</td>
    <td align="right" class="PerfomanceContent1DataTD">179,917</td>
    <td align="right" class="PerfomanceContent1DataTD">191,687</td>
    <td align="right" class="PerfomanceContent1DataTD">231,184</td>
    <td align="right" class="PerfomanceContent1DataTD">216,879</td>
    <td align="right" class="PerfomanceContent1DataTD"> </td>
    </tr>
    <tr>
    <td align="left" class="PerfomanceContent1DataTD">Palm Kernel Cake</td>
    <td align="right" class="PerfomanceContent1DataTD">250,856</td>
    <td align="right" class="PerfomanceContent1DataTD">230,641</td>
    <td align="right" class="PerfomanceContent1DataTD">181,118</td>
    <td align="right" class="PerfomanceContent1DataTD">219,538</td>
    <td align="right" class="PerfomanceContent1DataTD">208,806</td>
    <td align="right" class="PerfomanceContent1DataTD">206,034</td>
    <td align="right" class="PerfomanceContent1DataTD">176,622</td>
    <td align="right" class="PerfomanceContent1DataTD">199,221</td>
    <td align="right" class="PerfomanceContent1DataTD">203,637</td>
    <td align="right" class="PerfomanceContent1DataTD">218,588</td>
    <td align="right" class="PerfomanceContent1DataTD">260,064</td>
    <td align="right" class="PerfomanceContent1DataTD">244,215</td>
    <td align="right" class="PerfomanceContent1DataTD"> </td>
    </tr>
    <tr><td align="left" class="PerfomanceTitle1DataTD" colspan="14">CLOSING STOCK (TONNES)</td></tr><tr>
    <td align="left" class="PerfomanceContent1DataTD">Palm Oil</td>
    <td align="right" class="PerfomanceContent1DataTD">2,732,093</td>
    <td align="right" class="PerfomanceContent1DataTD">2,548,704</td>
    <td align="right" class="PerfomanceContent1DataTD">2,476,445</td>
    <td align="right" class="PerfomanceContent1DataTD">2,321,759</td>
    <td align="right" class="PerfomanceContent1DataTD">2,179,740</td>
    <td align="right" class="PerfomanceContent1DataTD">2,168,882</td>
    <td align="right" class="PerfomanceContent1DataTD">2,187,035</td>
    <td align="right" class="PerfomanceContent1DataTD">2,231,542</td>
    <td align="right" class="PerfomanceContent1DataTD">2,504,915</td>
    <td align="right" class="PerfomanceContent1DataTD">2,529,447</td>
    <td align="right" class="PerfomanceContent1DataTD">2,722,478</td>
    <td align="right" class="PerfomanceContent1DataTD">3,006,988</td>
    <td align="right" class="PerfomanceContent1DataTD"> </td>
    </tr>
    <tr>
    <td align="left" class="PerfomanceContent1DataTD">Palm Kernel</td>
    <td align="right" class="PerfomanceContent1DataTD">179,587</td>
    <td align="right" class="PerfomanceContent1DataTD">151,765</td>
    <td align="right" class="PerfomanceContent1DataTD">151,553</td>
    <td align="right" class="PerfomanceContent1DataTD">147,322</td>
    <td align="right" class="PerfomanceContent1DataTD">142,348</td>
    <td align="right" class="PerfomanceContent1DataTD">142,506</td>
    <td align="right" class="PerfomanceContent1DataTD">131,491</td>
    <td align="right" class="PerfomanceContent1DataTD">117,148</td>
    <td align="right" class="PerfomanceContent1DataTD">133,209</td>
    <td align="right" class="PerfomanceContent1DataTD">187,879</td>
    <td align="right" class="PerfomanceContent1DataTD">182,756</td>
    <td align="right" class="PerfomanceContent1DataTD">175,719</td>
    <td align="right" class="PerfomanceContent1DataTD"> </td>
    </tr>
    <tr>
    <td align="left" class="PerfomanceContent1DataTD">Palm Kernel Oil</td>
    <td align="right" class="PerfomanceContent1DataTD">289,375</td>
    <td align="right" class="PerfomanceContent1DataTD">294,874</td>
    <td align="right" class="PerfomanceContent1DataTD">274,647</td>
    <td align="right" class="PerfomanceContent1DataTD">309,892</td>
    <td align="right" class="PerfomanceContent1DataTD">328,152</td>
    <td align="right" class="PerfomanceContent1DataTD">324,221</td>
    <td align="right" class="PerfomanceContent1DataTD">282,597</td>
    <td align="right" class="PerfomanceContent1DataTD">261,693</td>
    <td align="right" class="PerfomanceContent1DataTD">262,010</td>
    <td align="right" class="PerfomanceContent1DataTD">300,521</td>
    <td align="right" class="PerfomanceContent1DataTD">355,495</td>
    <td align="right" class="PerfomanceContent1DataTD">386,830</td>
    <td align="right" class="PerfomanceContent1DataTD"> </td>
    </tr>
    <tr>
    <td align="left" class="PerfomanceContent1DataTD">Palm Kernel Cake</td>
    <td align="right" class="PerfomanceContent1DataTD">356,019</td>
    <td align="right" class="PerfomanceContent1DataTD">337,061</td>
    <td align="right" class="PerfomanceContent1DataTD">266,331</td>
    <td align="right" class="PerfomanceContent1DataTD">330,527</td>
    <td align="right" class="PerfomanceContent1DataTD">333,195</td>
    <td align="right" class="PerfomanceContent1DataTD">292,451</td>
    <td align="right" class="PerfomanceContent1DataTD">245,129</td>
    <td align="right" class="PerfomanceContent1DataTD">245,701</td>
    <td align="right" class="PerfomanceContent1DataTD">231,695</td>
    <td align="right" class="PerfomanceContent1DataTD">234,128</td>
    <td align="right" class="PerfomanceContent1DataTD">251,292</td>
    <td align="right" class="PerfomanceContent1DataTD">311,150</td>
    <td align="right" class="PerfomanceContent1DataTD"> </td>
    </tr>
    <tr><td align="left" class="PerfomanceTitle1DataTD" colspan="14">EXPORT (TONNES)</td></tr><tr>
    <td align="left" class="PerfomanceContent1DataTD">Palm Oil</td>
    <td align="right" class="PerfomanceContent1DataTD">1,427,425</td>
    <td align="right" class="PerfomanceContent1DataTD">1,472,219</td>
    <td align="right" class="PerfomanceContent1DataTD">1,243,215</td>
    <td align="right" class="PerfomanceContent1DataTD">1,565,746</td>
    <td align="right" class="PerfomanceContent1DataTD">1,530,139</td>
    <td align="right" class="PerfomanceContent1DataTD">1,291,517</td>
    <td align="right" class="PerfomanceContent1DataTD">1,129,515</td>
    <td align="right" class="PerfomanceContent1DataTD">1,196,653</td>
    <td align="right" class="PerfomanceContent1DataTD">1,099,739</td>
    <td align="right" class="PerfomanceContent1DataTD">1,619,317</td>
    <td align="right" class="PerfomanceContent1DataTD">1,578,263</td>
    <td align="right" class="PerfomanceContent1DataTD">1,375,217</td>
    <td align="right" class="PerfomanceContent1DataTD"> </td>
    </tr>
    <tr><td align="left" class="PerfomanceContent1DataTD">Palm Kernel Oil</td>
    <td align="right" class="PerfomanceContent1DataTD">120,564</td>
    <td align="right" class="PerfomanceContent1DataTD">72,938</td>
    <td align="right" class="PerfomanceContent1DataTD">108,106</td>
    <td align="right" class="PerfomanceContent1DataTD">74,470</td>
    <td align="right" class="PerfomanceContent1DataTD">71,493</td>
    <td align="right" class="PerfomanceContent1DataTD">67,093</td>
    <td align="right" class="PerfomanceContent1DataTD">73,378</td>
    <td align="right" class="PerfomanceContent1DataTD">74,696</td>
    <td align="right" class="PerfomanceContent1DataTD">78,987</td>
    <td align="right" class="PerfomanceContent1DataTD">62,923</td>
    <td align="right" class="PerfomanceContent1DataTD">77,385</td>
    <td align="right" class="PerfomanceContent1DataTD">99,640</td>
    <td align="right" class="PerfomanceContent1DataTD"> </td>
    </tr>
    <tr><td align="left" class="PerfomanceContent1DataTD">Palm Kernel Cake</td>
    <td align="right" class="PerfomanceContent1DataTD">221,622</td>
    <td align="right" class="PerfomanceContent1DataTD">217,817</td>
    <td align="right" class="PerfomanceContent1DataTD">217,874</td>
    <td align="right" class="PerfomanceContent1DataTD">132,448</td>
    <td align="right" class="PerfomanceContent1DataTD">178,733</td>
    <td align="right" class="PerfomanceContent1DataTD">214,149</td>
    <td align="right" class="PerfomanceContent1DataTD">201,625</td>
    <td align="right" class="PerfomanceContent1DataTD">150,521</td>
    <td align="right" class="PerfomanceContent1DataTD">213,898</td>
    <td align="right" class="PerfomanceContent1DataTD">171,234</td>
    <td align="right" class="PerfomanceContent1DataTD">225,901</td>
    <td align="right" class="PerfomanceContent1DataTD">166,217</td>
    <td align="right" class="PerfomanceContent1DataTD"> </td>
    </tr>
    <tr><td align="left" class="PerfomanceContent1DataTD">Oleochemical</td>
    <td align="right" class="PerfomanceContent1DataTD">264,532</td>
    <td align="right" class="PerfomanceContent1DataTD">238,304</td>
    <td align="right" class="PerfomanceContent1DataTD">229,578</td>
    <td align="right" class="PerfomanceContent1DataTD">254,816</td>
    <td align="right" class="PerfomanceContent1DataTD">262,984</td>
    <td align="right" class="PerfomanceContent1DataTD">259,849</td>
    <td align="right" class="PerfomanceContent1DataTD">231,532</td>
    <td align="right" class="PerfomanceContent1DataTD">265,304</td>
    <td align="right" class="PerfomanceContent1DataTD">246,954</td>
    <td align="right" class="PerfomanceContent1DataTD">245,310</td>
    <td align="right" class="PerfomanceContent1DataTD">280,806</td>
    <td align="right" class="PerfomanceContent1DataTD">246,964</td>
    <td align="right" class="PerfomanceContent1DataTD"> </td>
    </tr>
    <tr><td align="left" class="PerfomanceContent1DataTD">Biodiesel</td>
    <td align="right" class="PerfomanceContent1DataTD">14,220</td>
    <td align="right" class="PerfomanceContent1DataTD">33,087</td>
    <td align="right" class="PerfomanceContent1DataTD">34,875</td>
    <td align="right" class="PerfomanceContent1DataTD">24,373</td>
    <td align="right" class="PerfomanceContent1DataTD">28,838</td>
    <td align="right" class="PerfomanceContent1DataTD">41,822</td>
    <td align="right" class="PerfomanceContent1DataTD">41,762</td>
    <td align="right" class="PerfomanceContent1DataTD">66,197</td>
    <td align="right" class="PerfomanceContent1DataTD">39,333</td>
    <td align="right" class="PerfomanceContent1DataTD">28,037</td>
    <td align="right" class="PerfomanceContent1DataTD">55,857</td>
    <td align="right" class="PerfomanceContent1DataTD">44,091</td>
    <td align="right" class="PerfomanceContent1DataTD"> </td>
    </tr>
    <tr><td align="left" class="PerfomanceTitle1DataTD" colspan="14">IMPORT (TONNES)</td></tr><tr>
    <td align="left" class="PerfomanceContent1DataTD">Crude Palm Oil (CPO)</td>
    <td align="right" class="PerfomanceContent1DataTD">3,002</td>
    <td align="right" class="PerfomanceContent1DataTD">5,000</td>
    <td align="right" class="PerfomanceContent1DataTD">3,016</td>
    <td align="right" class="PerfomanceContent1DataTD">6,776</td>
    <td align="right" class="PerfomanceContent1DataTD">8,229</td>
    <td align="right" class="PerfomanceContent1DataTD">8,144</td>
    <td align="right" class="PerfomanceContent1DataTD">11,747</td>
    <td align="right" class="PerfomanceContent1DataTD">7,007</td>
    <td align="right" class="PerfomanceContent1DataTD">11,314</td>
    <td align="right" class="PerfomanceContent1DataTD">14,492</td>
    <td align="right" class="PerfomanceContent1DataTD">66,708</td>
    <td align="right" class="PerfomanceContent1DataTD">80,173</td>
    <td align="right" class="PerfomanceContent1DataTD"> </td>
    </tr>
    <tr>
    <td align="left" class="PerfomanceContent1DataTD">Processed Palm Oil (PPO)</td>
    <td align="right" class="PerfomanceContent1DataTD">11,989</td>
    <td align="right" class="PerfomanceContent1DataTD">29,825</td>
    <td align="right" class="PerfomanceContent1DataTD">64,149</td>
    <td align="right" class="PerfomanceContent1DataTD">32,851</td>
    <td align="right" class="PerfomanceContent1DataTD">27,396</td>
    <td align="right" class="PerfomanceContent1DataTD">24,117</td>
    <td align="right" class="PerfomanceContent1DataTD">74,141</td>
    <td align="right" class="PerfomanceContent1DataTD">37,023</td>
    <td align="right" class="PerfomanceContent1DataTD">68,877</td>
    <td align="right" class="PerfomanceContent1DataTD">47,107</td>
    <td align="right" class="PerfomanceContent1DataTD">50,561</td>
    <td align="right" class="PerfomanceContent1DataTD">54,179</td>
    <td align="right" class="PerfomanceContent1DataTD"> </td>
    </tr>
    <tr>
    <td align="left" class="PerfomanceContent1DataTD">Palm Oil (CPO+PPO)</td>
    <td align="right" class="PerfomanceContent1DataTD">14,991</td>
    <td align="right" class="PerfomanceContent1DataTD">34,825</td>
    <td align="right" class="PerfomanceContent1DataTD">67,165</td>
    <td align="right" class="PerfomanceContent1DataTD">39,626</td>
    <td align="right" class="PerfomanceContent1DataTD">35,624</td>
    <td align="right" class="PerfomanceContent1DataTD">32,260</td>
    <td align="right" class="PerfomanceContent1DataTD">85,889</td>
    <td align="right" class="PerfomanceContent1DataTD">44,030</td>
    <td align="right" class="PerfomanceContent1DataTD">80,191</td>
    <td align="right" class="PerfomanceContent1DataTD">61,599</td>
    <td align="right" class="PerfomanceContent1DataTD">117,269</td>
    <td align="right" class="PerfomanceContent1DataTD">134,352</td>
    <td align="right" class="PerfomanceContent1DataTD"> </td>
    </tr>
    <tr>
    <td align="left" class="PerfomanceContent1DataTD">Palm Kernel Oil (PKO)</td>
    <td align="right" class="PerfomanceContent1DataTD">11,419</td>
    <td align="right" class="PerfomanceContent1DataTD">3,329</td>
    <td align="right" class="PerfomanceContent1DataTD">17,219</td>
    <td align="right" class="PerfomanceContent1DataTD">22,566</td>
    <td align="right" class="PerfomanceContent1DataTD">12,029</td>
    <td align="right" class="PerfomanceContent1DataTD">11,144</td>
    <td align="right" class="PerfomanceContent1DataTD">11,357</td>
    <td align="right" class="PerfomanceContent1DataTD">14,700</td>
    <td align="right" class="PerfomanceContent1DataTD">28,436</td>
    <td align="right" class="PerfomanceContent1DataTD">23,464</td>
    <td align="right" class="PerfomanceContent1DataTD">28,940</td>
    <td align="right" class="PerfomanceContent1DataTD">32,488</td>
    <td align="right" class="PerfomanceContent1DataTD"> </td>
    </tr>
    <tr><td align="left" class="PerfomanceTitle1DataTD" colspan="14">PRICE (1% OER) (Local Ex-Mill)</td></tr><tr>
    <td align="left" class="PerfomanceContent1DataTD">Fresh Fruit Bunches (1% Equivalent) </td>
    <td align="right" class="PerfomanceContent1DataTD">27.16</td>
    <td align="right" class="PerfomanceContent1DataTD">28.07</td>
    <td align="right" class="PerfomanceContent1DataTD">27.24</td>
    <td align="right" class="PerfomanceContent1DataTD">25.99</td>
    <td align="right" class="PerfomanceContent1DataTD">25.71</td>
    <td align="right" class="PerfomanceContent1DataTD">25.43</td>
    <td align="right" class="PerfomanceContent1DataTD">24.29</td>
    <td align="right" class="PerfomanceContent1DataTD">23.23</td>
    <td align="right" class="PerfomanceContent1DataTD">23.22</td>
    <td align="right" class="PerfomanceContent1DataTD">23.11</td>
    <td align="right" class="PerfomanceContent1DataTD">21.57</td>
    <td align="right" class="PerfomanceContent1DataTD">18.66</td>
    <td align="right" class="PerfomanceContent1DataTD"> </td>
    </tr>
    </tbody></table>
    <br/><br/>
    <table align="center" bgcolor="#99CCFF" border="0" bordercolor="#111111" cellpadding="0" cellspacing="0" style="border-collapse: collapse; border-width: 0" width="605">
    <tbody><tr>
    <td colspan="2" width="100%">
    <b>Explanatory Notes:</b></td>
    </tr>
    <tr>
    <td align="center" width="8%"> </td>
    <td> </td>
    </tr>
    <tr>
    <td align="center" valign="top" width="8%">(p)</td>
    <td>Preliminary</td>
    </tr>
    <tr>
    <td align="center" width="8%"> </td>
    <td> </td>
    </tr>
    <tr>
    <td align="center" valign="top" width="8%">(r)</td>
    <td>The figures for the month of October 2018 are revised by taking into account corrections made by the licensees and from late receipt of Customs No. 1 and 2 (Rev. 8/89) after 12 November 2018.</td>
    </tr>
    <tr>
    <td align="center" width="8%"> </td>
    <td> </td>
    </tr>
    </tbody></table>
    <p></p>
    </body></html>
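
Once the frame's HTML is available, the table still has to be turned into rows. A minimal sketch of that step follows. It uses the standard library's html.parser instead of BeautifulSoup, purely so that it runs without extra packages. The TableRows class and SAMPLE_HTML are hypothetical names, and the sample is a shortened copy of a few rows from the console output above with the attributes stripped.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a <td>.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical sample: a few rows shortened from the console output.
SAMPLE_HTML = """
<table><tbody><tr>
<td> </td><td>Dec 17</td><td>Jan 18</td>
</tr>
<tr><td colspan="14">PRODUCTION (TONNES)</td></tr><tr>
<td>Crude Palm Oil</td><td>1,834,165</td><td>1,586,653</td>
</tr></tbody></table>
"""

parser = TableRows()
parser.feed(SAMPLE_HTML)
for row in parser.rows:
    print(row)
```

In the real script, the string passed to feed() would be driver.page_source after switching to the frame. Section rows such as PRODUCTION (TONNES) come back as single-cell lists, so they are easy to tell apart from data rows.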
    
0 votes
/ December 12, 2018

You can do this faster by using requests to get the iframe URL and then pandas to read the table from that URL:

import pandas as pd
import requests 
from bs4 import BeautifulSoup

res = requests.get('http://bepi.mpob.gov.my/index.php/en/summary-2/893-summary-2018.html')
soup = BeautifulSoup(res.content, 'lxml')
iframeURL = soup.select_one('iframe')['src']
results = pd.read_html(iframeURL)
df = results[0].fillna('')
df.iloc[0, 0] = 'Category'  # single indexer, so the assignment reaches df itself
print(df)
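
The snippet above passes the iframe's src attribute straight to pd.read_html, which assumes the attribute holds an absolute URL. If it were relative, it would first have to be resolved against the page URL. The following is a sketch of that step using only the standard library. The iframe_urls helper and the sample markup, including its src value, are invented for illustration and are not taken from the site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IframeSrcFinder(HTMLParser):
    """Record the src attribute of every <iframe> in a document."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def iframe_urls(page_url, html):
    """Return absolute URLs for all iframes found in html."""
    finder = IframeSrcFinder()
    finder.feed(html)
    # urljoin leaves absolute URLs untouched and resolves relative ones.
    return [urljoin(page_url, src) for src in finder.sources]


PAGE_URL = "http://bepi.mpob.gov.my/index.php/en/summary-2/893-summary-2018.html"
# Hypothetical markup: the src value below is made up.
SAMPLE = '<div><iframe id="blockrandom" src="/example/summary.php?year=2018"></iframe></div>'

print(iframe_urls(PAGE_URL, SAMPLE))
```

Each resulting URL could then be handed to pd.read_html in place of the raw src value.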
0 votes
/ December 12, 2018

This happens because the page source itself doesn't contain the data you need. The data is probably fetched via AJAX and then rendered into what you see on the site. With Selenium you can use driver.find_elements_by_xpath() (or any of the other locator methods Selenium provides) to scrape the HTML elements held in tr and td. Alternatively, you could grab the table's innerHTML and then process it to extract the data. With the find_element or find_elements methods you get what is actually rendered on the site, not just what the page source contains.
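
Whichever way the cell text is obtained, the values arrive as strings such as "1,834,165" and need converting before any modelling. A small sketch of that processing step is shown below. The parse_cell helper is a hypothetical name, and the rule that blank cells become None is an assumption, not something the site documents.

```python
def parse_cell(text):
    """Convert a cell string to int or float.

    Blank cells become None; anything non-numeric is returned as stripped text.
    """
    cleaned = text.replace(",", "").strip()
    if not cleaned:
        return None
    try:
        return int(cleaned)
    except ValueError:
        pass
    try:
        return float(cleaned)
    except ValueError:
        return text.strip()


# Values copied from the "Crude Palm Oil" row, plus one blank cell.
row = ["Crude Palm Oil", "1,834,165", "1,586,653", " "]
print([parse_cell(cell) for cell in row])
```

Labels pass through unchanged, so the same function can be applied to every cell in a row without special-casing the first column.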
