How do I loop clicks with Selenium and scrape each table with bs4? - PullRequest
0 votes
/ January 19, 2019

I am trying to scrape some hidden tables (15 tables per page) that expand after clicking an arrow. (Screenshots attached: unexpanded tables, expanded tables.)

I am also attaching the HTML (sorry, it is a bit long):

<table class="footable table toggle-arrow-tiny default breakpoint footable-loaded" transparenturl="Images/arrow_none.gif" ascendingurl="Images/arrow_up.gif" customsortdirection="Ascending" custompageindex="0" customsortfield="fullname" custompagealphaindex="A" custompagemode="ABC" custompagealpharelative="A" descendingurl="Images/arrow_down.gif" customvirtualcount="1605" id="MainContent_gw_partners" style="border-collapse:collapse;" cellspacing="0">
    <thead>
        <tr>
            <th data-toggle="true" scope="col" class="footable-visible footable-first-column"> &nbsp;&nbsp;</th><th data-ignore="true" data-hide="phone, tablet" scope="col" class="footable-visible"> &nbsp;&nbsp;</th><th data-ignore="true" data-hide="phone, tablet" scope="col" class="footable-visible">Titolo&nbsp;&nbsp;</th><th scope="col" class="footable-visible">Cognome&nbsp;&nbsp;</th><th data-ignore="true" data-hide="phone, tablet" scope="col" class="footable-visible">NPA&nbsp;&nbsp;</th><th data-ignore="true" data-hide="phone" scope="col" class="footable-visible">Luogo&nbsp;&nbsp;</th><th data-ignore="true" data-hide="phone" scope="col" class="footable-visible footable-last-column">Cantone&nbsp;&nbsp;</th><th data-hide="all" scope="col" style="display: none;">Discipline(s) thérapeutique(s)&nbsp;&nbsp;</th><th data-hide="all" scope="col" style="display: none;">Società&nbsp;&nbsp;</th><th data-hide="all" scope="col" style="display: none;">Cognome&nbsp;&nbsp;</th><th data-hide="all" scope="col" style="display: none;">C/O&nbsp;&nbsp;</th><th data-hide="all" scope="col" style="display: none;">Via&nbsp;&nbsp;</th><th data-hide="all" scope="col" style="display: none;">NPA&nbsp;&nbsp;</th><th data-hide="all" scope="col" style="display: none;">Luogo&nbsp;&nbsp;</th><th data-hide="all" scope="col" style="display: none;">Tel / Cellulare&nbsp;&nbsp;</th><th data-hide="all" scope="col" style="display: none;">Cellulare  &nbsp;&nbsp;</th><th data-hide="all" scope="col" style="display: none;">Fax&nbsp;&nbsp;</th><th data-hide="all" scope="col" style="display: none;">e-mail&nbsp;&nbsp;</th><th data-hide="all" scope="col" style="display: none;">Sito WEB&nbsp;&nbsp;</th><th data-hide="all" scope="col" style="display: none;">Altri luoghi di lavoro&nbsp;&nbsp;</th><th data-hide="all" scope="col" style="display: none;">Discipline(s) thérapeutique(s)&nbsp;&nbsp;</th>
        </tr>
    </thead><tbody>
        <tr class="row_white footable-detail-show">
            <td class="footable-visible footable-first-column"><span class="footable-toggle"></span>&nbsp;</td><td class="footable-visible">

                    </td><td class="footable-visible">&nbsp;</td><td class="footable-visible">

                        ABBONDANZIERI Katia
                    </td><td class="footable-visible">
                        1204
                        <br>

                    </td><td class="footable-visible">
                        Genève
                        <br>

                    </td><td class="footable-visible footable-last-column">
                        GE
                        <br>

                    </td><td style="display: none;">
                        197.&nbsp;Omeopatia, 202.&nbsp;Linfodrenaggio&nbsp;manuale, 205.&nbsp;Massaggio&nbsp;classico, 664.&nbsp;Riflessoterapia&nbsp;generale
                    </td><td style="display: none;">

                    </td><td style="display: none;">
                        ABBONDANZIERI Katia
                    </td><td style="display: none;">


                    </td><td style="display: none;">
                        Place du Cirque, 2
                    </td><td style="display: none;">
                        1204
                    </td><td style="display: none;">
                        Genève
                    </td><td style="display: none;">
                        022 328 23 44 
                    </td><td style="display: none;">
                        079 601 92 75 
                    </td><td style="display: none;">

                    </td><td style="display: none;">

                    </td><td style="display: none;">

                    </td><td style="display: none;">

                    </td><td style="display: none;">
                        <div class="thZone"><div class="zCat">METHODES DE MASSAGE</div><div class="zThr">Linfodrenaggio manuale</div><div class="zThr">Massaggio classico</div><div class="zCat">METHODES PRESCRIPTIVES</div><div class="zThr">Omeopatia</div><div class="zCat">METHODES REFLEXES</div><div class="zThr">Riflessoterapia generale</div></div>
                    </td>
        </tr><tr class="footable-row-detail" style="display: table-row;"><td class="footable-row-detail-cell" colspan="7"><div class="footable-row-detail-inner"><div class="footable-row-detail-row"><div class="footable-row-detail-name">Discipline(s) thérapeutique(s):</div><div class="footable-row-detail-value">197.&nbsp;Omeopatia, 202.&nbsp;Linfodrenaggio&nbsp;manuale, 205.&nbsp;Massaggio&nbsp;classico, 664.&nbsp;Riflessoterapia&nbsp;generale</div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">Cognome:</div><div class="footable-row-detail-value">ABBONDANZIERI Katia</div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">Via:</div><div class="footable-row-detail-value">Place du Cirque, 2</div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">NPA:</div><div class="footable-row-detail-value">1204</div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">Luogo:</div><div class="footable-row-detail-value">Genève</div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">Tel / Cellulare:</div><div class="footable-row-detail-value">022 328 23 44</div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">Cellulare:</div><div class="footable-row-detail-value">079 601 92 75</div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">Discipline(s) thérapeutique(s):</div><div class="footable-row-detail-value"><div class="thZone"><div class="zCat">METHODES DE MASSAGE</div><div class="zThr">Linfodrenaggio manuale</div><div class="zThr">Massaggio classico</div><div class="zCat">METHODES PRESCRIPTIVES</div><div class="zThr">Omeopatia</div><div class="zCat">METHODES REFLEXES</div><div class="zThr">Riflessoterapia generale</div></div></div></div></div></td></tr><tr class="row_grey footable-detail-show">
            <td class="footable-visible footable-first-column"><span class="footable-toggle"></span>&nbsp;</td><td class="footable-visible">

                            <a href="http://www.kinesiopourtous.ch" target="_blank">
                                <img title="Link internet" alt="" style="MARGIN-RIGHT: 7px" src="Images/pictoSiteInternet.jpg" width="12" height="12" border="0">
                            </a>

                    </td><td class="footable-visible">&nbsp;</td><td class="footable-visible">
                        <img id="MainContent_gw_partners_img1_1" src="Images/multi.gif">
                        ABEGG Sophie
                    </td><td class="footable-visible">
                        1212
                        <br>
                        1875<br>
                    </td><td class="footable-visible">
                        Grand-Lancy
                        <br>
                        <nobr>Morgins</nobr><nobr><br>
                    </nobr></td><td class="footable-visible footable-last-column">
                        GE
                        <br>
                        VS<br>
                    </td><td style="display: none;">
                        199.&nbsp;Kinesiologia
                    </td><td style="display: none;">
                        Kinéso pour tous
                    </td><td style="display: none;">
                        ABEGG Sophie
                    </td><td style="display: none;">


                    </td><td style="display: none;">
                        Rue du Bachet 8
                    </td><td style="display: none;">
                        1212
                    </td><td style="display: none;">
                        Grand-Lancy
                    </td><td style="display: none;">

                    </td><td style="display: none;">
                        076 365 63 86
                    </td><td style="display: none;">

                    </td><td style="display: none;">

                            <a href="mailto:sophie@kinesiopourtous.ch">sophie[at]kinesiopourtous.ch
                            </a>

                    </td><td style="display: none;">

                            <a href="http://www.kinesiopourtous.ch" target="_blank">
                                www.kinesiopourtous.ch
                            </a>

                    </td><td style="display: none;">
                        Résidence Bellevue, Rte de France 22, 1875 Morgins, CH<br>
                    </td><td style="display: none;">
                        <div class="thZone"><div class="zCat">METHODES ENERGETIQUES MANUELLES</div><div class="zThr">Kinesiologia</div></div>
                    </td>
        </tr><tr class="footable-row-detail"><td class="footable-row-detail-cell" colspan="7"><div class="footable-row-detail-inner"><div class="footable-row-detail-row"><div class="footable-row-detail-name">Discipline(s) thérapeutique(s):</div><div class="footable-row-detail-value">199.&nbsp;Kinesiologia</div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">Società:</div><div class="footable-row-detail-value">Kinéso pour tous</div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">Cognome:</div><div class="footable-row-detail-value">ABEGG Sophie</div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">Via:</div><div class="footable-row-detail-value">Rue du Bachet 8</div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">NPA:</div><div class="footable-row-detail-value">1212</div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">Luogo:</div><div class="footable-row-detail-value">Grand-Lancy</div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">Cellulare:</div><div class="footable-row-detail-value">076 365 63 86</div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">e-mail:</div><div class="footable-row-detail-value"><a href="mailto:sophie@kinesiopourtous.ch">sophie[at]kinesiopourtous.ch
                            </a></div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">Sito WEB:</div><div class="footable-row-detail-value"><a href="http://www.kinesiopourtous.ch" target="_blank">
                                www.kinesiopourtous.ch
                            </a></div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">Altri luoghi di lavoro:</div><div class="footable-row-detail-value">Résidence Bellevue, Rte de France 22, 1875 Morgins, CH<br></div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">Discipline(s) thérapeutique(s):</div><div class="footable-row-detail-value"><div class="thZone"><div class="zCat">METHODES ENERGETIQUES MANUELLES</div><div class="zThr">Kinesiologia</div></div></div></div></div></td></tr><tr class="row_white">
            <td class="footable-visible footable-first-column"><span class="footable-toggle"></span>&nbsp;</td><td class="footable-visible">

So I am using Selenium for the clicking and BeautifulSoup 4 for scraping the tables.

I would like to create a loop that clicks each arrow (15 arrows per page) and scrapes the data from each table (13 rows in each table; if a value is missing, the cell should be empty in the output Excel file).

Any help, please?

Answers [ 3 ]

0 votes
/ January 19, 2019

Here is a Selenium way to expand those tables. There is a better way to handle the time the page needs to load, but I just wanted to get this out as quickly as possible, so I simply went with time.sleep.

from selenium import webdriver
import time


url = 'http://www.asca.ch/Partners.aspx?lang=it'

driver = webdriver.Chrome()
driver.get(url)

# Click the dropdown, select GE, click Confermo, click Ricerca
driver.find_element_by_xpath('//*[@id="ctl00_MainContent_ddl_cantons_Arrow"]').click()
time.sleep(2)

driver.find_element_by_xpath('//*[@id="ctl00_MainContent_ddl_cantons_DropDown"]/div/ul/li[9]').click()
driver.find_element_by_xpath('//*[@id="MainContent__chkDisclaimer"]').click()
driver.find_element_by_xpath('//*[@id="MainContent_btn_submit"]').click()
time.sleep(5)

#Function to Expand Tables
def expand_tables():
    rows = driver.find_elements_by_xpath('//*[@id="MainContent_gw_partners"]/tbody/tr')
    for row in rows:
        row.click()

# Function to Click Next Page        
def click_next_page():
    driver.find_element_by_xpath('//*[@id="MainContent_btnNextPackId"]').click()



page = 1
has_next_page = True
while has_next_page:
    print('Page: %s' % page)
    expand_tables()

    ## Your code to Parse the Tables ##

    try:
        click_next_page()
        page += 1
    except Exception:
        # The "next page" button could not be clicked: stop looping
        print('You are at the end')
        has_next_page = False

    time.sleep(2)






# When finished
driver.close()
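
The answer above notes that fixed `time.sleep` calls are not the best way to wait for the page. Selenium ships `WebDriverWait` (in `selenium.webdriver.support.ui`) for this purpose. The sketch below shows the same polling idea in plain Python, so it can be tried without a browser. The helper name `wait_until` and its parameters are hypothetical and not part of Selenium.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call `condition` repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    (Hypothetical helper, not a Selenium API.)
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError('condition not met within %s seconds' % timeout)
        time.sleep(poll_interval)


# Intended use with the driver from the answer (not executed here):
# rows = wait_until(lambda: driver.find_elements_by_xpath(
#     '//*[@id="MainContent_gw_partners"]/tbody/tr'))
```

Compared with a fixed sleep, this returns as soon as the rows exist and fails loudly if they never appear.
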
0 votes
/ January 19, 2019

Sorry, I could not fit my code into the comments, so I am posting it as an answer.

This is my code for parsing the tables:

# To find all the tables
table = soup.find('table', {'class': 'footable'})

# To get all rows in that table
rows = table.find_all('tr')

# A function to process each row
def processRow(row):
    #All rows with hidden data
    dataFields = row.find_all('td', {'style': True}
    output = {}
    #Fixed index numbers are not ideal but in this case will work
    output['Discipline'] = dataFields[0].text
    output['Cogome'] = dataFields[2].text
    output['Cellulare'] = dataFields[8].text
    output['email'] = dataFields[10].text
    return output

# Declaring a list to store all results
results = []

# Iterating over all the rows and storing the processed result in a list
for row in rows:
    results.append(processRow(row))

print(results)


    click_next_page()
    time.sleep(3)
    count += 1

I think something is wrong. I get "SyntaxError: invalid syntax" on "output = {}" below # A function to process each row.
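
The `SyntaxError` reported here most likely comes from the line before `output = {}`: the `row.find_all(...)` call is missing its closing parenthesis, so Python only notices the problem on the next line. Below is a sketch of a corrected version, assuming the row layout shown in the question's HTML (14 hidden `<td style="display: none;">` cells per data row). The names `process_row` and `parse_table`, the spelling `Cognome` for the key, and the choice to skip rows without enough hidden cells are my assumptions.

```python
from bs4 import BeautifulSoup

# Positions of the wanted values among the hidden <td style=...> cells,
# following the header order in the question's HTML.
FIELDS = {'Discipline': 0, 'Cognome': 2, 'Cellulare': 8, 'email': 10}


def process_row(row):
    """Return a dict for a data row, or None for header and detail rows."""
    data_fields = row.find_all('td', {'style': True})  # closing ")" restored
    if len(data_fields) <= max(FIELDS.values()):
        return None  # not a data row
    # get_text(strip=True) gives '' for empty cells, so missing data stays blank
    return {name: data_fields[i].get_text(' ', strip=True)
            for name, i in FIELDS.items()}


def parse_table(html):
    """Parse the footable in `html` and return one dict per data row."""
    soup = BeautifulSoup(html, 'html.parser')
    table = soup.find('table', {'class': 'footable'})
    parsed = (process_row(row) for row in table.find_all('tr'))
    return [item for item in parsed if item is not None]
```

Because the hidden cells are already present in the page source, this reads them directly and does not depend on the rows having been expanded first.
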

0 votes
/ January 19, 2019

If you inspect the page, you can see that the request method is POST, so I used a different approach.

If you would prefer to use Selenium, just let me know and I can try that too.

You will need to take the form data and copy it into the payload dictionary. I did not include all of it because it is too long, but I included a snippet of it in the code so you can see the format.
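
Copying the form data by hand works, but the copied values can go stale. As a sketch of an alternative, assuming the page keeps its state in hidden `<input>` fields as ASP.NET WebForms pages usually do (an assumption, not verified against this site), those fields could be collected from the HTML of an initial GET request. The function name `extract_form_fields` is hypothetical.

```python
from bs4 import BeautifulSoup


def extract_form_fields(html):
    """Collect name -> value for every named hidden <input> in the page."""
    soup = BeautifulSoup(html, 'html.parser')
    fields = {}
    for tag in soup.find_all('input', {'type': 'hidden'}):
        name = tag.get('name')
        if name:
            fields[name] = tag.get('value', '')
    return fields


# Intended use (not executed here; needs network access):
# import requests
# session = requests.Session()
# page = session.get('http://www.asca.ch/Partners.aspx?lang=it')
# payload = extract_form_fields(page.text)
# payload['ctl00$MainContent$ddl_cantons'] = 'GE'
```

The visible form values (canton, disclaimer checkbox, submit button) would still have to be added to the dictionary, as in the hand-built payload.
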


Then I just used pandas to grab the table with the data.

import requests
import bs4
import pandas as pd


url = 'http://www.asca.ch/Partners.aspx?lang=it'
headers = {'Accept': '*/*',
'Accept-Encoding': 'gzip, deflate',
'Accept-Language': 'en-US,en;q=0.9',
'Cache-Control': 'no-cache',
'Connection': 'keep-alive',
'Content-Length': '55755',
'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
'Cookie': '_ga=GA1.2.1140629371.1547917375; _gid=GA1.2.1588639047.1547917375; ASP.NET_SessionId=fmxjh5jxwuq10awmqch1ztjz; __AntiXsrfToken=1d9c575ab1494ab29d2e796e2853eaac; _gat=1',
'Host': 'www.asca.ch',
'Origin': 'http://www.asca.ch',
'Referer': 'http://www.asca.ch/Partners.aspx?lang=it',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36',
'X-MicrosoftAjax': 'Delta=true',
'X-Requested-With': 'XMLHttpRequest'}


payload = {
'ctl00$RadScriptManagerMaster': 'ctl00$RadScriptManagerMaster|ctl00$MainContent$btn_submit',
'RadStyleSheetManager1_TSSM': ';|636398747139118389:c7e0c438;|636304438089400012:39e38b4c;|636304438089880540:19119943;|636304438090200892:b81c9af7;|636304438090180870:bb009068;|636304438089390001:e78ed9b3;|636325253237635520:dedafabf;|636304438089530155:5961cfc1;|636304438090290991:d08fa23c;|636304438089530155:7fafd27a',
'RadScriptManagerMaster_TSM': ';;System.Web.Extensions, Version=4.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35:en-US:af7dd01d-1544-48f6-a85d-1285ae370050:ea597d4b:b25378d2;||:460a097d:7a38c288:ace9a216;Telerik.Web.UI, Version=2014.1.403.40, Culture=neutral, PublicKeyToken=121fae78165ba3d4:en-US:ca584452-327f-4858-bf00-fb22c6f6fd75:16e4e7cd:ed16cbdc:f7645509:24ee1bba:f46195d3:2003d0b8:88144a7a:1e771326:aa288e2d:258f1c72:7165f74;',
'ctl00$MainContent$ddl_partners':'' ,
'ctl00_MainContent_ddl_partners_ClientState':'' ,
'ctl00$MainContent$ddl_countries': 'Suisse',
'ctl00_MainContent_ddl_countries_ClientState': '',
'ctl00$MainContent$ddl_cantons': 'GE',

# ... remaining form fields omitted here because they are too long ...

'__ASYNCPOST': 'true',
'RadAJAXControlID': 'ctl00_MainContent_RadAjaxManager1'
}


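# Send the form as a POST request, the same way the browser does
r = requests.post(url, headers=headers, data=payload)

# pandas parses every <table> in the response; the first one holds the data
tables = pd.read_html(r.text)
data = tables[0]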

Output:

print (data)
    Unnamed: 0                        ...                                           Discipline(s) thérapeutique(s).1
0          NaN                        ...                          METHODES DE MASSAGELinfodrenaggio manualeMassa...
1          NaN                        ...                                METHODES ENERGETIQUES MANUELLESKinesiologia
2          NaN                        ...                                      METHODES DE MASSAGEMassaggio classico
3          NaN                        ...                          METHODES AYURVEDIQUESHatha YogaMETHODES PSYCHO...
4          NaN                        ...                          METHODES DE MASSAGEMassaggio classicoMETHODES ...
5          NaN                        ...                                            METHODES PRESCRIPTIVESOmeopatia
6          NaN                        ...                          METHODES ENERGETIQUES MANUELLESReikiMETHODES O...
7          NaN                        ...                          METHODES DE MASSAGEMassaggio tradizionale thai...
8          NaN                        ...                          METHODES DE MASSAGEMassaggio classicoMassaggio...
9          NaN                        ...                                      METHODES DE MASSAGEMassaggio empirico
10         NaN                        ...                          METHODES PSYCHOLOGIQUES COMPLEMENTAIRESConsigl...
11         NaN                        ...                          METHODES PRESCRIPTIVESConsigli dietetici (MCO)...
12         NaN                        ...                          METHODES DE MASSAGEMassaggio classicoMassaggio...
13         NaN                        ...                                   METHODES DE MASSAGEMassaggio terapeutico
14         NaN                        ...                          METHODES DE MASSAGELinfodrenaggio manualeMETHO...

[15 rows x 21 columns]
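
The question asks for empty cells in the Excel output where data is missing, while the frame above shows `NaN` and an `Unnamed: 0` column. A minimal sketch of one way to tidy this before export follows. The function name `clean_for_export`, the output file name, and the small stand-in frame (built from values in the question's HTML) are assumptions for illustration.

```python
import pandas as pd


def clean_for_export(data):
    """Drop the unnamed helper columns and replace NaN with empty strings."""
    keep = [c for c in data.columns if not str(c).startswith('Unnamed')]
    return data[keep].fillna('')


# Small stand-in for the `data` frame produced by pd.read_html above.
data = pd.DataFrame({
    'Unnamed: 0': [float('nan'), float('nan')],
    'Cognome': ['ABEGG Sophie', 'ABBONDANZIERI Katia'],
    'Fax': [float('nan'), float('nan')],
    'e-mail': ['sophie[at]kinesiopourtous.ch', float('nan')],
})
cleaned = clean_for_export(data)

# Writing the file needs an Excel engine such as openpyxl installed:
# cleaned.to_excel('partners.xlsx', index=False)
```

Columns that are empty for every row, such as `Fax` here, are kept so the sheet layout stays the same from page to page.
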