Iterating over a table of divs with BeautifulSoup - PullRequest
2 votes
/ July 12, 2020

The div with class="tableBody" has many child div elements. I want to get all of its child divs and the row I highlighted in this picture.

import bs4 as bs
import urllib.request
source = urllib.request.urlopen("https://www.ungm.org/Public/Notice").read()
soup = bs.BeautifulSoup(source,'lxml')

t_body = soup.find("div", class_="tableBody")
t_divs = t_body.find_all("div")

The code above returns an empty list.

[Screenshot of the page with the row highlighted]

I'm trying to learn BS4. I'd appreciate any help with the code.
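A quick offline check of the BS4 behaviour involved here, using made-up markup (only the class names come from this thread, not the real UNGM page). find_all() returns every matching descendant, not just the direct children; passing recursive=False limits it to direct children. If the container is present but empty in the downloaded HTML, the call returns an empty list.

```python
from bs4 import BeautifulSoup

# Made-up markup: a tableBody with two row divs, each holding two cell divs.
html = """
<div class="tableBody">
  <div class="tableRow"><div class="tableCell">A</div><div class="tableCell">B</div></div>
  <div class="tableRow"><div class="tableCell">C</div><div class="tableCell">D</div></div>
</div>
"""

t_body = BeautifulSoup(html, "html.parser").find("div", class_="tableBody")

# find_all() looks at every descendant: 2 rows + 4 cells
all_divs = t_body.find_all("div")
# recursive=False only looks at direct children: the 2 rows
child_divs = t_body.find_all("div", recursive=False)
print(len(all_divs), len(child_divs))  # 6 2

# A container that exists but has no children yields an empty result
empty_body = BeautifulSoup('<div class="tableBody"></div>', "html.parser").find("div", class_="tableBody")
print(empty_body.find_all("div"))  # []
```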

1 Answer

2 votes
/ July 12, 2020

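The data you see on the page is loaded dynamically by JavaScript, so the HTML that urllib.request.urlopen() downloads does not contain the rows yet. You can use the requests module to reproduce the request the page itself sends: a POST to https://www.ungm.org/Public/Notice/Search with a JSON payload.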
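If you'd rather stay with urllib.request, which the question already imports, the same POST can be built from the standard library. This is a minimal sketch, assuming the endpoint accepts the JSON body exactly as requests would send it (requests' json= argument also sets Content-Type: application/json). The two-field payload is only for illustration; send the full payload from the example.

```python
import json
import urllib.request

SEARCH_URL = "https://www.ungm.org/Public/Notice/Search"


def build_search_request(payload):
    """Build (but do not send) a POST request carrying payload as JSON."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        SEARCH_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Shortened payload, for illustration only.
req = build_search_request({"PageIndex": 0, "PageSize": 15})
print(req.get_method(), req.full_url)
# To actually send it: html = urllib.request.urlopen(req).read()
```

The returned bytes can then go to BeautifulSoup(html, 'html.parser') just like response.content in the requests version.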

For example, with requests:

import requests
from bs4 import BeautifulSoup


url = 'https://www.ungm.org/Public/Notice/Search'

# search form fields; the two 12-Jul-2020 dates are the date of this answer, adjust them as needed
payload = {
  "PageIndex": 0,
  "PageSize": 15,
  "Title": "",
  "Description": "",
  "Reference": "",
  "PublishedFrom": "",
  "PublishedTo": "12-Jul-2020",
  "DeadlineFrom": "12-Jul-2020",
  "DeadlineTo": "",
  "Countries": [],
  "Agencies": [],
  "UNSPSCs": [],
  "NoticeTypes": [],
  "SortField": "DatePublished",
  "SortAscending": False,
  "isPicker": False,
  "NoticeTASStatus": [],
  "IsSustainable": False,
  "NoticeDisplayType": None,
  "NoticeSearchTotalLabelId": "noticeSearchTotal",
  "TypeOfCompetitions": []
}

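# POST the search form as JSON; the response is HTML containing the result rows
soup = BeautifulSoup(requests.post(url, json=payload).content, 'html.parser')

for row in soup.select('.tableRow'):
    # text of every cell in this row: cells[1] is the title, cells[2:] the details
    cells = [cell.get_text(strip=True) for cell in row.select('.tableCell')]
    print(cells[1])
    print('{:<30}{:<15}{:<15}{:<25}{:<45}{:<15}'.format(*cells[2:]))
    print('-'*80)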

Prints:

Supply and delivery of 78 smartphones
13-Jul-2020 11:00 (GMT 2.00)  11-Jul-2020    FAO            Request for quotation    2020/FRMLW/FRMLW/106096                      Malawi         
--------------------------------------------------------------------------------
Supply of LEGUMES SEEDS for rainfed season
23-Jul-2020 14:00 (GMT 2.00)  11-Jul-2020    FAO            Invitation to bid        2020/FRMLW/FRMLW/106051                      Malawi         
--------------------------------------------------------------------------------
Supply of MAIZE SEEDS for rainfed season
22-Jul-2020 14:00 (GMT 2.00)  11-Jul-2020    FAO            Invitation to bid        2020/FRMLW/FRMLW/106050                      Malawi         
--------------------------------------------------------------------------------
Procurement of Supply and Installation of Outdoor Metal Furniture for Rooftop Terrace at FAO Headquarters in Rome, Italy
10-Aug-2020 12:00 (GMT 2.00)  11-Jul-2020    FAO            Invitation to bid        2020/CSAPC/CSDID/105286                      Italy          
--------------------------------------------------------------------------------
Procurement of Silo for Emergency Project
13-Jul-2020 13:00 (GMT 5.00)  11-Jul-2020    FAO            Invitation to bid        2020/FABGD/FABGD/106145                      Bangladesh     
--------------------------------------------------------------------------------
Procurement of Concentrate Ruminant Feed
13-Jul-2020 13:00 (GMT 5.00)  11-Jul-2020    FAO            Invitation to bid        2020/FABGD/FABGD/106064                      Bangladesh     
--------------------------------------------------------------------------------
Purchase of Waste Collection Vehicles - (Two Tractors)
22-Jul-2020 06:30 (GMT 0.00)  11-Jul-2020    UNOPS          Request for quotation    RFQ/2020/15298                               Sri Lanka      
--------------------------------------------------------------------------------
Procurement of Laboratory Equipment and Material
24-Jul-2020 22:23 (GMT -1.00) 11-Jul-2020    FAO            Invitation to bid        2020/FRGAM/FRGAM/106143                      Gambia         
--------------------------------------------------------------------------------
Compra de chalecos para promotores comunitarios para la Oficina de Unicef Bolivar - LRFQ-2020-9159352
16-Jul-2020 23:59 (GMT -3.00) 11-Jul-2020    UNICEF         Request for proposal     LRFQ-2020-9159352                            Venezuela      
--------------------------------------------------------------------------------
Call for Proposals Quality Based Fixed Budget (CFPFB):
26-Jul-2020 17:00 (GMT 3.00)  11-Jul-2020    UNDP           Request for proposal     UNDP-SYR-RPA-051-20                          Syrian Arab Republic
--------------------------------------------------------------------------------
Innovation and Design Specialist
27-Jul-2020 00:00 (GMT -5.00) 11-Jul-2020    UNDP           Not set                  Innovation and Design Specialist             Turkey         
--------------------------------------------------------------------------------
(RFI) from national and/or international CSOs/NGOs for potential partnership with UNDP and its pooled funding mechanism, the Darfur Community Peace and Stability Fund (DCPSF),
26-Jul-2020 08:00 (GMT -7.00) 11-Jul-2020    UNDP           Request for information  RFI-SDN-20-002                               Sudan          
--------------------------------------------------------------------------------
IRAQ-LRPS-017-2020-9159660 Rehabilitation of 3 water projects at Avrek, Grey Basi and Sarsenk in Duhok
26-Jul-2020 12:00 (GMT 3.00)  11-Jul-2020    UNICEF         Request for proposal     9159660                                      Iraq           
--------------------------------------------------------------------------------
106142 INVITACIÓN A COTIZAR PARA LA ADQUISICIÓN DE FERTILIZANTES, HERRAMIENTAS Y MATERIALES PARA ECA DE CACAO
21-Jul-2020 22:00 (GMT -5.00) 10-Jul-2020    FAO            Request for quotation    2020/FLCOL/FLCOL/106142                      Colombia       
--------------------------------------------------------------------------------
Achat de tablettes, de GPS et batteries rechargeable (206 tablettes, 68 GPS, et 181 pack chargeurs et batteries rechargeables) à livrer sur  Dakar
28-Jul-2020 12:00 (GMT 0.00)  10-Jul-2020    FAO            Invitation to bid        2020/FRSEN/FRSEN/106093                      United Kingdom 
--------------------------------------------------------------------------------
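Printing fixed-width columns is fine for a quick look. To work with the data, it can help to turn each row into a record and the deadline into a real datetime. The sketch below makes a few assumptions. The field names are hypothetical, and their order is inferred from the printed output above (cells[1] title, then deadline, published, agency, notice type, reference, country). The number after "GMT" is assumed to be the UTC offset in decimal hours. The sample row is made up and shaped like the site's rows.

```python
import re
from datetime import datetime, timedelta, timezone

from bs4 import BeautifulSoup

# Hypothetical names for cells[1:8]; the order is inferred from the printed output.
FIELDS = ["title", "deadline", "published", "agency",
          "notice_type", "reference", "country"]

# Matches cell text such as "13-Jul-2020 11:00 (GMT 2.00)" or "... (GMT -1.00)".
DEADLINE_RE = re.compile(
    r"(?P<when>\d{2}-[A-Za-z]{3}-\d{4} \d{2}:\d{2}) \(GMT (?P<offset>[+-]?\d+(?:\.\d+)?)\)"
)


def parse_rows(html):
    """Turn every .tableRow into a dict keyed by FIELDS plus the notice id."""
    soup = BeautifulSoup(html, "html.parser")
    notices = []
    for row in soup.select(".tableRow"):
        cells = [c.get_text(strip=True) for c in row.select(".tableCell")]
        if len(cells) < 1 + len(FIELDS):
            continue  # skip rows without the expected number of cells
        notice = dict(zip(FIELDS, cells[1:]))
        notice["notice_id"] = row.get("data-noticeid")  # None if absent
        notices.append(notice)
    return notices


def parse_deadline(text):
    """Convert deadline text to a timezone-aware datetime."""
    m = DEADLINE_RE.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"unexpected deadline format: {text!r}")
    # %b expects English month abbreviations (the default C locale).
    naive = datetime.strptime(m.group("when"), "%d-%b-%Y %H:%M")
    # Assumption: the number after "GMT" is the UTC offset in decimal hours.
    offset = timedelta(hours=float(m.group("offset")))
    return naive.replace(tzinfo=timezone(offset))


# A made-up row shaped like the ones above (first cell left empty).
values = ["Supply of MAIZE SEEDS for rainfed season", "22-Jul-2020 14:00 (GMT 2.00)",
          "11-Jul-2020", "FAO", "Invitation to bid", "2020/FRMLW/FRMLW/106050", "Malawi"]
sample = ('<div class="tableRow" data-noticeid="12345"><div class="tableCell"></div>'
          + "".join(f'<div class="tableCell">{v}</div>' for v in values) + "</div>")

notice = parse_rows(sample)[0]
print(notice["country"], parse_deadline(notice["deadline"]).isoformat())
# Malawi 2020-07-22T14:00:00+02:00
```

With real data, pass requests.post(url, json=payload).content to parse_rows() instead of the sample string.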

EDIT: To fetch all pages, keep only notices for the country "Afghanistan", and save them to CSV, you can use this example:

import csv
import requests
from bs4 import BeautifulSoup


url = 'https://www.ungm.org/Public/Notice/Search'

payload = {
  "PageIndex": 0,
  "PageSize": 15,
  "Title": "",
  "Description": "",
  "Reference": "",
  "PublishedFrom": "",
  "PublishedTo": "12-Jul-2020",
  "DeadlineFrom": "12-Jul-2020",
  "DeadlineTo": "",
  "Countries": [],
  "Agencies": [],
  "UNSPSCs": [],
  "NoticeTypes": [],
  "SortField": "DatePublished",
  "SortAscending": False,
  "isPicker": False,
  "NoticeTASStatus": [],
  "IsSustainable": False,
  "NoticeDisplayType": None,
  "NoticeSearchTotalLabelId": "noticeSearchTotal",
  "TypeOfCompetitions": []
}

page, all_data = 0, []
while True:
    print('Page {}...'.format(page))

    payload['PageIndex'] = page
    soup = BeautifulSoup(requests.post(url, json=payload).content, 'html.parser')
    rows = soup.select('.tableRow')
    if not rows:
        break

    for row in rows:
        cells = [cell.get_text(strip=True) for cell in row.select('.tableCell')]
        print(cells[1])
        print('{:<30}{:<15}{:<15}{:<25}{:<45}{:<15}'.format(*cells[2:]))
        print('-'*80)

        # we are only interested in Afghanistan (cells[7] is the country column):
        if 'afghanistan' in cells[7].lower():
            all_data.append([row['data-noticeid'], *cells[1:]])

    page += 1

# write to csv file (UTF-8, since some titles contain non-ASCII characters):
with open('data.csv', 'w', newline='', encoding='utf-8') as csvfile:
    csv_writer = csv.writer(csvfile, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
    csv_writer.writerows(all_data)
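The while True loop stops only when a page comes back with no .tableRow elements. If the site ever kept returning rows, the loop would never end. Below is a capped variant, sketched with a hypothetical fetch_page callable that stands in for the requests.post()/soup.select() part so it can run offline. The default cap of 100 pages and the optional delay are arbitrary choices, not values from the site.

```python
import time


def fetch_all_pages(fetch_page, max_pages=100, delay=0.0):
    """Collect rows page by page until an empty page or max_pages pages.

    fetch_page(page_index) is a hypothetical callable returning the list of rows
    on that page, e.g. a wrapper around the requests.post()/soup.select() code.
    """
    all_rows = []
    for page in range(max_pages):
        rows = fetch_page(page)
        if not rows:
            break  # an empty page means we are past the last one
        all_rows.extend(rows)
        time.sleep(delay)  # optional pause to be gentle with the server
    return all_rows


# Offline demo: three fake pages, the last one empty.
fake_pages = [["a", "b"], ["c"], []]
print(fetch_all_pages(lambda i: fake_pages[i]))  # ['a', 'b', 'c']
```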

Saved data.csv (screenshot from LibreOffice):

[Screenshot: data.csv opened in LibreOffice]
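The file written above has no header row. The sketch below writes one first and reads the file back with csv.DictReader as a quick check. The column names are hypothetical. They assume each row holds eight values (the notice id followed by cells[1:8]), with meanings inferred from the printed output. The sample row reuses values from that output with a made-up id.

```python
import csv
import io

# Hypothetical header: notice id followed by cells[1:8] (meanings inferred from the output).
HEADER = ["notice_id", "title", "deadline", "published",
          "agency", "notice_type", "reference", "country"]


def write_notices(fileobj, rows):
    """Write the header line and then the rows as CSV."""
    writer = csv.writer(fileobj)
    writer.writerow(HEADER)
    writer.writerows(rows)


def read_notices(fileobj):
    """Read the CSV back as a list of dicts keyed by the header."""
    return list(csv.DictReader(fileobj))


rows = [["12345", "Supply of MAIZE SEEDS for rainfed season", "22-Jul-2020 14:00 (GMT 2.00)",
         "11-Jul-2020", "FAO", "Invitation to bid", "2020/FRMLW/FRMLW/106050", "Malawi"]]

buf = io.StringIO(newline="")
write_notices(buf, rows)
buf.seek(0)
records = read_notices(buf)
print(records[0]["country"])  # Malawi
```

For the real file, pass open('data.csv', 'w', newline='', encoding='utf-8') instead of the StringIO buffer.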

...