How can I extract (.find()) a div that has no class? - PullRequest
0 votes
/ August 6, 2020

I want to extract an ID from the HTML of this {{ link }} site.

My code is:

import requests
from bs4 import BeautifulSoup

URL = f"https://stackoverflow.com/jobs?q=python&sort%20=i"

def extract_jobs(last_page):
    jobs = []
    for page in range(last_page):
        result = requests.get(f"{URL}&pg={page+1}")
        soup = BeautifulSoup(result.text, "html.parser").find("div", {"class": "listResults"})
        id_list = soup.find_all("div", {"data-jobid"})
        for id in id_list:
            jobs.append(id)
    return jobs


print(extract_jobs(1))

But my code only gives me [] (an empty list). Why? And how can I get data-jobid from this HTML?

I am trying to extract data-jobid from the HTML below:

<div data-jobid="185876" data-result-id="185876" data-preview-url="/jobs/185876?a=10kTLndTtq3S&amp;so=i&amp;sec=False&amp;pg=1&amp;offset=0&amp;total=163&amp;srp=True&amp;so_medium=Internal&amp;so_source=JobSearchPreview" data-beacon-url="/jobs/n/v/185876?url=%2Fjobs%2F185876%3Fa%3D10kTLndTtq3S%26so%3Di%26sec%3DFalse%26pg%3D1%26offset%3D0%26total%3D163%26srp%3DTrue%26so_medium%3DInternal%26so_source%3DJobSearchPreview&amp;referrer=http%3A%2F%2Fcareers.stackoverflow.com%2Fso-proxy%2Fjobs%3Fq%3Dpython%26sort%20%2B%3Di" class="-job js-result js-dismiss-overlay-container ps-relative _selected js-selected p12 pl24 _featured">
        <div class="dismiss-overlay ps-absolute ta-center t0 r0 b0 l0 grid ai-center jc-center o90 bg-black-050 z-active">
            <p class="mb0">Okay, you won’t see this job anymore. <a href="#" class="js-undismiss-job" data-id="185876">Undo</a></p>
        </div>
    
    <div class="grid">
                <div class="grid--cell fl-shrink mr12 w48 h48">
                    <img src="https://i.stack.imgur.com/UI3Jl.png?s=48" class="w48 h48 bar-sm">
                </div>
        <div class="grid--cell fl1 ">
                <span class="float-right ml12 mrn12 bg-yellow-200 fc-yellow-900 px8 py4 tt-uppercase fw-bold fs-fine bar-sm">featured</span>

            <h2 class="mb4 fc-black-800 fs-body3">
<a href="/jobs/185876/senior-software-engineer-frontend-deepfield-networks?a=10kTLndJTsqc&amp;so=i&amp;pg=1&amp;offset=0&amp;total=163&amp;so_medium=Internal&amp;so_source=JobSearch&amp;q=python" title="Senior Software Engineer (Frontend)" class="s-link stretched-link">Senior Software Engineer (Frontend)</a>            </h2>

            <h3 class="fc-black-700 fs-body1 mb4">
                <span>Deepfield Networks
                </span>
                •
                <span class="fc-black-500">
Ann Arbor, MI                </span>
            </h3>

1 Answer

0 votes
/ August 6, 2020

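Pass {"data-jobid": True} to .find_all(). In the question, {"data-jobid"} is a set literal rather than a dict, so it is not treated as an attribute filter. With True as the value, the filter matches every div that has a data-jobid attribute, whatever its value: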

import requests
from bs4 import BeautifulSoup

URL = f"https://stackoverflow.com/jobs?q=python&sort%20=i"


def extract_jobs(last_page):
    jobs = []
    for page in range(last_page):
        result = requests.get(f"{URL}&pg={page+1}")
        soup = BeautifulSoup(result.text, "html.parser").find(
            "div", {"class": "listResults"}
        )
        id_list = soup.find_all("div", {"data-jobid": True})  # <-- put :True here
        for job_div in id_list:  # each item is the whole <div> Tag, not just the ID
            jobs.append(job_div)
    return jobs


print(extract_jobs(1))

Prints:

[<div class="-job js-result js-dismiss-overlay-container ps-relative _selected js-selected p12 pl24 _featured" data-beacon-url="" data-jobid="420299" 

...and so on.
...
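
The output above is a list of whole div tags, while the question asks for the data-jobid values themselves. A minimal sketch of how those values could be pulled out, run against an inline sample instead of the live site: SAMPLE_HTML and extract_job_ids are hypothetical names introduced here for illustration, and the markup is a trimmed-down imitation of the snippet in the question, not the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down markup modelled on the snippet in the question.
SAMPLE_HTML = """
<div class="listResults">
  <div data-jobid="185876" class="-job js-result">first job</div>
  <div data-jobid="420299" class="-job js-result">second job</div>
  <div class="grid">not a job</div>
</div>
"""


def extract_job_ids(html):
    """Return the data-jobid values found inside the listResults container."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", {"class": "listResults"})
    if container is None:
        # .find() returns None when nothing matches (e.g. the layout changed),
        # so bail out instead of calling .find_all() on None.
        return []
    # True matches any <div> that carries the attribute, whatever its value.
    job_divs = container.find_all("div", {"data-jobid": True})
    # Indexing a Tag like a dict returns the attribute value as a string.
    return [div["data-jobid"] for div in job_divs]


job_ids = extract_job_ids(SAMPLE_HTML)
print(job_ids)

# The same filter written as a CSS attribute selector, as an alternative.
css_soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
css_ids = [
    div["data-jobid"]
    for div in css_soup.select("div.listResults div[data-jobid]")
]
print(css_ids)
```

In the answer's loop, the equivalent change would be appending id["data-jobid"] instead of the tag itself. The None check is an addition of this sketch: it guards against the container div being absent from the response.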