Question

The call to text.strip() below raises an error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-278-135ac185ec3f> in <module>
     20         if isinstance(b, Tag):
     21 
---> 22             location = [a.text.strip() for a in b.find('span', attrs = {'class': 'location'})]
     23             job_title = [a.text.strip() for a in b.find('a', attrs = {'data-tn-element':'jobTitle'})]
     24 

TypeError: 'NoneType' object is not iterable

My code is below:

import requests
from bs4 import BeautifulSoup, NavigableString, Tag, Comment
import pandas as pd

df = pd.DataFrame(columns=["location", 'company', 'job_title', 'salary'])

for start in range(1,100,10):
    url = 'https://www.indeed.com/jobs?q=python+sql&l=San+Francisco&start={}'

    #format url above to request the various search pages
    new_url = url.format(start)

    #conducting a request of the stated URL above:
    page = requests.get(new_url)

    #specifying a desired format of “page” using the html parser - this allows python to read the various components of the page, rather than treating it as one long string.
    soup = BeautifulSoup(page.text, 'html.parser')

    #loop through the tag elements
    for b in soup.find_all(name = 'div', attrs={'class':'jobsearch-SerpJobCard'}):
        print(type(b))
        if isinstance(b,NavigableString):
            continue
        if isinstance(b, Tag):

            location = [a.text.strip() for a in b.find('span', attrs = {'class': 'location'})]
            job_title = [a.text.strip() for a in b.find('a', attrs = {'data-tn-element':'jobTitle'})]

            try:
                company = [a.text.strip() for a in b.find('span', attrs = {'class':'company'})]
            except:
                company = 'NA'
            try:
                salary = [a.text.strip() for a in b.find('span', attrs = {'class' : 'salaryText'}).find('nobr')]
            except:
                salary = 'NA'
            df = df.append({"location":location,"company":company, "job_title": job_title, "salary": salary}, ignore_index=True)

William Clavier · Answer 1 · May 28, 2020

find returns None because no span on the page has its class attribute set to 'location'; the elements that carry that class are div elements. Here is my modified version. It is still not perfect, since some locations are not captured. If the job title and location are both required, the idea is simply to skip the cards that lack either one: in the corresponding except blocks, replace the assignment of 'NA' with continue.

import requests
from bs4 import BeautifulSoup, NavigableString, Tag, Comment
import pandas as pd

df = pd.DataFrame(columns=["location", 'company', 'job_title', 'salary'])

for start in range(1,100,10):
    url = 'https://www.indeed.com/jobs?q=python+sql&l=San+Francisco&start={}'

    #format url above to request the various search pages
    new_url = url.format(start)

    #conducting a request of the stated URL above:
    page = requests.get(new_url)

    #specifying a desired format of “page” using the html parser - this allows python to read the various components of the page, rather than treating it as one long string.
    soup = BeautifulSoup(page.text, 'html.parser')

    #loop through the tag elements
    for b in soup.find_all(name = 'div', attrs={'class':'jobsearch-SerpJobCard'}):
        print(type(b))
        if isinstance(b,NavigableString):
            continue
        if isinstance(b, Tag):
            try:
                location = [a.strip() for a in b.find('div', attrs = {'class': 'location'})]
            except TypeError:
                location = 'NA'
            try:
                job_title = [a.strip() for a in b.find('a', attrs = {'data-tn-element':'jobTitle'})]
            except TypeError:
                job_title = 'NA'

            try:
                company = [a.text.strip() for a in b.find('span', attrs = {'class':'company'})]
            except:
                company = 'NA'
            try:
                salary = [a.text.strip() for a in b.find('span', attrs = {'class' : 'salaryText'}).find('nobr')]
            except:
                salary = 'NA'
            df = df.append({"location":location,"company":company, "job_title": job_title, "salary": salary}, ignore_index=True)
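The skip-with-continue idea described above could look roughly like the sketch below. It is only an illustration under stated assumptions: SAMPLE_HTML is made-up markup that mimics the structure discussed in this thread, text_or_none and parse_cards are hypothetical helper names, and get_text(strip=True) on the matched tag is used in place of stripping each child separately.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the card structure assumed in this thread
SAMPLE_HTML = """
<div class="jobsearch-SerpJobCard">
  <a data-tn-element="jobTitle"> Data Engineer </a>
  <span class="company"> Acme </span>
  <div class="location"> San Francisco, CA </div>
</div>
<div class="jobsearch-SerpJobCard">
  <a data-tn-element="jobTitle"> Analyst </a>
</div>
"""


def text_or_none(card, name, attrs):
    """Return the stripped text of the first match, or None if nothing matches."""
    tag = card.find(name, attrs=attrs)
    return tag.get_text(strip=True) if tag is not None else None


def parse_cards(html):
    """Collect one dict per card, skipping cards without a location or job title."""
    soup = BeautifulSoup(html, 'html.parser')
    rows = []
    for card in soup.find_all('div', attrs={'class': 'jobsearch-SerpJobCard'}):
        location = text_or_none(card, 'div', {'class': 'location'})
        job_title = text_or_none(card, 'a', {'data-tn-element': 'jobTitle'})
        if location is None or job_title is None:
            continue  # a required field is missing, so skip this card
        rows.append({
            'location': location,
            'company': text_or_none(card, 'span', {'class': 'company'}) or 'NA',
            'job_title': job_title,
            'salary': text_or_none(card, 'span', {'class': 'salaryText'}) or 'NA',
        })
    return rows


rows = parse_cards(SAMPLE_HTML)
print(rows)
```

Collecting plain dicts in a list also avoids calling DataFrame.append inside the loop; that method was removed in pandas 2.0, and pd.DataFrame(rows) builds the frame in one step once the loop has finished.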

Sushanth · Answer 2 · May 28, 2020

You need to add a check for None: find returns None when no matching element is found.

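# find() returns None when nothing matches, so test the result before using it
tag = b.find('span', attrs = {'class': 'location'})
location = tag.get_text(strip=True) if tag is not None else 'NA'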
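Note that a filter clause such as `if a` at the end of a list comprehension is evaluated per item, so it cannot guard against the iterable itself being None; the None check has to happen before iteration starts. A minimal pure-Python sketch of that point (the function names are illustrative and not part of bs4):

```python
def iterate_with_filter(result):
    # The trailing "if item" only filters items; iter(result) is called first
    return [item for item in result if item]


def iterate_with_fallback(result):
    # "result or []" replaces a None result with an empty list before iterating
    return [item for item in (result or []) if item]


try:
    iterate_with_filter(None)
    raised = False
except TypeError:
    # Same error as in the question: 'NoneType' object is not iterable
    raised = True

print(raised)
print(iterate_with_fallback(None))
```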

How do I extract text from bs4 Tag elements in my code? Using the contents function does not work
