Question

I would like to scrape some quotes and their authors, but I could not find a way to separate the quote from the author while scraping.

import requests
from bs4 import BeautifulSoup

#url = 'https://www.goodreads.com/quotes'
#r = requests.get(url)
#soup = BeautifulSoup(r.content, 'html.parser')

html = """
       <div class="quoteText">&ldquo;Insanity is doing the same thing, over and over again, but expecting different results.&rdquo; <br>  &#8213;
       <span class="authorOrTitle">Narcotics Anonymous</span>
       </div>
"""

soup = BeautifulSoup(html, 'html.parser')

quotes = soup.find_all('div', {'class': 'quoteText'})

for quote in quotes:
    if quote.text is not None:
        print(quote.text)

Andersson · Answer 1 · January 18, 2019

You can try using the stripped_strings property:

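for quote in quotes:
    if quote.text is not None:
        # stripped_strings yields each text fragment with surrounding whitespace removed
        strings = list(quote.stripped_strings)
        quote_body = strings[0]    # the quote itself
        quote_author = strings[2]  # strings[1] is the "―" separator
        print(quote_body)
        print(quote_author)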
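
Indexing into the list works for this snippet, but it depends on the author always being the third string. As a rough sketch of an alternative, the author can be looked up by the authorOrTitle class from the question's markup instead. The helper name extract_quote is hypothetical, and the code assumes every quoteText div has the same layout as the sample:

```python
from bs4 import BeautifulSoup

html = """
<div class="quoteText">&ldquo;Insanity is doing the same thing, over and over again, but expecting different results.&rdquo; <br>  &#8213;
<span class="authorOrTitle">Narcotics Anonymous</span>
</div>
"""


def extract_quote(div):
    """Return (quote_body, author) for one quoteText div; author may be None."""
    # The first non-empty string in the div is taken as the quote body.
    body = next(div.stripped_strings, None)
    # Look the author up by its class rather than by list position.
    author_tag = div.find('span', class_='authorOrTitle')
    author = author_tag.get_text(strip=True) if author_tag else None
    return body, author


soup = BeautifulSoup(html, 'html.parser')
results = [extract_quote(div) for div in soup.find_all('div', class_='quoteText')]

for body, author in results:
    print(body)
    print(author)
```

One caveat: if the quote text itself contains nested tags (italics, for example), the first string would be only part of the quote, so this is not a drop-in for every page.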

madik_atma · Answer 2 · January 18, 2019

import requests
from bs4 import BeautifulSoup

#url = 'https://www.goodreads.com/quotes'
#r = requests.get(url)
#soup = BeautifulSoup(r.content, 'html.parser')

html = """
       <div class="quoteText">&ldquo;Insanity is doing the same thing, over and over again, but expecting different results.&rdquo; <br>  &#8213;
       <span class="authorOrTitle">Narcotics Anonymous</span>
       </div>
"""

soup = BeautifulSoup(html, 'html.parser')

quotes = soup.find_all('div', {'class': 'quoteText'})

for quote in quotes:
    if quote.text is not None:
        quote_ = quote.text
        quote_data = quote_.split(" ―")
        quote_without_author = quote_data[0]
        quote_author = quote_data[1]
        print(quote_without_author.strip())
        print(quote_author.strip())

You can split the text on the horizontal bar (―) that precedes the author: element [0] of the result is the quote and element [1] is the author.

Output:

“Insanity is doing the same thing, over and over again, but expecting different results.”
Narcotics Anonymous
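
Note that indexing [1] after split raises an IndexError when a quote has no separator. A minimal sketch of a more forgiving variant, using str.partition, might look like this. The function name split_quote and the shortened sample string are made up for illustration; the separator is assumed to be the horizontal bar (U+2015) produced by &#8213; in the markup:

```python
def split_quote(text, sep="\u2015"):
    """Split 'quote [sep] author' text into (quote, author).

    The author is None when the separator is absent.
    """
    # partition splits on the first occurrence of sep and never raises.
    body, found, author = text.partition(sep)
    if not found:
        return body.strip(), None
    return body.strip(), author.strip()


# Shortened sample imitating the whitespace that quote.text would contain.
sample = "\u201cInsanity is doing the same thing.\u201d   \u2015\n       Narcotics Anonymous\n"

print(split_quote(sample))
print(split_quote("No author here"))
```

In the loop above this would be called as split_quote(quote.text).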

BeautifulSoup, select text to scrape