Question

<div id="div_1">
    <p class="keywords">
        <strong> Those are the main keywords </strong>
        <ol>
            <li>Decentralization</li>
            <li>Planning</li>
        </ol>
    </p> 
</div>
<div id="div_2">
<p class="keywords">
    <strong>This is the first paragraph of the second div </strong>
    <strong>This is the second paragraph of the second div </strong>
</p> 
</div>
<div id="div_3">
<p> This is the first paragraph of the second div </p> 
</div>

I want to parse this HTML file so that each div ends up on a single line, which means the following output:

Those are the main keywords Decentralization Planning
This is the first paragraph of the second div This is the second paragraph of the second div
This is the first paragraph of the third div

This is my code:

import re
from bs4 import BeautifulSoup

# document is the path to the HTML file
soup = BeautifulSoup(open(document, encoding="utf8"), "html.parser")
myDivs = soup.findAll("div", id=re.compile("^div_"))
for div in myDivs:
    txt = div.text + "\n"
    print(txt)

This returns the text of each div, but with the text of each of its nested tags on a separate line.

Any ideas how I can do this?

αԋɱҽԃ αмєяιcαη · Answer 1 · April 16, 2020

import re
from bs4 import BeautifulSoup

html = """
<div id="div_1">
    <p class="keywords">
        <strong> Those are the main keywords </strong>
        <ol>
            <li>Decentralization</li>
            <li>Planning</li>
        </ol>
    </p> 
</div>
<div id="div_2">
<p class="keywords">
    <strong>This is the first paragraph of the second div </strong>
    <strong>This is the second paragraph of the second div </strong>
</p> 
</div>
<div id="div_3">
<p> This is the first paragraph of the second div </p> 
</div>
"""

soup = BeautifulSoup(html, 'html.parser')

# find_all is the current name; findAll is a legacy alias
for item in soup.find_all("div", id=re.compile("^div_")):
    target = [a.get_text(strip=True, separator=" ") for a in item.find_all("p")]
    print(*target)

Output:

Those are the main keywords Decentralization Planning
This is the first paragraph of the second div This is the second paragraph of the second div
This is the first paragraph of the second div
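
As a possible variation on the answer above, here is a minimal sketch that calls get_text() on each div directly and then collapses every run of whitespace into a single space, so text that sits outside a <p> is included as well. The function name div_lines and the shortened SAMPLE markup are assumptions made for illustration; they do not come from the question.

```python
import re
from bs4 import BeautifulSoup

# Shortened sample markup (hypothetical, modelled on the question's HTML)
SAMPLE = """
<div id="div_1">
    <p class="keywords">
        <strong> Those are the main keywords </strong>
        <ol>
            <li>Decentralization</li>
            <li>Planning</li>
        </ol>
    </p>
</div>
<div id="div_3">
<p> This is the first paragraph of the third div </p>
</div>
"""


def div_lines(html):
    """Return one whitespace-normalized string per div whose id starts with 'div_'."""
    soup = BeautifulSoup(html, "html.parser")
    lines = []
    for div in soup.find_all("div", id=re.compile(r"^div_")):
        # get_text() keeps the newlines and indentation found between tags;
        # split() + join() collapses them into single spaces.
        lines.append(" ".join(div.get_text().split()))
    return lines


for line in div_lines(SAMPLE):
    print(line)
```

If matching divs are nested inside one another, the inner text would appear once for each enclosing div, so this sketch assumes flat, sibling divs as in the question.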

0m3r · Answer 2 · April 16, 2020

Use a for loop over div > p

https://jsfiddle.net/0m3r_/e7Loa96z/1/

<html>
	<head></head>
		<body>
			<div id="div_1">
				<p class="keywords">
					<strong> Those are the main keywords </strong>
					<ol>
						<li>Decentralization</li>
						<li>Planning</li>
					</ol>
				</p> 
			</div>
			
			
			<div id="div_2">
				<p class="keywords">
					<strong>This is the first paragraph of the second div </strong>
					<strong>This is the second paragraph of the second div </strong>
				</p> 
			</div>
			
			<div id="div_3">
				<p> This is the first paragraph of the second div </p> 
			</div>
		</body>
</html>

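from bs4 import BeautifulSoup

url = r"D:\Temp\example.html"

# Read the whole file, then parse it after the handle is closed
with open(url, "r", encoding="utf8") as page:
    contents = page.read()

html = BeautifulSoup(contents, 'html.parser')

for div in html.find_all('div'):
    # Collect the text of every <p> in this div, with inner whitespace collapsed
    text = [p.get_text(" ", strip=True) for p in div.find_all('p')]
    # Unpack the list so the div prints as one plain line, not a Python list
    print(*text)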
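
The "div > p" loop from this answer could also be written as a small function, sketched below under a few assumptions: it takes the markup as a string instead of reading a file, it reuses the "div_" id prefix from the question via a CSS attribute selector, and it looks only at <p> elements that are direct children of each div. The name paragraphs_per_div and the SAMPLE markup are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: two matching divs and one that should be skipped
SAMPLE = """
<div id="div_1"><p>First <strong>bold</strong> text</p><p>Second</p></div>
<div id="other"><p>Skipped</p></div>
<div id="div_2"><p> Third </p></div>
"""


def paragraphs_per_div(html):
    """Return one line per matching div, built from its direct <p> children."""
    soup = BeautifulSoup(html, "html.parser")
    lines = []
    # CSS attribute selector: divs whose id starts with "div_"
    for div in soup.select('div[id^="div_"]'):
        # recursive=False restricts the search to direct children ("div > p")
        paragraphs = div.find_all("p", recursive=False)
        parts = [p.get_text(" ", strip=True) for p in paragraphs]
        lines.append(" ".join(parts))
    return lines


for line in paragraphs_per_div(SAMPLE):
    print(line)
```

Restricting the search to direct children means a <p> nested deeper in the div would be ignored; dropping recursive=False would pick those up too.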

Get each <div> of an HTML file on one line
