Extracting two sentences from a paragraph tag using Python - PullRequest
January 13, 2020

From each paragraph tag I extract my local-language (Yoruba) text into a list. How can I extract the Meaning and the Translation into separate lists?

from bs4 import BeautifulSoup
import re


html = """
[<div class="excerpt">
 <p>A ki i fi ara eni se oogun alokunna. Translation: One does not use oneself as an ingredient in a medicine requiring that the ingredients be pulverized. Meaning; Self-preservation is a compulsory project for all.</p>
 </div>, <div class="excerpt">
 <p>A ki i fi ai-mo-we mookun. Translation: One does not dive under water without knowing how to swim. Meaning: Never engage in a project for which you lack the requisite skills.</p>
 </div>, <div class="excerpt">
 <p>A fun o lobe o tami si; o gbon ju olobe lo. Translation: You are given some stew and you add water; you must be wiser than the cook. Meaning: Adding water is a means of stretching stew. A person who thus stretches the stew he or she is given would seem to know better than the person who served it how much would suffice for the meal.</p>
 </div>] 
       """

soup = BeautifulSoup(html,'html.parser')

yoruba = []
translation = []
meaning = []
for i in soup.findAll("div",'excerpt'):
    a = i.get_text(strip=True).split('Translation')[0].strip().replace('\xa0',' ')
    yoruba.append(a)
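The `split('Translation')` call in the code above only cuts off the first field. One way to get all three fields with plain string methods is `str.partition`, which splits a string at the first occurrence of a label. The sketch below is a minimal, stdlib-only illustration rather than the asker's code: `split_excerpt` is a hypothetical helper, and it assumes each paragraph text (for example the result of `get_text(strip=True)`) contains `Translation:` followed later by `Meaning` and then a `:` or `;`.

```python
def split_excerpt(text):
    """Split one excerpt into (yoruba, translation, meaning) with plain string methods.

    Hypothetical helper: assumes the text contains "Translation:" and, after it,
    "Meaning" followed by ':' or ';'. Returns None if either label is missing.
    """
    yoruba, sep, rest = text.partition("Translation:")
    if not sep:
        return None
    translation, sep, meaning = rest.partition("Meaning")
    if not sep:
        return None
    # Remove the ':' or ';' (and any spaces) that follow the "Meaning" label.
    meaning = meaning.lstrip(":; ")
    return yoruba.strip(), translation.strip(), meaning.strip()


# Third excerpt from the question: the Yoruba line and the translation both contain ';'.
text = (
    "A fun o lobe o tami si; o gbon ju olobe lo. Translation: You are given some stew "
    "and you add water; you must be wiser than the cook. Meaning: Adding water is a "
    "means of stretching stew. A person who thus stretches the stew he or she is given "
    "would seem to know better than the person who served it how much would suffice "
    "for the meal."
)

yoruba, translation, meaning = split_excerpt(text)
print(yoruba)
print(translation)
print(meaning)
```

Inside the loop from the question, `split_excerpt(p.get_text(strip=True))` could be called for each `<p>` and the three parts appended to `yoruba`, `translation` and `meaning`. Because `partition` cuts at the first occurrence, a translation that itself contained the word "Meaning" would be split in the wrong place.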

1 Answer

January 13, 2020

You can do this with a regular expression and some string manipulation.

Try this code.

from bs4 import BeautifulSoup
import re

html = """
[<div class="excerpt">
 <p>A ki i fi ara eni se oogun alokunna. Translation: One does not use oneself as an ingredient in a medicine requiring that the ingredients be pulverized. Meaning; Self-preservation is a compulsory project for all.</p>
 </div>, <div class="excerpt">
 <p>A ki i fi ai-mo-we mookun. Translation: One does not dive under water without knowing how to swim. Meaning: Never engage in a project for which you lack the requisite skills.</p>
 </div>, <div class="excerpt">
 <p>A fun o lobe o tami si; o gbon ju olobe lo. Translation: You are given some stew and you add water; you must be wiser than the cook. Meaning: Adding water is a means of stretching stew. A person who thus stretches the stew he or she is given would seem to know better than the person who served it how much would suffice for the meal.</p>
 </div>] 
       """

soup = BeautifulSoup(html,'html.parser')

yoruba = []
translation = []
meaning = []
for i in soup.findAll("div",'excerpt'):
    for item in i.find_all('p'):
        # Remove the "Translation:" label; the translation is then the text
        # between the first and second full stops
        data=re.sub(r'Translation:\s*', '', item.get_text(strip=True))
        translation.append(data.split('.')[1].strip())
        # Remove the "Meaning" label; the meaning follows a ':' or a ';'
        data1=re.sub(r'Meaning\s*', '', data)
        if ':' in data1:
            meaning.append(data1.split(':')[-1].strip())
        elif ';' in data1:
            meaning.append(data1.split(';')[-1].strip())

print(translation)
print(meaning)

Output (translation):

['One does not use oneself as an ingredient in a medicine requiring that the ingredients be pulverized', 'One does not dive under water without knowing how to swim', 'You are given some stew and you add water; you must be wiser than the cook']

Output (meaning):

['Self-preservation is a compulsory project for all.', 'Never engage in a project for which you lack the requisite skills.', 'Adding water is a means of stretching stew. A person who thus stretches the stew he or she is given would seem to know better than the person who served it how much would suffice for the meal.']
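The answer's `data.split('.')[1]` takes the text between the first and second full stops, so it only works while the Yoruba line and the translation contain no other full stop, and it needs two branches to handle `Meaning:` versus `Meaning;`. A single regular expression with named groups is one alternative that relies only on the labels. The following is a minimal sketch that works on the plain paragraph strings (what `item.get_text(strip=True)` returns) so it runs without BeautifulSoup; `PATTERN` and `paragraphs` are illustrative names, not part of the answer.

```python
import re

# Text of each <p> from the question's HTML (assumed already extracted).
paragraphs = [
    "A ki i fi ara eni se oogun alokunna. Translation: One does not use oneself as an "
    "ingredient in a medicine requiring that the ingredients be pulverized. "
    "Meaning; Self-preservation is a compulsory project for all.",
    "A ki i fi ai-mo-we mookun. Translation: One does not dive under water without "
    "knowing how to swim. Meaning: Never engage in a project for which you lack the "
    "requisite skills.",
    "A fun o lobe o tami si; o gbon ju olobe lo. Translation: You are given some stew "
    "and you add water; you must be wiser than the cook. Meaning: Adding water is a "
    "means of stretching stew. A person who thus stretches the stew he or she is given "
    "would seem to know better than the person who served it how much would suffice "
    "for the meal.",
]

# Three named groups; the lazy .*? stops at the next label. "Meaning" may be
# followed by ':' or ';' (the first excerpt uses ';'), hence the [:;] class.
PATTERN = re.compile(
    r"^(?P<yoruba>.*?)\s*Translation:\s*(?P<translation>.*?)\s*Meaning[:;]\s*(?P<meaning>.*)$",
    re.DOTALL,
)

yoruba, translation, meaning = [], [], []
for text in paragraphs:
    m = PATTERN.match(text)
    if m is None:  # skip paragraphs that do not follow the label layout
        continue
    yoruba.append(m.group("yoruba"))
    translation.append(m.group("translation"))
    meaning.append(m.group("meaning"))

print(yoruba)
print(translation)
print(meaning)
```

Unlike the output above, the captured translations keep their closing full stop. With BeautifulSoup, `paragraphs` could be built as `[p.get_text(strip=True) for p in soup.find_all('p')]`.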