Question

html="""<div class="practice-location">
<strong>Primary Location of Practice</strong><br/>
                        Suite 100<br/>2010 Eglinton Avenue West<br/>Toronto ON  M6E 2K3<br/><strong>
</div>"""

I'm having trouble extracting the address.

I want the resulting string to look like this:

mystr=Suite 100,2010 Eglinton Avenue West, Toronto ON  M6E 2K3

My code:

   dt = soup.find(class_ ={"practice-location"})
   print dt
   ele=dt.find_all('strong')
   print ele
   add=[]
   for x in ele.find_next_siblings(text=True):
     add.append(x.text)
   location=','.join(add)
   print location

chitown88 · Answer 1 · November 28, 2018

You could simply use .text or .extract, but I assumed you want the parts separated by ','. This will do that:

from bs4 import BeautifulSoup, Tag


def split_at_br(text):
    """Join the text of a tag's children, turning each <br> into a comma."""
    string = ''
    for x in text:
        if isinstance(x, Tag):
            # a <br> becomes a separator; any other tag contributes its text
            string += ',' if x.name == 'br' else x.text
        elif '\n' in x:
            # text node spanning several lines: strip each line, join with spaces
            string += ' '.join(part.strip() for part in x.split('\n')).strip()
        else:
            string += x

    # drop the trailing newline and comma left by the last <br> and the unclosed <strong>
    return string.strip().rstrip(',')

Applied to the HTML from the question:

html="""<div class="practice-location">
<strong>Primary Location of Practice</strong><br/>
                        Suite 100<br/>2010 Eglinton Avenue West<br/>Toronto ON  M6E 2K3<br/><strong>
</div>"""

soup = BeautifulSoup(html, 'html.parser')

text = soup.select('div.practice-location')
text = text[0].contents

mystr = split_at_br(text)

which gives:

In [1]: print (mystr)
Primary Location of Practice,Suite 100,2010 Eglinton Avenue West,Toronto ON  M6E 2K3
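
For comparison, here is a minimal sketch of the shorter route mentioned at the start of this answer. It relies on Beautiful Soup's `stripped_strings` generator, which yields each text fragment with its surrounding whitespace removed and skips whitespace-only fragments. The names `parts` and `address` are illustrative, and dropping the first fragment assumes the label is always the first text inside the div.

```python
from bs4 import BeautifulSoup

html = """<div class="practice-location">
<strong>Primary Location of Practice</strong><br/>
                        Suite 100<br/>2010 Eglinton Avenue West<br/>Toronto ON  M6E 2K3<br/><strong>
</div>"""

soup = BeautifulSoup(html, 'html.parser')
div = soup.find('div', class_='practice-location')

# stripped_strings skips whitespace-only text nodes and strips the others
parts = list(div.stripped_strings)

# assumption: the first fragment is the "Primary Location of Practice" label
address = ', '.join(parts[1:])
print(address)
```

This keeps no per-node logic, at the cost of losing the distinction between a `<br>` and any other boundary between text nodes, so it may not suit markup where inline tags split a single address line.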

ewwink · Answer 2 · November 28, 2018

Use .extract() to remove a tag and .replace_with() to replace a tag:

from bs4 import BeautifulSoup

html="""<div class="practice-location">
<strong>Primary Location of Practice</strong><br/>
                        Suite 100<br/>2010 Eglinton Avenue West<br/>Toronto ON  M6E 2K3<br/><strong>
</div>"""

soup = BeautifulSoup(html, 'html.parser')
dt = soup.find(class_="practice-location")
# remove "strong" here
dt.strong.extract()
for br in dt.select('br'):
    br.replace_with(', ')
print(dt.text.strip().strip(',').strip())

# Suite 100, 2010 Eglinton Avenue West, Toronto ON  M6E 2K3

About the three strip() calls: after each <br> is replaced with ', ', the text is

, 
                    Suite 100, 2010 Eglinton Avenue West, Toronto ON  M6E 2K3, 

The first .strip() removes the surrounding whitespace and newlines, the second removes the leading and trailing commas, and the third removes the whitespace and newline that remain.
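
The three steps can be checked on a plain string, without Beautiful Soup. This is only a sketch: `raw` is a hand-written stand-in for `dt.text` after the replacement (with shortened indentation), and the names `step1` to `step3` are illustrative.

```python
# stand-in for dt.text after every <br> has been replaced with ', '
raw = "\n, \n    Suite 100, 2010 Eglinton Avenue West, Toronto ON  M6E 2K3, \n"

step1 = raw.strip()       # trims outer whitespace and newlines; the commas remain
step2 = step1.strip(',')  # trims the leading and trailing commas
step3 = step2.strip()     # trims the whitespace and newline exposed by step 2

for label, value in (("step1", step1), ("step2", step2), ("step3", step3)):
    print(label, repr(value))
```

Printing with `repr` makes the leftover whitespace visible at each stage, which is why the third call is still needed after the commas are gone.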

Beautiful Soup (Python): parsing a string that has only closing br tags
