I'm scraping a few links with BeautifulSoup.
Here is the relevant part of the source of the URL I'm passing in:
<div class="description">
Planet Nine was initially proposed to explain the clustering of orbits
Of Planet Nine's other effects, one was unexpected, the perpendicular orbits, and the other two were found after further analysis. Although other mechanisms have been offered for many of these peculiarities, the gravitational influence of Planet Nine is the only one that explains all four.
</div>
Here is my BeautifulSoup code (relevant part only) to get the text inside the description tags:
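import sys
import urllib2
from bs4 import BeautifulSoup

quote_page = sys.argv[1]  # URL passed on the command line
page = urllib2.urlopen(quote_page)
soup = BeautifulSoup(page, 'html.parser')
# Find the description div and extract its text
description_box = soup.find('div', {'class': 'description'})
description = description_box.get_text(separator=" ").strip()
print description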
Running the script with python script.py https://example.com/page/2000 produces the following output:
Planet Nine was initially proposed to explain the clustering of orbits
Of Planet Nine's other effects, one was unexpected, the perpendicular orbits, and the other two were found after further analysis. Although other mechanisms have been offered for many of these peculiarities, the gravitational influence of Planet Nine is the only one that explains all four.
How do I replace the line break with a period followed by a space, so that it looks like this:
Planet Nine was initially proposed to explain the clustering of orbits. Of Planet Nine's other effects, one was unexpected, the perpendicular orbits, and the other two were found after further analysis. Although other mechanisms have been offered for many of these peculiarities, the gravitational influence of Planet Nine is the only one that explains all four.
Any ideas how I can do this?
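
For reference, here is a minimal sketch of the transformation I'm after, written as a plain string helper. The function name join_lines_with_period is hypothetical (it is not part of BeautifulSoup), and it assumes that each line break in the extracted text marks the end of a sentence. It only adds a period when a line does not already end with one, so the last sentence is not doubled up.

```python
def join_lines_with_period(text):
    """Join the non-empty lines of text into one line, ending each with a period."""
    # Split on line breaks, trim surrounding whitespace, and drop blank lines.
    lines = [line.strip() for line in text.splitlines()]
    lines = [line for line in lines if line]
    sentences = []
    for line in lines:
        # Assumption: a line without terminal punctuation is a complete sentence.
        if not line.endswith(('.', '!', '?')):
            line += '.'
        sentences.append(line)
    return ' '.join(sentences)


sample = (
    "Planet Nine was initially proposed to explain the clustering of orbits\n"
    "Of Planet Nine's other effects, one was unexpected."
)
print(join_lines_with_period(sample))
```

In my script this would presumably be applied to the extracted text, e.g. description = join_lines_with_period(description_box.get_text()), but I'm not sure whether there is a more idiomatic way to do it.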