The reason you are not getting the paragraph is this condition:
if '\n' not in text:
The paragraph you want:
'Ayurveda is considered by many scholars to be the oldest healing science. In Sanskrit, Ayurveda means “The Science of Life.” Ayurvedic knowledge originated\n in India more than 5,000 years ago and is often called the “Mother of All Healing.” It stems from the ancient Vedic culture and was taught for many\n thousands of years in an oral tradition from accomplished masters to their disciples. Some of this knowledge was set to print a few thousand years\n ago, but much of it is inaccessible. The principles of many of the natural healing systems now familiar in the West have their roots in Ayurveda, including\n Homeopathy and Polarity Therapy.'
This text contains \n in the middle, so the condition is false and the text is never appended to your tdlist. Calling .strip() only removes newlines and spaces at the beginning and end of the string, not the ones inside it, so you need a different condition.
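To illustrate the point about .strip(), here is a minimal sketch. The sample string is hypothetical and only stands in for the scraped paragraph; it shows that the newline inside the text survives the call.

```python
# Hypothetical sample: whitespace and newlines at both ends, one newline inside.
text = "\n  first line\n second line  \n"

# strip() trims only the leading and trailing whitespace.
stripped = text.strip()

print(repr(stripped))
# The interior newline is untouched, so a "'\n' not in text" check still fails.
print('\n' in stripped)
```

This is why the paragraph is filtered out even after stripping.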
So you can simply add an extra condition that also accepts the content that follows the <p class="bitter"> tag.
I am assuming that all the links follow this format.
Change your function to:
def get_text(value):
    tdlist = []
    for i in soup.findAll(value): # Reduce data to those with html tag
        if i.text != "":
            text = i.text
            text = text.strip()
            if '\n' not in text or i.find_previous(value).attrs == {'class': ['bitter']}: # Remove unnecessary data
                tdlist.append(text)
    return tdlist
Output:
print (c_list)
['by Vasant Lad, BAM&S, MASc', 'Ayurveda is considered by many scholars to be the oldest healing science. In Sanskrit, Ayurveda means “The Science of Life.” Ayurvedic knowledge originated\n in India more than 5,000 years ago and is often called the “Mother of All Healing.” It stems from the ancient Vedic culture and was taught for many\n thousands of years in an oral tradition from accomplished masters to their disciples. Some of this knowledge was set to print a few thousand years\n ago, but much of it is inaccessible. The principles of many of the natural healing systems now familiar in the West have their roots in Ayurveda, including\n Homeopathy and Polarity Therapy.', 'Copyright © 2006, Vasant Lad, MASc, and The Ayurvedic Institute. All Rights Reserved.', 'Copyright © 2006, Vasant Lad, MASc, and The Ayurvedic Institute. All Rights Reserved.']
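If you want to check the condition without fetching the page, the following is a self-contained sketch on made-up markup. The HTML snippet and its text are hypothetical and only mimic the layout assumed above (a short <p class="bitter"> line followed by a multi-line paragraph). Two details are my own additions rather than part of the function above: a guard for the case where find_previous returns None (the first matching tag has no predecessor), and comparing only the class attribute instead of the whole attrs dictionary.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the assumed page layout.
html = """
<p class="bitter">by Author Name</p>
<p>First line of the paragraph
 second line of the paragraph.</p>
<p>Unrelated block
 that also spans lines.</p>
"""

soup = BeautifulSoup(html, "html.parser")


def get_text(value):
    tdlist = []
    for tag in soup.find_all(value):
        text = tag.text.strip()
        if not text:
            continue
        # find_previous returns None when no earlier matching tag exists.
        previous = tag.find_previous(value)
        follows_bitter = previous is not None and previous.get("class") == ["bitter"]
        # Keep single-line texts, plus the block right after the "bitter" tag.
        if '\n' not in text or follows_bitter:
            tdlist.append(text)
    return tdlist


result = get_text("p")
print(result)
```

On this sample, the third paragraph is skipped because it spans several lines and does not directly follow the "bitter" tag, which is the same filtering behaviour the modified function relies on.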