I'm struggling to find a way to separately extract all the text between each pair of consecutive hr tags in this document:
<html>
<head>
<!--Created 6-11-96 by Dan Axtell-->
<meta content="101 Elementary School Mission Statements compiled from the Web 11 June 1996" name="DESCRIPTION"/>
</head>
<body><a name="TOP">
<h2>101 Elementary School Mission Statements</h2>
11 June 1996
<p>
This list was compiled when the web was young. Most links don't work now. Note that some of these mission statements may be copyrighted. All material was pasted verbatim from the web pages, which accounts for the odd formating.
</p><hr/>
WINDSOR Elementary School,
in partnership with its children, families,
community and Richland District Two,
guarantees each child a superior education
by providing quality instruction and challenging learning experiences
in a safe and orderly environment
which will foster life-long learning and responsible citizenship.
</a>
<a href="http://www.scsn.net/users/rich2/elem/windsor/text.htm">http://www.scsn.net/users/rich2/elem/windsor/text.htm</a>
<hr/>
This We Believe...
Yokayo Elementary School provides a nurturing environment committed to achiving excellence. All students are challenged to
reach their maximum potential by learning at their functional level to provide a solid foundation of skills, knowledge and values.
This foundation enables each student to become a well-educated, productive adult able to cope with an ever changing world.
We believe that all learners must become:
Effective Communicators who will use verbal, written, artistic and technological forms of communication to give,
send, and receive information.
Inspired Learners who are accountable for demonstrating, assessing, and directing their present and life-long
intellectual growth.
Productive Workers who perform collaboratively and independently to create quality products and services that
reflect personal pride and responsiblility.
Responsible Citizens who have a global and multi-cultural perspective, and who take the initiative for improving the
quality of life for self and others.
Resourceful Thinkers who independently and creatively strive to solve complex problems through reflection, risk
taking, and critical evaluation.
<a href="http://happy.yokayo.uusd.k12.ca.us/Goals.html">http://happy.yokayo.uusd.k12.ca.us/Goals.html</a>
<hr/>
University Elementary School
Mission Statement
At University Elementary School, students should be accepted, appreciated, nurtured, and
challanged according to their individual needs.
Through their education at school, students should gain the skills, strategies, and desire
necessary for continued learning. They should also develop a strong sense of responsibility for
themselves and toward each other, their community, and the earth's resources.
To this end, faculty and staff should create a rich multicultural environment for learning; design
an integrated curriculum with strong science, fine arts, and social studies components; provide for
children to become self-directed learners; and share their enthusiasm for learning, in an
atmosphere of mutual respect and appreciation.
<a href="http://www.intersource.com/~wmorales/ue/mission.html">http://www.intersource.com/~wmorales/ue/mission.html</a>
<hr/>
The document contains 100 excerpts, and this is just a sample, but the formatting of the rest is the same. I tried using .nextSibling as follows:
for i in soup.find_all('hr'):
    print(i.nextSibling)
and got this output:
WINDSOR Elementary School,
University Elementary School
Altamont Elementary School
...
How can I extend this to include everything up to the next hr tag, so that I can extract each complete statement, like this:
WINDSOR Elementary School,
in partnership with its children, families,
community and Richland District Two,
guarantees each child a superior education
by providing quality instruction and challenging learning experiences
in a safe and orderly environment
which will foster life-long learning and responsible citizenship.
</a>
<a href="http://www.scsn.net/users/rich2/elem/windsor/text.htm">http://www.scsn.net/users/rich2/elem/windsor/text.htm</a>
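
For reference, here is a minimal sketch of one direction this could take, not a definitive solution. It assumes BeautifulSoup 4 with the built-in "html.parser", and it walks `.next_elements` instead of `.nextSibling`, because the unclosed `<a name="TOP">` means the hr tags are not all siblings of each other. The function name `text_between_hrs`, the trimmed `sample` string, and its example.com URLs are hypothetical and only there for illustration.

```python
from bs4 import BeautifulSoup, NavigableString, Tag


def text_between_hrs(html):
    """Return one text block per <hr>, covering everything up to the next <hr>."""
    soup = BeautifulSoup(html, "html.parser")
    blocks = []
    for hr in soup.find_all("hr"):
        parts = []
        # next_elements walks the document in parse order, so it keeps going
        # even when the following <hr> is not a direct sibling of this one.
        for element in hr.next_elements:
            if isinstance(element, Tag) and element.name == "hr":
                break  # reached the start of the next excerpt
            # Keep plain text only; the exact type check skips comments.
            if type(element) is NavigableString:
                parts.append(str(element))
        text = "".join(parts).strip()
        if text:  # the last <hr> has nothing after it
            blocks.append(text)
    return blocks


# Hypothetical, shortened stand-in for the document in the question.
sample = """<body><a name="TOP"><p>Intro</p><hr/>
WINDSOR Elementary School,
guarantees each child a superior education.
</a>
<a href="http://example.com/windsor">http://example.com/windsor</a>
<hr/>
This We Believe...
<a href="http://example.com/yokayo">http://example.com/yokayo</a>
<hr/>
</body>"""

for block in text_between_hrs(sample):
    print(block)
    print("-" * 20)
```

This returns the visible text of each excerpt, including the link text, rather than the raw markup. If the tags themselves are needed, the same loop could collect `str(element)` for top-level tags as well, but that variant is not shown here.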