Getting multiple classes from HTML in Scrapy - PullRequest
0 votes
/ June 16, 2020

I have HTML like this:

<div id="ctl00_ContentPlaceHolder1_pnlRequirement" class="BlockContent">

            <h2 class="Header">Requirements for Canadian Students</h2>
            <div class="Content"><p></p><p>For admission to King's University College, applicants will have completed their Ontario Secondary School Diploma (OSSD) with at least six Grade 12U/M courses including Grade 12U English. Students applying from other provinces in Canada can contact the Office of Enrolment Services at King's or review our website: <a target="_blank" href="https://www.kings.uwo.ca/future-students/admissions/admission-requirements/high-school/">https://www.kings.uwo.ca/future-students/admissions/admission-requirements/high-school/</a>. The minimum grade average required for most programs is 79%.</p><p></p></div>

</div>
<div id="ctl00_ContentPlaceHolder1_pnlIRequirement" class="BlockContent">

            <h2 class="Header">Requirements for International Students</h2>
            <div class="Content"><p></p><p>Admissions requirements will vary by country curriculum. Please refer to <a target="_blank" href="https://www.kings.uwo.ca/future-students/admissions/admission-requirements/international-students/">https://www.kings.uwo.ca/future-students/admissions/admission-requirements/international-students/</a>. If you do not see your curriculum listed, please contact the Office of Enrolment Services directly at <a target="_blank" href="https://www.kings.uwo.ca/">kings.uwo.ca</a> or by phone at (519) 433-3491. Applicants will also be required to provide proof of English language proficiency (ELP) if English is not their first language. Please refer to our website for ELP requirements:  <a target="_blank" href="https://www.kings.uwo.ca/future-students/admissions/admission-requirements/english-proficiency/">https://www.kings.uwo.ca/future-students/admissions/admission-requirements/english-proficiency/</a>.</p><p></p></div>

</div>

I'm trying to get all the text inside each Content block in Python, as in the attempt below, but I'm completely lost on how to make it work. Any help is appreciated. Thanks a lot!

Requirements_for_Canadian_Students = ''.join(response.css("#ctl00_ContentPlaceHolder1_pnlRequirement .Content *::text").getall())
# The international block has its own id: pnlIRequirement, not pnlRequirement
Requirements_for_International_Students = ''.join(response.css("#ctl00_ContentPlaceHolder1_pnlIRequirement .Content *::text").getall())

1 Answer

1 vote
/ June 16, 2020

How about using XPath and the string() function:

Requirements_for_Canadian_Students = response.xpath('string(//h2[.="Requirements for Canadian Students"]/following-sibling::div[1])').get()
Requirements_for_International_Students = response.xpath('string(//h2[.="Requirements for International Students"]/following-sibling::div[1])').get()
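
The two lookups above differ only in the heading text, so they can be folded into one helper. The following is a minimal sketch, not part of the original answer: the function name extract_requirements, the HEADINGS tuple, and the whitespace cleanup are assumptions added for illustration. It only relies on the response.xpath(...).get() interface already used above.

```python
# Heading texts copied from the HTML in the question.
HEADINGS = (
    "Requirements for Canadian Students",
    "Requirements for International Students",
)


def extract_requirements(response, headings=HEADINGS):
    """Return {heading: text} for each requirements block.

    `response` is assumed to be a Scrapy response, or any object that
    offers the same `.xpath(query).get()` interface.
    Hypothetical helper: the name and return shape are illustrative.
    """
    result = {}
    for heading in headings:
        # Same expression as in the answer: the string value of the first
        # <div> sibling that follows the <h2> with exactly this text.
        # Note: a heading containing a double quote would break this query.
        query = f'string(//h2[.="{heading}"]/following-sibling::div[1])'
        text = response.xpath(query).get() or ""
        # Collapse runs of whitespace (newlines, indentation) and trim.
        result[heading] = " ".join(text.split())
    return result
```

Inside a spider's parse callback this could be used as `yield extract_requirements(response)`. Keying the result by heading text avoids depending on the generated ctl00_... ids, at the cost of breaking if the site rewords a heading.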