How do I get the text, including tags, with Python webbot? - PullRequest
0 votes
/ March 25, 2020

I have this text in an HTML file:

<section id="question-a.1" class="level2">
<p>Blabla</p>
<p>if a==b:</p>
<p>c = True</p>
<p>elif a &gt; b+10:</p>
<p>c = True</p>
<p>else:</p>
<p>c = False</p>
</section>

But when I try to get this element with Python webbot using:

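from webbot import Browser

web = Browser()
web.go_to("there")  # "there" is the placeholder URL from the question
# .text returns only the element's rendered text, so the <p> tags are dropped
ques = web.find_elements(id="question-a.1")[0].text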

I get the text fine, but without the p tags, and I need them. Is there a way to get all the text inside the section tag, including the p tags (or any other tag, e.g. math, etc.)?

Thanks

1 Answer

1 vote
/ March 25, 2020

You can call .get_attribute('outerHTML') on the element to get its actual HTML, including the element's own tag, for example:

from webbot import Browser
web = Browser()
web.go_to('google.com')
web.find_elements(id="jhp big")[0].get_attribute('outerHTML')

returns

'<a class="gb_Ld gb_od" role="button" tabindex="0" style="color:#ffffff;background-color:#4285F4">مراجعة</a>'
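The question mentions that the markup is also available in an HTML file. For that static case, the idea behind outerHTML (the element's start tag, everything inside it, and its end tag) can be sketched offline with Python's standard html.parser module, without webbot or a browser. This is a swapped-in technique, not what webbot does: a browser builds outerHTML by re-serializing the live DOM, so quoting, entity escaping and whitespace may differ, whereas this sketch slices the original source text. The names outer_html and _ElementLocator are hypothetical, and the sketch assumes the target element has an explicit closing tag.

```python
from html.parser import HTMLParser


class _ElementLocator(HTMLParser):
    """Hypothetical helper: find where the element with a given id starts
    and ends in the source text.

    Only tags with the same name as the target count toward nesting, so void
    tags such as <br> inside it do not upset the depth counter. Assumes the
    target element has an explicit closing tag.
    """

    def __init__(self, source: str, element_id: str) -> None:
        super().__init__()
        self.src = source
        self.element_id = element_id
        # getpos() reports (1-based line, 0-based column); remember where each
        # line begins so a position can be turned into a string index.
        self.line_starts = [0] + [i + 1 for i, ch in enumerate(source) if ch == "\n"]
        self.found_tag = None    # tag name of the target, once seen
        self.depth = 0           # nesting depth of same-named tags
        self.start_index = None  # index of "<" of the target's start tag
        self.end_index = None    # index just past ">" of its end tag

    def _index(self) -> int:
        line, col = self.getpos()
        return self.line_starts[line - 1] + col

    def handle_starttag(self, tag, attrs):
        if self.end_index is not None:
            return  # target already found and closed
        if self.found_tag is None:
            if dict(attrs).get("id") == self.element_id:
                self.found_tag, self.start_index, self.depth = tag, self._index(), 1
        elif tag == self.found_tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if self.found_tag is None or self.end_index is not None or tag != self.found_tag:
            return
        self.depth -= 1
        if self.depth == 0:
            # the matching closing tag starts here; include it up to its ">"
            self.end_index = self.src.index(">", self._index()) + 1


def outer_html(source: str, element_id: str) -> str:
    """Return the source text of the element with this id, tags included."""
    locator = _ElementLocator(source, element_id)
    locator.feed(source)
    locator.close()
    if locator.end_index is None:
        raise ValueError(f"no closed element with id={element_id!r}")
    return source[locator.start_index:locator.end_index]


page = """<html><body>
<section id="question-a.1" class="level2">
<p>Blabla</p>
<p>elif a &gt; b+10:</p>
<p>c = True</p>
</section>
<p>after</p>
</body></html>"""

print(outer_html(page, "question-a.1"))
```

Applied to a shortened version of the question's snippet, this prints the whole section with its p tags, and &gt; stays exactly as written in the file.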

Or you can use innerHTML to get the inner HTML without the wrapping tag:

web.find_elements(id="fbar")[0].get_attribute('innerHTML') 

returns

<div class="fbar"><div class="b2hzT"><style data-iml="1585133904756">.b0KoTc{color:rgba(0,0,0,.54);padding-right:27px}.Q8LRLc{font-size:15px}.b0KoTc{margin-right:30px;text-align:right}.b2hzT{border-bottom:1px solid #e4e4e4}</style><div class="b0KoTc"><span class="Q8LRLc">مصر</span></div></div><span id="fsr"><a class="Fx4vi" href="https://policies.google.com/privacy?fg=1" ping="/url?sa=t&amp;rct=j&amp;source=webhp&amp;url=https://policies.google.com/privacy%3Ffg%3D1&amp;ved=0ahUKEwiOvK76u7XoAhVEEncKHf5pA1YQ8awCCBE">الخصوصية</a><a class="Fx4vi" href="https://policies.google.com/terms?fg=1" ping="/url?sa=t&amp;rct=j&amp;source=webhp&amp;url=https://policies.google.com/terms%3Ffg%3D1&amp;ved=0ahUKEwiOvK76u7XoAhVEEncKHf5pA1YQ8qwCCBI">البنود</a><span style="display:inline-block;position:relative"><a class="Fx4vi" href="https://www.google.com/preferences?hl=ar" id="fsettl" aria-controls="fsett" aria-expanded="false" aria-haspopup="true" role="button" jsaction="foot.cst" ping="/url?sa=t&amp;rct=j&amp;source=webhp&amp;url=https://www.google.com/preferences%3Fhl%3Dar&amp;ved=0ahUKEwiOvK76u7XoAhVEEncKHf5pA1YQzq0CCBM">الإعدادات</a><span id="fsett" aria-labelledby="fsettl" role="menu" style="display:none"><a href="https://www.google.com/preferences?hl=ar&amp;fg=1" role="menuitem">إعدادات البحث</a><a href="/advanced_search?hl=ar&amp;fg=1" role="menuitem">بحث متقدم</a><a href="/history/privacyadvisor/search/unauth?utm_source=googlemenu&amp;fg=1" role="menuitem">بياناتك في خدمة "بحث"</a><a href="/history/optout?hl=ar&amp;fg=1" role="menuitem">السجلّ</a><a href="//support.google.com/websearch/?p=ws_results_help&amp;hl=ar&amp;fg=1" role="menuitem">مساعدة البحث</a><a href="#" data-bucket="websearch" role="menuitem" id="dk2qOd" target="_blank" jsaction="gf.sf" data-ved="0ahUKEwiOvK76u7XoAhVEEncKHf5pA1YQLggU">إرسال تعليقات</a></span></span></span><span id="fsl"><a class="Fx4vi" 
href="https://www.google.com/intl/ar_eg/ads/?subid=ww-ww-et-g-awa-a-g_hpafoot1_1!o2&amp;utm_source=google.com&amp;utm_medium=referral&amp;utm_campaign=google_hpafooter&amp;fg=1" ping="/url?sa=t&amp;rct=j&amp;source=webhp&amp;url=https://www.google.com/intl/ar_eg/ads/%3Fsubid%3Dww-ww-et-g-awa-a-g_hpafoot1_1!o2%26utm_source%3Dgoogle.com%26utm_medium%3Dreferral%26utm_campaign%3Dgoogle_hpafooter%26fg%3D1&amp;ved=0ahUKEwiOvK76u7XoAhVEEncKHf5pA1YQkdQCCBU">الإعلانات</a><a class="Fx4vi" href="https://www.google.com/services/?subid=ww-ww-et-g-awa-a-g_hpbfoot1_1!o2&amp;utm_source=google.com&amp;utm_medium=referral&amp;utm_campaign=google_hpbfooter&amp;fg=1" ping="/url?sa=t&amp;rct=j&amp;source=webhp&amp;url=https://www.google.com/services/%3Fsubid%3Dww-ww-et-g-awa-a-g_hpbfoot1_1!o2%26utm_source%3Dgoogle.com%26utm_medium%3Dreferral%26utm_campaign%3Dgoogle_hpbfooter%26fg%3D1&amp;ved=0ahUKEwiOvK76u7XoAhVEEncKHf5pA1YQktQCCBY">الأعمال</a><a class="Fx4vi" href="https://about.google/?utm_source=google-EG&amp;utm_medium=referral&amp;utm_campaign=hp-footer&amp;fg=1" ping="/url?sa=t&amp;rct=j&amp;source=webhp&amp;url=https://about.google/%3Futm_source%3Dgoogle-EG%26utm_medium%3Dreferral%26utm_campaign%3Dhp-footer%26fg%3D1&amp;ved=0ahUKEwiOvK76u7XoAhVEEncKHf5pA1YQkNQCCBc">حول</a><a class="Fx4vi" href="//google.com/search/howsearchworks/?fg=1">  آلية عمل "بحث Google" </a></span></div>

whereas outerHTML returns

<div class="EvHmz hRvfYe" id="fbar"><div class="fbar"><div class="b2hzT"><style data-iml="1585133904756">.b0KoTc{color:rgba(0,0,0,.54);padding-right:27px}.Q8LRLc{font-size:15px}.b0KoTc{margin-right:30px;text-align:right}.b2hzT{border-bottom:1px solid #e4e4e4}</style><div class="b0KoTc"><span class="Q8LRLc">مصر</span></div></div><span id="fsr"><a class="Fx4vi" href="https://policies.google.com/privacy?fg=1" ping="/url?sa=t&amp;rct=j&amp;source=webhp&amp;url=https://policies.google.com/privacy%3Ffg%3D1&amp;ved=0ahUKEwiOvK76u7XoAhVEEncKHf5pA1YQ8awCCBE">الخصوصية</a><a class="Fx4vi" href="https://policies.google.com/terms?fg=1" ping="/url?sa=t&amp;rct=j&amp;source=webhp&amp;url=https://policies.google.com/terms%3Ffg%3D1&amp;ved=0ahUKEwiOvK76u7XoAhVEEncKHf5pA1YQ8qwCCBI">البنود</a><span style="display:inline-block;position:relative"><a class="Fx4vi" href="https://www.google.com/preferences?hl=ar" id="fsettl" aria-controls="fsett" aria-expanded="false" aria-haspopup="true" role="button" jsaction="foot.cst" ping="/url?sa=t&amp;rct=j&amp;source=webhp&amp;url=https://www.google.com/preferences%3Fhl%3Dar&amp;ved=0ahUKEwiOvK76u7XoAhVEEncKHf5pA1YQzq0CCBM">الإعدادات</a><span id="fsett" aria-labelledby="fsettl" role="menu" style="display:none"><a href="https://www.google.com/preferences?hl=ar&amp;fg=1" role="menuitem">إعدادات البحث</a><a href="/advanced_search?hl=ar&amp;fg=1" role="menuitem">بحث متقدم</a><a href="/history/privacyadvisor/search/unauth?utm_source=googlemenu&amp;fg=1" role="menuitem">بياناتك في خدمة "بحث"</a><a href="/history/optout?hl=ar&amp;fg=1" role="menuitem">السجلّ</a><a href="//support.google.com/websearch/?p=ws_results_help&amp;hl=ar&amp;fg=1" role="menuitem">مساعدة البحث</a><a href="#" data-bucket="websearch" role="menuitem" id="dk2qOd" target="_blank" jsaction="gf.sf" data-ved="0ahUKEwiOvK76u7XoAhVEEncKHf5pA1YQLggU">إرسال تعليقات</a></span></span></span><span id="fsl"><a class="Fx4vi" 
href="https://www.google.com/intl/ar_eg/ads/?subid=ww-ww-et-g-awa-a-g_hpafoot1_1!o2&amp;utm_source=google.com&amp;utm_medium=referral&amp;utm_campaign=google_hpafooter&amp;fg=1" ping="/url?sa=t&amp;rct=j&amp;source=webhp&amp;url=https://www.google.com/intl/ar_eg/ads/%3Fsubid%3Dww-ww-et-g-awa-a-g_hpafoot1_1!o2%26utm_source%3Dgoogle.com%26utm_medium%3Dreferral%26utm_campaign%3Dgoogle_hpafooter%26fg%3D1&amp;ved=0ahUKEwiOvK76u7XoAhVEEncKHf5pA1YQkdQCCBU">الإعلانات</a><a class="Fx4vi" href="https://www.google.com/services/?subid=ww-ww-et-g-awa-a-g_hpbfoot1_1!o2&amp;utm_source=google.com&amp;utm_medium=referral&amp;utm_campaign=google_hpbfooter&amp;fg=1" ping="/url?sa=t&amp;rct=j&amp;source=webhp&amp;url=https://www.google.com/services/%3Fsubid%3Dww-ww-et-g-awa-a-g_hpbfoot1_1!o2%26utm_source%3Dgoogle.com%26utm_medium%3Dreferral%26utm_campaign%3Dgoogle_hpbfooter%26fg%3D1&amp;ved=0ahUKEwiOvK76u7XoAhVEEncKHf5pA1YQktQCCBY">الأعمال</a><a class="Fx4vi" href="https://about.google/?utm_source=google-EG&amp;utm_medium=referral&amp;utm_campaign=hp-footer&amp;fg=1" ping="/url?sa=t&amp;rct=j&amp;source=webhp&amp;url=https://about.google/%3Futm_source%3Dgoogle-EG%26utm_medium%3Dreferral%26utm_campaign%3Dhp-footer%26fg%3D1&amp;ved=0ahUKEwiOvK76u7XoAhVEEncKHf5pA1YQkNQCCBc">حول</a><a class="Fx4vi" href="//google.com/search/howsearchworks/?fg=1">  آلية عمل "بحث Google" </a></span></div></div>
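The two outputs above differ only by the wrapping element: outerHTML is the element's start tag, then its innerHTML, then its end tag. A small standard-library sketch can split a browser-produced outerHTML string back into those three parts. It reads the start tag with html.parser instead of searching for the first ">", because a quoted attribute value may itself contain ">". The function name split_outer_html is hypothetical, and the sketch assumes the string is exactly one element with a lowercase closing tag, which is how browsers serialize HTML elements.

```python
from html.parser import HTMLParser
from typing import Optional, Tuple


class _FirstStartTag(HTMLParser):
    """Remember the name and raw text of the first start tag fed to it."""

    def __init__(self) -> None:
        super().__init__()
        self.first: Optional[Tuple[str, str]] = None

    def handle_starttag(self, tag, attrs):
        if self.first is None:
            # get_starttag_text() is the start tag exactly as written
            self.first = (tag, self.get_starttag_text())


def split_outer_html(outer: str) -> Tuple[str, str, str]:
    """Hypothetical helper: split outerHTML into (start tag, innerHTML, end tag)."""
    reader = _FirstStartTag()
    reader.feed(outer)
    reader.close()
    if reader.first is None:
        raise ValueError("no start tag found")
    tag, start_tag = reader.first
    end_tag = f"</{tag}>"
    if not (outer.startswith(start_tag) and outer.endswith(end_tag)):
        raise ValueError("expected a single element with a closing tag")
    inner = outer[len(start_tag):len(outer) - len(end_tag)]
    return start_tag, inner, end_tag


# A shortened stand-in for the fbar outerHTML shown above
outer = '<div class="EvHmz hRvfYe" id="fbar"><div class="fbar">x</div></div>'
start_tag, inner, end_tag = split_outer_html(outer)
print(inner)  # -> <div class="fbar">x</div>
```

In practice, if you only need the inner part, asking the browser for innerHTML directly is simpler; the split is mainly useful when you already have the outerHTML string.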
...