How to parse text from a <script> tag in HTML with Python - PullRequest
0 votes
/ April 10, 2020

This is the response from requests.get(url).text:

<!DOCTYPE html><html lang=en-GB><head><script>(function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({'gtm.start':
          new Date().getTime(),event:'gtm.js'});var f=d.getElementsByTagName(s)[0],
          j=d.createElement(s),dl=l!='dataLayer'?'&l='+l:'';j.async=true;j.src=
          'https://www.googletagmanager.com/gtm.js?id='+i+dl+ '&gtm_auth=DzGdL0-L0v5Zw2pLyxQ5wQ&gtm_preview=env-2&gtm_cookies_win=x';f.parentNode.insertBefore(j,f);
          })(window,document,'script','dataLayer','GTM-NXKZFLP');</script><meta charset=utf-8><meta http-equiv=X-UA-Compatible content="IE=edge"><meta name=viewport content="width=device-width,initial-scale=1"><link rel=icon type=image/png sizes=32x32 href=/static/img/icons/favicon-32x32.png><link rel=icon type=image/png sizes=16x16 href=/static/img/icons/favicon-16x16.png><!--[if IE]><link rel="shortcut icon" href="/static/img/icons/favicon.ico"><![endif]--><link rel=manifest href=/static/manifest.json><meta name=theme-color content=#f8982d><meta name=apple-mobile-web-app-capable content=yes><meta name=apple-mobile-web-app-status-bar-style content=black><meta name=apple-mobile-web-app-title content=j-force><link rel=apple-touch-icon href=/static/img/icons/apple-touch-icon-152x152.png><link rel=mask-icon href=/static/img/icons/safari-pinned-tab.svg color=#f8982d><meta name=msapplication-TileImage content=/static/img/icons/msapplication-icon-144x144.png><meta name=msapplication-TileColor content=#000000><meta name=description content="Jumia Central Authentication"><link href="https://fonts.googleapis.com/css?family=Roboto:300,400,500,700%7CMaterial+Icons" rel=stylesheet type=text/css><title>Jumia Central Authentication</title><script src="https://www.google.com/recaptcha/api.js?onload=vueRecaptchaApiLoaded&render=explicit" async defer=defer></script><link href=/static/css/app.b4c4f019abb4c965aecbdd1c64642b1c.css rel=stylesheet></head><body style="margin: 0;"><noscript>This is your fallback content in case JavaScript fails to load.</noscript><div id=app style="background-color: #f07838;height: 100vh;display: flex;justify-content: center;;align-items: center;"><svg xmlns=http://www.w3.org/2000/svg xmlns:xlink=http://www.w3.org/1999/xlink width=174 height=30 viewBox="0 0 87 15"><defs><path id=a d="M0 15h87V0H0z"/></defs><g fill=none fill-rule=evenodd><path fill=#FFF d="M10.885 12.757a2.705 2.705 0 0 1-.625-1.108c-.182-.433-.26-.946-.26-1.595V0h2.237v9.892c0 1.19 1.015 
1.784 3.045 1.784h3.487c2.004 0 3.019-.595 3.019-1.784V0H24v10.054c0 1.324-.443 2.351-1.3 2.973-.886.649-2.317.973-4.295.973H15.57c-.417 0-.833 0-1.249-.028a9.742 9.742 0 0 1-1.276-.161 5.757 5.757 0 0 1-1.197-.378c-.364-.136-.676-.379-.962-.676zM47 13.868V1.488c0-.452-.109-.824-.355-1.09C46.4.133 46.1 0 45.69 0c-.572 0-.982.212-1.282.664l-7.313 10.334L29.62.664c-.327-.425-.764-.638-1.337-.638-.382 0-.682.107-.927.346C27.109.61 27 .956 27 1.382v12.486h2.347v-9.67l6.248 9.005c.382.532.845.797 1.446.797.273 0 .518-.08.764-.185.245-.134.463-.346.654-.612l6.167-8.953v9.618H47z"/><mask id=b fill=#fff><use xlink:href=#a /></mask><path fill=#FFF d="M49 14h2V0h-2zM61.96 3.01l2.89 4.783h-5.726L61.96 3.01zm4.268 7.121l2.2 3.869H71L63.418.86C63.074.297 62.623 0 62.066 0c-.53 0-.98.296-1.325.887L53 14h2.705l2.147-3.869h8.376zM5.58 7.743c.083 2.185-.494 3.453-1.787 3.967-1.154.457-2.914.62-3.793.673L.027 15c.385-.027 1.43-.135 3.52-.593C6.57 13.76 7.89 11.71 7.89 7.743L8 0H5.635v2.51l-.054 5.233zM80 0a7 7 0 1 1 0 14 7 7 0 0 1 0-14zm0 1.835l-1.573 3.224-3.565.499 2.568 2.516-.602 3.515L80 9.937l3.172 1.652-.603-3.515 2.57-2.516-3.54-.499L80 1.835z" mask=url(#b) /></g></svg></div> <script>window.csrfToken = 'g9dlVS6D-8KFzc_wWQrlJIXe4NfSKPMzKEQc';</script><script type=text/javascript src=/static/js/manifest.45c399eced566aee080e.js></script><script type=text/javascript src=/static/js/vendor.5470502edae47e7f99b6.js></script><script type=text/javascript src=/static/js/app.82d04e8a2afed59eb21f.js></script></body></html>

I want to extract the value of this variable (window.csrfToken).

1 Answer

2 votes
/ April 10, 2020
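from bs4 import BeautifulSoup

# html is the response body, e.g. requests.get(url).text
soup = BeautifulSoup(html, 'html.parser')

# The third <script> (index 2) is the inline one that sets window.csrfToken
script = soup.find_all("script")[2].string

print(script.split("'")[1])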

Output:

g9dlVS6D-8KFzc_wWQrlJIXe4NfSKPMzKEQc

Or

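# Select the inline script by its content instead of by position;
# .string is None for scripts that only have a src attribute
script = [item.string for item in soup.find_all("script")
          if item.string and "window.csrfToken" in item.string]

print(script[0].split("'")[1])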

Output:

g9dlVS6D-8KFzc_wWQrlJIXe4NfSKPMzKEQc
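
Both BeautifulSoup variants above come down to the same steps: collect the inline script bodies, pick the one that mentions window.csrfToken, and take the quoted value. If bs4 is not installed, roughly the same idea can be sketched with the standard library's html.parser. This is only an illustration: ScriptCollector and find_csrf_token are names made up for this example, and the sample string is a shortened stand-in for the real response.

```python
import re
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the text of every <script> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.scripts = []        # one string per <script> element
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script text may arrive in several chunks, so accumulate it
        if self._in_script:
            self.scripts[-1] += data


def find_csrf_token(html):
    """Return the csrfToken value, or None if no script assigns it."""
    collector = ScriptCollector()
    collector.feed(html)
    collector.close()
    for body in collector.scripts:
        match = re.search(r"window\.csrfToken\s*=\s*'([^']*)'", body)
        if match:
            return match.group(1)
    return None


# Shortened stand-in for the page in the question (assumption, not the full HTML)
sample = (
    '<html><head><script src="/static/js/a.js"></script></head>'
    "<body><script>window.csrfToken = 'g9dlVS6D-8KFzc_wWQrlJIXe4NfSKPMzKEQc';</script>"
    "</body></html>"
)
print(find_csrf_token(sample))  # g9dlVS6D-8KFzc_wWQrlJIXe4NfSKPMzKEQc
```

Returning None instead of indexing with [0] keeps the caller from hitting an IndexError when the page layout changes and no script sets the token.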

Or use re, provided html is a string (if it is not, you can convert it with str(html)):

import re

# Escape the dot and stop at the first closing quote
target = re.search(r"window\.csrfToken\s*=\s*'([^']+)'", html).group(1)

print(target)

Output:

g9dlVS6D-8KFzc_wWQrlJIXe4NfSKPMzKEQc
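
One caveat with the regex route: a greedy (.+) runs to the last single quote on the line, and re.search returns None when nothing matches, so calling .group(1) directly can raise AttributeError. Below is a minimal sketch of a more defensive version. The names CSRF_PATTERN and extract_csrf_token are invented for this example, and the page string with a second quoted assignment is a made-up case, not taken from the question.

```python
import re

# Escaped dot, flexible whitespace, and [^']+ so the match ends at the first closing quote
CSRF_PATTERN = re.compile(r"window\.csrfToken\s*=\s*'([^']+)'")


def extract_csrf_token(html):
    """Return the token string, or None if the pattern is not found."""
    match = CSRF_PATTERN.search(html)
    return match.group(1) if match else None


# Hypothetical page where another quoted value follows on the same line
page = "<script>window.csrfToken = 'abc123';window.lang = 'en';</script>"

print(extract_csrf_token(page))             # abc123
print(extract_csrf_token("<html></html>"))  # None

# The greedy pattern over-matches on the same input
greedy = re.search(r"window.csrfToken = '(.+)'", page).group(1)
print(greedy)                               # abc123';window.lang = 'en
```

On the HTML from the question both patterns give the same token, because no other single quote follows it on that line; the stricter pattern only matters if the page changes.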