Unable to get the page source after JavaScript execution - PullRequest
0 votes
/ May 25, 2018

I have run into a situation where, for some reason, I cannot get the page source after the JavaScript has executed:

#!/usr/bin/python

from selenium import webdriver
import time

driver = webdriver.PhantomJS(service_args=['--ignore-ssl-errors=true',
                                            '--ssl-protocol=any'])
driver.set_window_size(1124, 850)

driver.get('https://semanticscholar.org/search?q=The+iterative+deepening+A*')
time.sleep(20)
print(driver.page_source.encode('utf-8'))

I used to have a wait strategy in my code, but I switched to a plain sleep for this minimal example.
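
For reference, a wait strategy of the kind mentioned here can be sketched without a fixed sleep. This is a minimal sketch, not the code that was originally used: `wait_for_source` and its `marker` argument are hypothetical names, and the only assumption about the driver is that it exposes a `page_source` attribute, as Selenium WebDriver instances do. Selenium's own `WebDriverWait` offers a comparable polling mechanism.

```python
import time


def wait_for_source(driver, marker, timeout=20.0, poll=0.5):
    """Poll driver.page_source until `marker` appears in it.

    `marker` is a hypothetical substring (for example, a class name that
    only exists once the JavaScript has rendered the results).
    Returns the page source, or raises TimeoutError after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        source = driver.page_source
        if marker in source:
            return source
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "marker %r not found within %.1f seconds" % (marker, timeout)
            )
        # Sleep briefly between checks instead of blocking for the full timeout.
        time.sleep(poll)
```

With a real driver this would be called as `wait_for_source(driver, 'some-marker')` in place of `time.sleep(20)`; the marker has to be chosen by inspecting the rendered page, since it is not given in this thread.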

Is there something special about the page whose source I am trying to read?

EDIT: Interestingly, I tried headless Chrome instead of PhantomJS and it worked! Here is the code:

#!/usr/bin/python

import os  
from selenium import webdriver  
from selenium.webdriver.common.keys import Keys  
from selenium.webdriver.chrome.options import Options
import time

chrome_options = Options()  
chrome_options.add_argument("--headless")  
chrome_options.binary_location = '/usr/bin/google-chrome'    

driver = webdriver.Chrome(executable_path=os.path.abspath("chromedriver"),   chrome_options=chrome_options)  
driver.set_window_size(1124, 850)

driver.get('https://semanticscholar.org/search?q=The+iterative+deepening+A*')
time.sleep(20)
print(driver.page_source.encode('utf-8'))

1 Answer

0 votes
/ May 25, 2018

Based on the details you provided, here are my observations:

  • Headless Chrome:

    • Code block:

      # -*- coding: UTF-8 -*-
      
      from selenium import webdriver  
      import sys,time
      
      options = webdriver.ChromeOptions() 
      options.add_argument('--headless')
      options.add_argument("start-maximized")
      options.add_argument('disable-infobars')
      driver=webdriver.Chrome(chrome_options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
      driver.get('https://semanticscholar.org/search?q=The+iterative+deepening+A*')
      time.sleep(20)
      print (driver.page_source.encode('utf-8'))  
      
    • Console output:

b'
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">

<head lang="en">\n
  <title>The iterative deepening A* - Semantic Scholar</title>\n
  <meta name="robots" content="noarchive" />\n \n
  <meta charset="utf-8" />\n
  <meta name="s2-ui-version" content="ba8c638c99e69c67adcdd27274a81e822503dded" />\n
  <meta name="description" content="An academic search engine that utilizes artificial intelligence methods to provide highly relevant results and novel tools to filter them with ease." />\n
  <meta name="twitter:description" content="An academic search engine that utilizes artificial intelligence methods to provide highly relevant results and novel tools to filter them with ease." />\n
  <meta property="og:description" content="An academic search engine that utilizes artificial intelligence methods to provide highly relevant results and novel tools to filter them with ease." />\n
  <meta property="og:title" content="The iterative deepening A* - Semantic Scholar" />\n
  <meta name="twitter:title" content="The iterative deepening A* - Semantic Scholar" />\n
  <meta property="og:image" content="https://www.semanticscholar.org/img/semantic_scholar_og.jpg" />\n
  <meta property="og:image:secure_url" content="https://www.semanticscholar.org/img/semantic_scholar_og.jpg" />\n
  <meta property="og:image:width" content="1110" />\n
  <meta property="og:image:height" content="582" />\n
  <meta name="twitter:image" content="https://www.semanticscholar.org/img/semantic_scholar_og.jpg" />\n
  <meta property="og:type" content="website" />\n
  <meta property="og:locale" content="en_US" />\n
  <meta name="twitter:card" content="summary_large_image" />\n
  <meta name="twitter:site" content="@allenai_org" />\n
  <link rel="icon" href="/img/favicon.png" sizes="32x32" />\n
  <link href="https://dab4rbh62k56j.cloudfront.net/css/main.cb86592cd7.css" rel="stylesheet" />\n
  <script type="text/javascript" src="https://bam.nr-data.net/1/a59e40bc78?a=21497303&amp;sa=1&amp;v=974.7d740e1&amp;t=Unnamed%20Transaction&amp;rst=3526&amp;ref=https://www.semanticscholar.org/search&amp;be=2476&amp;fe=291&amp;dc=137&amp;af=err,xhr,stn,ins,spa&amp;perf=%7B%22timing%22:%7B%22of%22:1527254077256,%22n%22:0,%22f%22:1749,%22dn%22:1749,%22dne%22:1749,%22c%22:1749,%22ce%22:1749,%22rq%22:1363,%22rp%22:1742,%22rpe%22:2025,%22dl%22:1751,%22di%22:2612,%22ds%22:2612,%22de%22:2612,%22dc%22:2766,%22l%22:2766,%22le%22:2768%7D,%22navigation%22:%7B%7D%7D&amp;jsonp=NREUM.setToken"></script>
  <script src="https://js-agent.newrelic.com/nr-spa-974.min.js"></script>
  <script type="text/javascript" async="" src="https://cdn.heapanalytics.com/js/heap-2424575119.js"></script>
  <script async="" src="//www.google-analytics.com/analytics.js"></script>
  <script src="https://cdn.polyfill.io/v2/polyfill.min.js?features=Promise,Array.from,Array.prototype.find,Object.values&amp;flags=gated"></script>\n
  <script>\n
    (function(i, s, o, g, r, a, m) {
          i[\'GoogleAnalyticsObject\']=r;i[r]=i[r]||function(){\n      (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),\n      m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)\n      })(window,document,\'script\',\'//www.google-analytics.com/analytics.js\',\'ga\');\n      ga(\'create\', \'UA-67668211-2\', \'auto\', { \'allowLinker\': true });\n      ga(\'require\', \'linker\');\n      ga(\'linker:autoLink\', [\'pdfs.semanticscholar.org\']);\n
  </script>\n
  <!-- Heap Analytics Snippet -->\n
  <script type="text/javascript">\n
    window.heap = window.heap || [], heap.load = function(e, t) {
        window.heap.appid = e, window.heap.config = t = t || {};
        var r = t.forceSSL || "https:" === document.location.protocol,
          a = document.createElement("script");
        a.type = "text/javascript", a.async = !0, a.src = (r ? "https:" : "http:") + "//cdn.heapanalytics.com/js/heap-" + e + ".js";
  • PhantomJS:

    • Code block:

      # -*- coding: UTF-8 -*-
      
      from selenium import webdriver  
      import sys,time
      
      driver = webdriver.PhantomJS(executable_path=r'C:\Utility\phantomjs-2.1.1-windows\bin\phantomjs.exe', service_args=['--ignore-ssl-errors=true','--ssl-protocol=any'])
      driver.set_window_size(1124, 850)
      driver.get('https://semanticscholar.org/search?q=The+iterative+deepening+A*')
      time.sleep(20)
      print (driver.page_source.encode('utf-8'))  
      
    • Console output:

b'
<!DOCTYPE html>
<html>

<head lang="en">\n
  <title>The iterative deepening A* - Semantic Scholar</title>\n
  <meta name="robots" content="noarchive">\n \n
  <meta charset="utf-8">\n
  <meta name="s2-ui-version" content="ba8c638c99e69c67adcdd27274a81e822503dded">\n
  <meta name="description" content="An academic search engine that utilizes artificial intelligence methods to provide highly relevant results and novel tools to filter them with ease.">\n
  <meta name="twitter:description" content="An academic search engine that utilizes artificial intelligence methods to provide highly relevant results and novel tools to filter them with ease.">\n
  <meta property="og:description" content="An academic search engine that utilizes artificial intelligence methods to provide highly relevant results and novel tools to filter them with ease.">\n
  <meta property="og:title" content="The iterative deepening A* - Semantic Scholar">\n
  <meta name="twitter:title" content="The iterative deepening A* - Semantic Scholar">\n
  <meta property="og:image" content="https://www.semanticscholar.org/img/semantic_scholar_og.jpg">\n
  <meta property="og:image:secure_url" content="https://www.semanticscholar.org/img/semantic_scholar_og.jpg">\n
  <meta property="og:image:width" content="1110">\n
  <meta property="og:image:height" content="582">\n
  <meta name="twitter:image" content="https://www.semanticscholar.org/img/semantic_scholar_og.jpg">\n
  <meta property="og:type" content="website">\n
  <meta property="og:locale" content="en_US">\n
  <meta name="twitter:card" content="summary_large_image">\n
  <meta name="twitter:site" content="@allenai_org">\n
  <link rel="icon" href="/img/favicon.png" sizes="32x32">\n
  <link href="https://dab4rbh62k56j.cloudfront.net/css/main.cb86592cd7.css" rel="stylesheet">\n
  <script type="text/javascript" src="https://bam.nr-data.net/1/a59e40bc78?a=21497303&amp;sa=1&amp;v=974.7d740e1&amp;t=Unnamed%20Transaction&amp;rst=4682&amp;ref=https://www.semanticscholar.org/search&amp;be=3584&amp;fe=398&amp;dc=93&amp;af=err,xhr,ins,spa&amp;perf=%7B%22timing%22:%7B%22of%22:1527254324796,%22n%22:0,%22f%22:1428,%22dn%22:1428,%22dne%22:1428,%22c%22:1428,%22ce%22:1428,%22rq%22:1428,%22rp%22:1428,%22rpe%22:3139,%22dl%22:2811,%22di%22:3677,%22ds%22:3677,%22de%22:3677,%22dc%22:3981,%22l%22:3981,%22le%22:3983%7D,%22navigation%22:%7B%7D%7D&amp;jsonp=NREUM.setToken"></script>
  <script src="https://js-agent.newrelic.com/nr-spa-974.min.js"></script>
  <script type="text/javascript" async="" src="https://cdn.heapanalytics.com/js/heap-2424575119.js"></script>
  <script async="" src="//www.google-analytics.com/analytics.js"></script>
  <script src="https://cdn.polyfill.io/v2/polyfill.min.js?features=Promise,Array.from,Array.prototype.find,Object.values&amp;flags=gated"></script>\n
  <script>\n
    (function(i, s, o, g, r, a, m) {
          i[\'GoogleAnalyticsObject\']=r;i[r]=i[r]||function(){\n      (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),\n      m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)\n      })(window,document,\'script\',\'//www.google-analytics.com/analytics.js\',\'ga\');\n      ga(\'create\', \'UA-67668211-2\', \'auto\', { \'allowLinker\': true });\n      ga(\'require\', \'linker\');\n      ga(\'linker:autoLink\', [\'pdfs.semanticscholar.org\']);\n
  </script>\n
  <!-- Heap Analytics Snippet -->\n
  <script type="text/javascript">\n
    window.heap = window.heap || [], heap.load = function(e, t) {
        window.heap.appid = e, window.heap.config = t = t || {};
        var r = t.forceSSL || "https:" === document.location.protocol,
          a = document.createElement("script");
        a.type = "text/javascript", a.async = !0, a.src = (r ? "https:" : "http:") + "//cdn.heapanalytics.com/js/heap-" + e + ".js";
        var n = document.getElementsByTagName("script")[0];
        n.parentNode.insertBefore(a, n);

Conclusion

Although the page source returned by ChromeDriver differs somewhat from the one returned by PhantomJSDriver (for example, self-closing `<meta ... />` tags and an `xmlns` attribute on `<html>` in the Chrome output), both WebDriver variants return the relevant page source.
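
One way to check this kind of claim is to compare the two sources by tag and attribute rather than as raw text, so that pure serialization differences such as `<meta ... />` versus `<meta ...>` are ignored. The following is a minimal sketch using only the standard library; `TagCollector`, `collect_tags`, and `tag_differences` are hypothetical helper names, and the two sample strings are shortened stand-ins modeled on the outputs shown in this answer, not the full page sources.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record every start tag together with its attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # Sort attributes by name so that attribute order does not matter.
        # Self-closing tags also end up here via the default
        # handle_startendtag implementation.
        self.tags.append((tag, tuple(sorted(attrs, key=lambda kv: kv[0]))))


def collect_tags(source):
    parser = TagCollector()
    parser.feed(source)
    parser.close()
    return parser.tags


def tag_differences(source_a, source_b):
    """Return (tags only in source_a, tags only in source_b)."""
    tags_a, tags_b = collect_tags(source_a), collect_tags(source_b)
    only_a = [t for t in tags_a if t not in tags_b]
    only_b = [t for t in tags_b if t not in tags_a]
    return only_a, only_b


# Shortened, illustrative samples in the style of the two console outputs.
chrome_like = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><head lang="en">'
    '<meta charset="utf-8" /></head></html>'
)
phantom_like = '<html><head lang="en"><meta charset="utf-8"></head></html>'

only_chrome, only_phantom = tag_differences(chrome_like, phantom_like)
print(only_chrome)   # tags that appear only in the Chrome-style sample
print(only_phantom)  # tags that appear only in the PhantomJS-style sample
```

Run against the real `driver.page_source` strings from both drivers, a comparison like this would show whether the differences are limited to markup style or whether one driver is missing rendered elements.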
