Question

I would like to run a search with Selenium and click the "More Results" button at the end of a DDG search.

Once DDG has displayed all the results for the query, it no longer shows the button.

I would like to break out of the try loop when there is no button.

I'll share what I'm trying now. Before this, I also tried these two options: if len(button_element) > 0: button_element.click() and if button_element is not None: button_element.click().

I would like the solution to use Selenium so that it shows the browser, because that is useful for debugging.

This is my code, with a reproducible example:

    from selenium import webdriver
    from selenium.webdriver.common.keys import Keys
    from selenium.webdriver.chrome.options import Options
    from bs4 import BeautifulSoup

    browser = webdriver.Chrome()        
    browser.get("https://duckduckgo.com/")
    search = browser.find_element_by_name('q')
    search.send_keys("this is a search" + Keys.RETURN)
    html = browser.page_source

    try:
        button_element = browser.find_element_by_class_name('result--more__btn')

        try:
            button_element.click()
        except SystemExit:
            print("No more pages")

    except:
        pass
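Why neither check works: find_element_by_class_name (singular) returns a single WebElement or raises NoSuchElementException when nothing matches. It never returns None, and a WebElement has no len(). In the code above, that exception is swallowed by the outer bare except, while except SystemExit never fires. The plural find_elements returns a list that is simply empty when nothing matches, so an emptiness check is a clean way to stop. A minimal sketch, assuming Selenium 4's driver.find_elements(by, value) (the string "css selector" is the value of By.CSS_SELECTOR); click_more_results and max_clicks are hypothetical names:

```python
# Hypothetical helper: works with any object that has Selenium 4's
# find_elements(by, value) method, e.g. a webdriver.Chrome() instance.
MORE_BUTTON = ("css selector", ".result--more__btn")  # same as (By.CSS_SELECTOR, ...)


def click_more_results(driver, max_clicks=50):
    """Click 'More Results' until it is gone; return how many clicks were made."""
    clicks = 0
    while clicks < max_clicks:  # the cap guards against an endless loop
        # Plural find_elements returns [] instead of raising NoSuchElementException
        buttons = driver.find_elements(*MORE_BUTTON)
        if not buttons:
            break  # no button left: all results are shown
        buttons[0].click()
        clicks += 1
    return clicks
```

Usage: click_more_results(browser) after the search. One caveat: right after a click the next button may not be rendered yet, so with a real browser this check can stop too early; the answers below handle that timing by waiting with WebDriverWait.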

Andrej Kesely · Answer 1 · June 21, 2020

You can use the plain HTML version of DDG at https://duckduckgo.com/html/?q=. That way you can use a pure requests / BeautifulSoup approach and easily get all the pages:

import requests
from bs4 import BeautifulSoup


q = '"centre of intelligence"'
url = 'https://duckduckgo.com/html/?q={q}'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0'}

soup = BeautifulSoup(requests.get(url.format(q=q), headers=headers).content, 'html.parser')

while True:
    for t, a, s in zip(soup.select('.result__title'), soup.select('.result__a'), soup.select('.result__snippet')):
        print(t.get_text(strip=True, separator=' '))
        print(a['href'])
        print(s.get_text(strip=True, separator=' '))
        print('-' * 80)

    f = soup.select_one('.nav-link form')
    if not f:
        break

    data = {}
    for i in f.select('input'):
        if i['type']=='submit':
            continue
        data[i['name']] = i.get('value', '')

    soup = BeautifulSoup(requests.post('https://duckduckgo.com' + f['action'], data=data, headers=headers).content, 'html.parser')

Prints:

Centre Of Intelligence - Home | Facebook
https://www.facebook.com/Centre-Of-Intelligence-937637846300833/
Centre Of Intelligence . 73 likes. Non-profit organisation. Facebook is showing information to help you better understand the purpose of a Page.
--------------------------------------------------------------------------------
centre of intelligence | English examples in context | Ludwig
https://ludwig.guru/s/centre+of+intelligence
(Glasgow was "the centre of the intelligence of England" according to the Grand Duke Alexis, who attended the launch of his father Tsar Alexander II's steam yacht there in 1880).
--------------------------------------------------------------------------------
Chinese scientists who studied bats in Aus at centre of intelligence ...
https://www.youtube.com/watch?v=UhcFXXzf2hc
Intelligence agencies are looking into two Chinese scientists in a bid to learn the true origin of COVID-19. Two Chinese scientists who studied live bats in...
--------------------------------------------------------------------------------

... and so on.
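If you would rather not depend on BeautifulSoup, the two parsing steps of this answer can be sketched with the standard library instead: building the query URL with explicit percent-encoding (urllib.parse.urlencode), and collecting the next-page form fields with html.parser. This is a swapped-in technique, not the answer's code. It assumes, as the answer's CSS selector does, that the next-page control is a form nested inside an element with class nav-link, and that the markup is well-formed (every non-void element is closed). build_search_url, NextPageForm and parse_next_page_form are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

HTML_ENDPOINT = "https://duckduckgo.com/html/"
# Elements that never have a closing tag, so they must not change the depth count
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


def build_search_url(query):
    """Return the HTML-version search URL with the query percent-encoded."""
    return HTML_ENDPOINT + "?" + urlencode({"q": query})


class NextPageForm(HTMLParser):
    """Collect the first <form> nested inside a class="nav-link" element.

    Approximates soup.select_one('.nav-link form') plus the input loop in the
    answer: submit inputs are skipped, other named inputs become form data.
    """

    def __init__(self):
        super().__init__()
        self.nav_depth = 0    # > 0 while inside a .nav-link element
        self.in_form = False
        self.done = False
        self.action = None    # stays None if no matching form exists
        self.data = {}

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        attrs = dict(attrs)
        if self.nav_depth and tag == "form" and not self.in_form:
            self.in_form = True
            self.action = attrs.get("action") or ""
        elif self.in_form and tag == "input" and attrs.get("name"):
            if attrs.get("type") != "submit":
                self.data[attrs["name"]] = attrs.get("value") or ""
        # Track how deep we are inside the .nav-link element
        if tag not in VOID_TAGS:
            if self.nav_depth:
                self.nav_depth += 1
            elif "nav-link" in (attrs.get("class") or "").split():
                self.nav_depth = 1

    def handle_endtag(self, tag):
        if self.done or tag in VOID_TAGS:
            return
        if self.in_form and tag == "form":
            self.in_form = False
            self.done = True
        if self.nav_depth:
            self.nav_depth -= 1


def parse_next_page_form(html):
    """Return (action, data) for the next-page form, or None on the last page."""
    parser = NextPageForm()
    parser.feed(html)
    parser.close()
    if parser.action is None:
        return None
    return parser.action, parser.data
```

In the answer's loop, requests.get(build_search_url(q), headers=headers) would take the place of url.format(q=q) (requests can also do the encoding itself via params={'q': q}), and parse_next_page_form(response.text) returning None plays the role of "if not f: break".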

DebanjanB · Answer 2 · June 21, 2020

To click the More Results button at the end of the DuckDuckGo search results with Selenium WebDriver, you need to induce WebDriverWait for element_to_be_clickable(), for example with the following Locator Strategy (a CSS selector):

Code block:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import TimeoutException

options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
# Recent Selenium 4 releases take the chromedriver path through a Service object
# instead of the old executable_path= keyword
driver = webdriver.Chrome(service=Service(r'C:\WebDrivers\chromedriver.exe'), options=options)
driver.get('https://duckduckgo.com/')
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.NAME, "q"))).send_keys("this is a search" + Keys.RETURN)
while True:
    try:
        # Wait up to 20 s for the button to become clickable, then click it
        WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "a.result--more__btn"))).click()
        print("Clicked on More Results button")
    except TimeoutException:
        # No clickable button within 20 s: all results are shown
        print("No more More Results button")
        break
driver.quit()

Console output:

Clicked on More Results button
Clicked on More Results button
Clicked on More Results button
Clicked on More Results button
Clicked on More Results button
No more More Results button
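Each WebDriverWait(driver, 20).until(...) call repeatedly evaluates its condition until it returns something truthy, and raises TimeoutException once the timeout expires; that exception is what ends the loop above, which also means the last iteration waits the full 20 seconds before printing "No more More Results button". A simplified stdlib model of that polling, for illustration only (not Selenium's actual code; Selenium polls every 0.5 s by default and treats NoSuchElementException as "not yet"). wait_until and its parameters are hypothetical names, and TimeoutError stands in for Selenium's TimeoutException:

```python
import time


def wait_until(condition, timeout, poll=0.5, ignored=()):
    """Call condition() until it returns a truthy value or `timeout` seconds pass.

    Simplified model of WebDriverWait.until: exceptions listed in `ignored`
    count as "condition not met yet" instead of aborting the wait.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            value = condition()
            if value:
                return value  # e.g. the clickable element
        except ignored:
            pass  # treat as "not met yet", keep polling
        if time.monotonic() > deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)
```

In Selenium the condition is an object such as EC.element_to_be_clickable(locator), which is called with the driver as its argument; here condition takes no arguments to keep the sketch self-contained.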

You can find a relevant discussion in How to extract the text from the DuckDuckGo search results using Selenium Python

Hexception · Answer 3 · June 21, 2020

Use WebDriverWait to wait until the More Results button appears:

wait = WebDriverWait(browser, 15) # 15 seconds timeout 
wait.until(expected_conditions.visibility_of_element_located((By.CLASS_NAME, 'result--more__btn')))

This example code keeps clicking the button until there is no button left. To use Chrome, replace Firefox with Chrome.

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions
from selenium.common.exceptions import TimeoutException

browser = webdriver.Firefox()  # or webdriver.Chrome()
browser.get("https://duckduckgo.com/")
search = browser.find_element(By.NAME, 'q')
search.send_keys("this is a search" + Keys.RETURN)

wait = WebDriverWait(browser, 15)  # 15 seconds timeout
while True:
    try:
        # until() returns the element as soon as it is visible
        button_element = wait.until(expected_conditions.visibility_of_element_located((By.CLASS_NAME, 'result--more__btn')))
        button_element.click()
    except TimeoutException:
        # The button did not appear within 15 s: no more results
        break

How to handle an error in a try loop using Selenium and Python

Пожалуйста, войдите или зарегистрируйтесь чтобы ответить на этот вопрос.

Ответы [ 3 ]

Пожалуйста, войдите или зарегистрируйтесь что бы добавить комментарий.

Пожалуйста, войдите или зарегистрируйтесь что бы добавить комментарий.

Пожалуйста, войдите или зарегистрируйтесь что бы добавить комментарий.

Как справиться с ошибкой из попытки l oop с использованием Selenium и Python

Пожалуйста, войдите или зарегистрируйтесь чтобы ответить на этот вопрос.

Ответы [ 3 ]

Пожалуйста, войдите или зарегистрируйтесь что бы добавить комментарий.

Пожалуйста, войдите или зарегистрируйтесь что бы добавить комментарий.

Пожалуйста, войдите или зарегистрируйтесь что бы добавить комментарий.

Похожие темы