Question

This should be fairly simple. I want to count the links produced by a search on a web page. In this example, the search is for "gwen stefani" on Stack Overflow. At the time of writing, there are 15 results.

import bs4  # Beautiful Soup 4
import requests
import webbrowser

url = "https://stackoverflow.com/search?q=gwen+stefani"

# Open the search page in a browser for visual comparison
webbrowser.open(url)

# Fetch the same page once and parse it
r = requests.get(url)
html_content = r.text

soup = bs4.BeautifulSoup(html_content, "html.parser")

print(soup.title)

# Print the href of every <a> element on the page
for link in soup.find_all("a"):
    print(link.get("href"))

When the links are printed, none of them are the search results mentioned above. I'm new to Beautiful Soup and I'm not sure where I'm going wrong at this point.
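A minimal sketch, using a small hypothetical HTML snippet rather than the live page, of how the unfiltered find_all("a") loop above differs from a class-filtered one: the first returns every anchor on the page, including navigation links, while the second keeps only anchors with the question-hyperlink class that Answer 1 relies on. The markup is an assumption for illustration; the real Stack Overflow page is much larger and its structure may change.

```python
from bs4 import BeautifulSoup

# Hypothetical, heavily simplified markup; a real search page has many more links.
SAMPLE_HTML = """
<html><body>
  <a href="/">Home</a>
  <a href="/users/login">Log in</a>
  <a class="question-hyperlink" href="/questions/1/first-question">Q: First question</a>
  <a class="question-hyperlink" href="/questions/2/second-question">Q: Second question</a>
</body></html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Every <a> element, including navigation links that are not search results
all_links = soup.find_all("a")

# Only anchors carrying the result class used in Answer 1
result_links = soup.find_all("a", class_="question-hyperlink")

print(len(all_links))     # 4
print(len(result_links))  # 2
```

Counting len(result_links) instead of every anchor is what turns "print all links" into "count the search results".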

hygull · Answer 1 · November 12, 2018

You can also try the code below, which doesn't need the class of the surrounding div element.

Just inspect the page and find the class used on the question links (here, question-hyperlink).

import bs4 #  beautiful soup 4
import requests
import webbrowser
import json

url = "https://stackoverflow.com/search?q=gwen+stefani"

webbrowser.open(url)

r = requests.get(url)
html_content = r.text

# with open('response.html', 'w', encoding="utf-8") as f:
#   f.write(html_content)

soup = bs4.BeautifulSoup(html_content, "html.parser")

print(soup.title)
links = soup.find_all("a", class_='question-hyperlink')

valid_links = {}

for link in links:
    # Some anchors may lack an href; treat a missing one as an empty string
    href = (link.get('href') or '').strip()

    if href.startswith('/questions/'):
        valid_links[href] = link.text.strip()

print(json.dumps(valid_links, indent=4)) # pretty printing dictionary
print(len(valid_links)) # 15

Output

<title>Posts containing 'gwen stefani' - Stack Overflow</title>
{
    "/questions/39268369/what-does-minus-minus-do-in-excel": "Q: What does \u2014 (minus minus) do in Excel? [duplicate]",
    "/questions/53264513/using-beautiful-soup-to-count-links-on-requested-page": "Q: Using Beautiful Soup to count links on requested page",
    "/questions/31074289/is-there-a-script-that-can-transfer-text-from-an-excel-file-into-an-adobe-design/31100563#31100563": "A: Is there a script that can transfer text from an excel file into an Adobe design program?",
    "/questions/39268369/what-does-minus-minus-do-in-excel/39268800#39268800": "A: What does \u2014 (minus minus) do in Excel?",
    "/questions/1668447/launch-failed-binary-not-found-snow-leopard-and-eclipse-c-c-ide-issue/8463357#8463357": "A: \u201cLaunch Failed. Binary Not Found.\u201d Snow Leopard and Eclipse C/C++ IDE issue",
    "/questions/33023818/split-and-rejoin-path-without-trailing-backslash": "Q: Split and rejoin path without trailing backslash",
    "/questions/36986461/regex-match-return-remaining-rest-of-string": "Q: Regex match, return remaining rest of string",
    "/questions/44686123/pass-variable-from-javascript-to-windows-batch-file": "Q: Pass variable from JavaScript to Windows batch file",
    "/questions/44686123/pass-variable-from-javascript-to-windows-batch-file/44686309#44686309": "A: Pass variable from JavaScript to Windows batch file",
    "/questions/52465425/reversing-a-list-with-single-element-gives-none": "Q: Reversing a list with single element gives None [duplicate]",
    "/questions/22196612/array-length-outside-of-a-method": "Q: Array length outside of a method",
    "/questions/13300815/not-getting-expected-results-from-select-query/13300920#13300920": "A: Not getting expected results from SELECT query",
    "/questions/32884087/slicing-string-from-start": "Q: Slicing string from start [duplicate]",
    "/questions/53264513/using-beautiful-soup-to-count-links-on-requested-page/53265048#53265048": "A: Using Beautiful Soup to count links on requested page",
    "/questions/23337218/recursive-conditions-missing-base-case": "Q: Recursive conditions - missing base case"
}
15
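The same filtering can also be expressed as a single CSS selector with soup.select, which Beautiful Soup supports (via soupsieve, in bs4 4.7 and later). A minimal sketch, assuming hypothetical sample markup rather than the live page: the selector keeps only a.question-hyperlink elements whose href starts with /questions/, and keying a dict by href, as Answer 1 does, counts a repeated link only once.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: a question link, an answer link (with a #fragment),
# a duplicate, an external link, and a link without the result class.
SAMPLE_HTML = """
<a class="question-hyperlink" href="/questions/10/example">Q: Example</a>
<a class="question-hyperlink" href="/questions/10/example/11#11">A: Example</a>
<a class="question-hyperlink" href="/questions/10/example">Q: Example</a>
<a class="question-hyperlink" href="https://example.com/elsewhere">Elsewhere</a>
<a href="/questions/99/not-a-result">Sidebar link without the class</a>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# <a> elements with class question-hyperlink whose href starts with /questions/
links = soup.select('a.question-hyperlink[href^="/questions/"]')

# Keying by href means the duplicated question link is counted once
valid_links = {a["href"].strip(): a.get_text(strip=True) for a in links}

print(len(links))        # 3
print(len(valid_links))  # 2
```

Whether to count answer links (the hrefs ending in #answer-id in the output above) as separate results is a choice for the caller; the dict keeps them, as Answer 1's output does.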

Kamikaze_goldfish · Answer 2 · November 12, 2018

I'm using Python 3.x, so you may have to adjust for that, but I get all 15 links.

from bs4 import BeautifulSoup
import requests

url = 'https://stackoverflow.com/search?q=gwen+stefani'
page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser')
# Assumes each result is wrapped in <div class="result-link"> containing an <a>
for link in soup.find_all('div', class_='result-link'):
    print('https://stackoverflow.com' + link.a['href'])
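A minimal sketch of a slightly more defensive version of the loop above, assuming hypothetical markup with the same div.result-link structure that Answer 2 relies on: it skips a div that has no <a href=...> (where link.a['href'] would raise) and uses urllib.parse.urljoin instead of string concatenation, so relative and already-absolute hrefs both produce correct URLs.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://stackoverflow.com"

# Hypothetical markup mirroring the structure Answer 2 assumes
SAMPLE_HTML = """
<div class="result-link"><a href="/questions/1/first">First</a></div>
<div class="result-link"><a href="https://stackoverflow.com/questions/2/second">Second</a></div>
<div class="result-link">No anchor here</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

full_urls = []
for div in soup.find_all("div", class_="result-link"):
    anchor = div.find("a", href=True)  # None if the div has no <a href=...>
    if anchor is None:
        continue
    # urljoin resolves relative hrefs and leaves absolute ones unchanged
    full_urls.append(urljoin(BASE_URL, anchor["href"]))

print(len(full_urls))  # 2
print(full_urls)
```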

Using Beautiful Soup to count links on requested page