Question

I am currently building a program that parses Wikipedia to display a country's mountains on a map.

I managed to find the URL of interest, but I am having trouble redirecting to the new URL (where all the data I need lives).

Any suggestions, including the use of other libraries, are welcome!

import requests
from bs4 import BeautifulSoup
from  csv import writer
import urllib3

#Requests country name from user
user_input=input('Enter Country:')
fist_letter=user_input[0:1].upper()
country=fist_letter+user_input[1:] #takes the country name and capitalizes the first letter

#Request response for wikipedia parse
response=requests.get('https://en.wikipedia.org/wiki/Category:Lists_of_mountains_by_country')
bs=BeautifulSoup(response.text,'html.parser')

#country query
for content in bs.find_all(class_='mw-category')[1]:
    category_letter=content.find('h3')

    #Locates target category to find the country of interest
    if fist_letter in category_letter:
        country_lists=category_letter.find_next_sibling('ul')

        #Locates the country of interest from the lists of countries in target
        #category
        target=country_lists.find('li',text="List of mountains in "+str(country))

    #Grabs the link which will redirect to the page containing the list of 
    #mountains for the country of interest.

        target_link=target.find('a')
        link=target_link.get('href')
        new_link='https://enwikipedia.org'+link

        #Attempts to redirect to the target link
        new_response=requests.get(new_link)
        BS=BeautifulSoup(new_response.text,'html.parser')
        mountain_list=content.find('tbody')
        print(mountain_list)

    else:
        pass

Kingsley · Answer 1 · December 11, 2018

I like to parse HTML with Python's string split() and find(). Splitting with a limit of one cut gives you a left part and a right part, and you can take either one with index notation, for example: html_str.split('<a href="', 1)[1]

In any case, once the code has split out the right URL, it is just a matter of parsing again in the same way. Oh, and it might be worth checking for HTTP errors.
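The split-once idea can be sketched on a small hand-written string. The helper name extract_hrefs and the sample markup are made up for illustration and only assume that links appear as '<a href="...">', as in the category page:

```python
def extract_hrefs(html_str):
    """Return every href value that follows '<a href="' in html_str."""
    hrefs = []
    # The first chunk is the text before the first link, so skip it
    for chunk in html_str.split('<a href="')[1:]:
        # Cut once at the closing quote; the left part is the URL
        hrefs.append(chunk.split('"', 1)[0])
    return hrefs


# Hypothetical sample markup, shaped like the list items on the category page
sample = (
    '<ul><li><a href="/wiki/List_of_mountains_in_Uganda" title="x">Uganda</a></li>'
    '<li><a href="/wiki/List_of_mountains_in_Peru" title="y">Peru</a></li></ul>'
)

# One cut gives [left, right]; the right part starts with the first URL
first = sample.split('<a href="', 1)[1].split('"', 1)[0]
```

This stays readable for simple, regular markup; for nested or irregular HTML a real parser such as BeautifulSoup is likely the safer choice.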

import requests
import urllib3

#Requests country name from user
user_input = input('Enter Country:')
country = user_input.strip().lower().capitalize()

#Request response for wikipedia parse
response = requests.get('https://en.wikipedia.org/wiki/Category:Lists_of_mountains_by_country')
response_body = str( response.content, "utf-8" )

# Find the "By Country" section in the HTML result
# This section begins at the Title "Lists of mountains by country"
country_section = response_body.split( 'Pages in category "Lists of mountains by country"' )[1]
search_term = "in_" + country

if ( country_section.find( search_term ) != -1 ):
    # each country URL begins "<li><a href="/wiki/List_of_mountains_..."
    country_urls = country_section.split('<li><a href="')
    for url in country_urls:
        if ( url.find( search_term ) != -1 ):
            # The URL ends "..._in_Uganda" title="List o..."
            # Split off the Right-Side text
            found_url = "https://en.wikipedia.org" + url.split('" title=')[0]
            print( "DEBUG: URL Is [" + found_url + "]" )

            ## Now fetch the country-url
            response = requests.get( found_url )
            response_body = str( response.content, "utf-8" )
            ### TODO - process mountain list
else:
    print( "That country [" + country + "] does not have an entry" )
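The HTTP error check mentioned above could look roughly like this. The helper name decode_body is hypothetical; it only relies on the status_code, url and content attributes that a requests response provides, and FakeResponse is a stand-in so the sketch can be tried without a network connection:

```python
def decode_body(response):
    """Return the page text, or None when the server reported an HTTP error."""
    # 4xx and 5xx status codes mean the request did not succeed
    if response.status_code >= 400:
        print("HTTP error " + str(response.status_code) + " for [" + response.url + "]")
        return None
    return str(response.content, "utf-8")


class FakeResponse:
    """Stand-in with the same three attributes, for offline experiments."""

    def __init__(self, status_code, url, content):
        self.status_code = status_code
        self.url = url
        self.content = content


ok_body = decode_body(FakeResponse(200, "https://en.wikipedia.org/wiki/X", b"<html></html>"))
missing_body = decode_body(FakeResponse(404, "https://en.wikipedia.org/wiki/Y", b""))
```

In the code above it would be used as response_body = decode_body( requests.get( found_url ) ). Alternatively, calling response.raise_for_status() raises an exception on a 4xx or 5xx reply instead of returning None.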

Pedro Lobito · Answer 2 · December 11, 2018

https://enwikipedia.org shouldn't that be https://en.wikipedia.org?

In any case, it would be simpler to just append the country name:

https://en.wikipedia.org/wiki/Category:Lists_of_mountains_of_**COUNTRYNAME**
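A minimal sketch of building that URL directly from user input. The function name country_url is made up, and the default prefix is simply the pattern quoted above; whether a page exists under it for a given country is an assumption that still has to be checked when fetching (the comments in Answer 1 suggest the list pages themselves are named List_of_mountains_in_...):

```python
from urllib.parse import quote

# Prefix taken from the pattern in this answer; pass a different one if needed
DEFAULT_PREFIX = "https://en.wikipedia.org/wiki/Category:Lists_of_mountains_of_"


def country_url(country, prefix=DEFAULT_PREFIX):
    """Build a Wikipedia URL by appending the country name to prefix."""
    # Capitalize each word and join with underscores, e.g. "new zealand" -> "New_Zealand"
    words = country.strip().split()
    name = "_".join(word[:1].upper() + word[1:] for word in words)
    # Percent-encode anything that is not safe in a URL path
    return prefix + quote(name)


url = country_url("new zealand")
```

The prefix is a parameter so the same helper can be pointed at the List_of_mountains_in_ pages instead.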

Redirecting to a new URL for parsing
