Question

I'm trying to scrape the AMA-format citation of every article in a PubMed search result. The following code is only meant to get the citation data from the first article.

import requests
import xlsxwriter
from bs4 import BeautifulSoup


URL = 'https://pubmed.ncbi.nlm.nih.gov/?term=infant+formula&size=200'
response = requests.get(URL)

html_soup = BeautifulSoup(response.text, 'html5lib')
article_containers = html_soup.find_all('article', class_ = 'labs-full-docsum')

first_article = article_containers[0]
citation_text = first_article.find('div', class_ = 'docsum-wrap').find('div', class_ = 'result-actions-bar').div.div.find('div', class_ = 'content').div.div.text

print(citation_text)

The script returns an empty string, even though the text is clearly visible inside that div when I inspect the page in Google Chrome.

Does this have something to do with JavaScript, and if so, how do I fix it?
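One way to test the JavaScript hypothesis: Chrome's Elements panel shows the live DOM after scripts have run, while `requests` only receives the raw HTML the server sends. If the citation text is missing from the raw HTML's visible text, it was most likely inserted later by JavaScript. The sketch below checks this with the standard library's `html.parser`; the helper name `text_in_static_html` is hypothetical, and the needle you pass in is whatever text you saw in the browser.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Text inside <script>/<style> is not rendered, so ignore it.
        if not self._skip_depth:
            self.chunks.append(data)


def text_in_static_html(html: str, expected: str) -> bool:
    """Hypothetical helper: True if `expected` occurs in the visible text of `html`."""
    parser = _TextCollector()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    # Collapse whitespace so line breaks in the markup don't break the match.
    visible = " ".join(" ".join(parser.chunks).split())
    return " ".join(expected.split()) in visible
```

Call it with `requests.get(URL).text` and a fragment of the citation you can see in Chrome; a `False` result points to content rendered client-side.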

Andrej Kesely · Answer 1 · June 16, 2020

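Rather than parsing the citation box (which is empty in the HTML that `requests` receives), this script requests each article's citation data from PubMed's `/citations/` JSON endpoint and prints every citation from the given URL in AMA format: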

import json
import requests
from bs4 import BeautifulSoup


URL = 'https://pubmed.ncbi.nlm.nih.gov/?term=infant+formula&size=200'
response = requests.get(URL)

html_soup = BeautifulSoup(response.text, 'html5lib')

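for article in html_soup.select('article'):
    # article title as shown in the search results
    print(article.select_one('.labs-docsum-title').get_text(strip=True, separator=' '))
    # the first <input> inside each result carries the article's PMID in its value
    citation_id = article.input['value']
    # fetch the citation data as JSON instead of scraping the (JS-filled) citation box
    data = requests.get('https://pubmed.ncbi.nlm.nih.gov/{citation_id}/citations/'.format(citation_id=citation_id)).json()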
    # uncomment this to print all data:
    # print(json.dumps(data, indent=4))
    print(data['ama']['orig'])
    print('-' * 80)
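A sketch of the same approach split into small functions, so the parsing can be checked without network access. It swaps BeautifulSoup and `requests` for the standard library's `html.parser` and `urllib.request` (a substitution, not what the answer uses), and assumes the `/citations/` endpoint still returns JSON with an `ama` → `orig` key as it did in June 2020. The names `extract_pmids`, `ama_citation`, and `fetch_ama` are hypothetical.

```python
import json
from html.parser import HTMLParser
from urllib.request import urlopen

# Endpoint taken from the answer above; assumed to still behave the same way.
CITATION_URL = "https://pubmed.ncbi.nlm.nih.gov/{pmid}/citations/"


class _PmidCollector(HTMLParser):
    """Mimics `article.input['value']`: value of the first <input> in each <article>."""

    def __init__(self):
        super().__init__()
        self.pmids = []
        self._in_article = False
        self._seen_input = False

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self._in_article, self._seen_input = True, False
        elif tag == "input" and self._in_article and not self._seen_input:
            self._seen_input = True  # only the first <input> per article counts
            value = dict(attrs).get("value")
            if value:
                self.pmids.append(value)

    def handle_endtag(self, tag):
        if tag == "article":
            self._in_article = False


def extract_pmids(search_html: str) -> list:
    """Hypothetical helper: PMIDs of all results on a search page."""
    parser = _PmidCollector()
    parser.feed(search_html)
    parser.close()
    return parser.pmids


def ama_citation(payload: dict) -> str:
    """Hypothetical helper: plain-text AMA citation from the /citations/ JSON."""
    return payload["ama"]["orig"]


def fetch_ama(pmid: str, timeout: float = 10.0) -> str:
    """Hypothetical helper: download one article's citation JSON and return AMA."""
    with urlopen(CITATION_URL.format(pmid=pmid), timeout=timeout) as resp:
        return ama_citation(json.load(resp))
```

To reproduce the answer's loop, iterate over `extract_pmids(...)` on the search page HTML and print `fetch_ama(pmid)` for each result. With `size=200` that is 200 separate requests, so a short `time.sleep` between them is considerate toward the server.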

Prints:

Review of Infant Feeding: Key Features of Breast Milk and Infant Formula .
Martin CR, Ling PR, Blackburn GL. Review of Infant Feeding: Key Features of Breast Milk and Infant Formula. Nutrients. 2016;8(5):279. Published 2016 May 11. doi:10.3390/nu8050279
--------------------------------------------------------------------------------
Prebiotics in infant formula .
Vandenplas Y, De Greef E, Veereman G. Prebiotics in infant formula. Gut Microbes. 2014;5(6):681-687. doi:10.4161/19490976.2014.972237
--------------------------------------------------------------------------------
Effects of infant formula composition on long-term metabolic health.
Lemaire M, Le Huërou-Luron I, Blat S. Effects of infant formula composition on long-term metabolic health. J Dev Orig Health Dis. 2018;9(6):573-589. doi:10.1017/S2040174417000964
--------------------------------------------------------------------------------
Selenium in infant formula milk.
He MJ, Zhang SQ, Mu W, Huang ZW. Selenium in infant formula milk. Asia Pac J Clin Nutr. 2018;27(2):284-292. doi:10.6133/apjcn.042017.12
--------------------------------------------------------------------------------

... and so on.

Scrape citation text from PubMed search results with BeautifulSoup and Python?
