Web scraping a specific snippet with Python - PullRequest
1 vote
/ June 17, 2020
<div _ngcontent-c22="" class="abstract-text row">
    <!---->

    <!----><div _ngcontent-c22="" class="col-12">
        <!---->
        <!---->
        <div _ngcontent-c22="" class="u-mb-1">
            <strong _ngcontent-c22=""> Abstract:</strong>
            <div _ngcontent-c22="" xplmathjax="">Text Text Text</div>

I want to scrape the abstract text from this markup. This is my current code, but I can't get any further. Any recommendations?

from bs4 import BeautifulSoup as bs
import requests

url_list = ["https://ieeexplore.ieee.org/abstract/document/7414512"]
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}

for url in url_list:
    try:
        result = requests.get(url, headers=headers)

        soup = bs(result.text, 'html.parser')

        # Look for the <div class="u-mb-1"> blocks shown in the markup above
        table = soup.find_all('div', attrs={"class": "u-mb-1"})
    except requests.RequestException:
        print("\nERROR")

1 Answer

2 votes
/ June 17, 2020

The data you see on the page is embedded in the page source as a JSON string. You can use the re and json modules to extract it.

For example:

import re
import json
import requests


url = 'https://ieeexplore.ieee.org/abstract/document/7414512'
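# Fetch the raw HTML; the metadata is assigned to a JavaScript variable in the page source
html = requests.get(url).text

# Capture the text between "global.document.metadata=" and the first ";",
# then parse the captured text as JSON
match = re.search(r'global\.document\.metadata=(.*?);', html)
data = json.loads(match.group(1))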

# uncomment this to see all data:
# print(json.dumps(data, indent=4))

print(data['title'])
print()
print(data['abstract'])

Prints:

Supplier Involvement and Contract Design During New Product Development

Early supplier involvement (ESI) infuses upfront supplier resources and expertise to accelerate the research and development (R&D) timeline, and allow for risk sharing. Successful implementation of ESI in a new product development setting, however, remains elusive due to the intricacy of interfirm collaboration while dealing with unproven technology and market uncertainty. Extending from prior ESI studies on supplier selection, resource integration, and relationship management, we propose game theoretical contracting strategies to achieve manufacturer objectives, such as predictable design timelines, sufficient supplier commitment, and radical in-process innovations. Taking into account various project factors, such as revenue forecast, technical uncertainty, market competition, and team capability, we propose an incentive compatible mechanism based on real option analysis to suggest which project stage to best engage the supplier. The supplier, in turn, can follow our analysis to determine whether to participate, and if so, the appropriate level of resource commitment. The equilibrium analysis provides managerial insights into how to best balance the time-to-market mandate with the need for accruing significant innovations through supply chain partnerships.
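
One caveat with the pattern above: the non-greedy `(.*?);` stops at the first semicolon, so it would cut the JSON short if any value contained a `;`. A possible alternative, sketched here under assumptions, is to locate the assignment with `re` and let `json.JSONDecoder.raw_decode` parse exactly one JSON value from that position. The helper name `extract_metadata` and the `sample_html` fragment are made up for illustration; the marker is copied from the regex above and may not match the live page.

```python
import json
import re

# Marker copied from the regex in the answer; the variable name used in the
# live page source is an assumption and may change over time.
MARKER = re.compile(r'global\.document\.metadata\s*=\s*')


def extract_metadata(html):
    """Return the JSON object assigned after MARKER, or None if it is absent."""
    match = MARKER.search(html)
    if match is None:
        return None
    # raw_decode parses a single JSON value starting at the given index and
    # ignores whatever follows it, so a ';' inside a string value is harmless.
    data, _end = json.JSONDecoder().raw_decode(html, match.end())
    return data


# Hypothetical page fragment, invented for this example only.
sample_html = (
    '<script>global.document.metadata={"title": "A; B", '
    '"abstract": "Text; more text"};</script>'
)

metadata = extract_metadata(sample_html)
print(metadata["title"])
print(metadata["abstract"])
```

For a real page, the same helper could be called on `requests.get(url, headers=headers).text` for each entry in `url_list`; a `None` result would then mean the marker was not found in that response.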
...