Extracting data from a complex HTML tag with Python BeautifulSoup - PullRequest
0 votes
/ January 16, 2020

I have the following HTML:

<div class="display-info">
    <div class="record-icon pubtype"><span class="pubtype-icon pt-academicJournal" title="Academic Journal"> </span>
        <p class="caption">Academic Journal</p>
    </div>By: Stein, Mark. <strong>Organization Studies</strong>. 2007, Vol. 28 Issue 8, p1223-1241. 19p. Abstract: While the literature on front-line service work utilizes a variety of productive images, I argue that these images do not capture certain of the more problematic experiences of front-line service employees. Drawing on words used by these workers themselves, and using concepts from psychoanalysis and its application to organizational dynamics, I therefore propose a new image, that of toxicity. I argue that — especially when under severe pressure from customers — front-line workers may have the unconscious fantasy that they have been polluted by toxic substances. The unconscious experience of the entry of toxic material is likely to result in further <strong>contagion</strong> of relationships such as those among employees and between employees and customers. This may also result in workers retaliating against customers by exacting revenge on them. A downward spiralling of relationships may follow, with the result that large parts of the work environment are experienced as toxic. The implications for theory are explored. In conclusion, I argue that the theme of toxicity helps us connect the employee-customer interface with a deep reservoir of primordial human experience that links the body with emotions. [ABSTRACT FROM AUTHOR] DOI: 10.1177/0170840607079527. (<cite>AN: 26198405</cite>)
    <p class="subjectResults"><strong>Subjects:
    </strong>Industrial relations; Personnel management; Customer relations; Corporate image; Public relations; Consumer behavior; Sales personnel; Administration of Human Resource Programs (except Education, Public Health, and Veterans' Affairs Programs); Human Resources Consulting Services; Public Relations Agencies; Psychoanalysis; Social interaction</p><span class="record-additional"><span class="item add-to-folder"><a class="folder-toggle item-not-in-folder" data-folder='{"db":"bth","uiTerm":"26198405","uiTag":"AN","ebookFormat":"false","abookFormat":"false","title":"Toxicity and the Unconscious Experience of the Body at the Employee--Customer Interface. ","resultID":"50","doid":"","segid":""}' data-isaddtofolder="true" data-itemid="50" href="#" id="add_50" name="addToFolder" title="To print, e-mail, or save multiple items">Add to folder</a> <a class="folder-toggle item-in-folder" data-folder='{"db":"bth","uiTerm":"26198405","uiTag":"AN","ebookFormat":"false","abookFormat":"false","title":"Toxicity and the Unconscious Experience of the Body at the Employee--Customer Interface. ","resultID":"50","doid":"","segid":""}' data-isaddtofolder="false" data-itemid="50" href="#" id="added_50" style="display: none;" title="Remove result from folder">Remove from folder</a></span><span class="result-list-cite-ref-label"><a data-title="Cited References" href="javascript:__doLinkPostBack('','sl~~ref||su~~50','_top');" id="references50" title="Cited References">Cited References: (92) </a></span><span class="result-list-cite-link"><a data-title="Times Cited in this Database" href="javascript:__doLinkPostBack('','sl~~cit||su~~50','_top');" id="citations50" title="Times Cited in this Database">Times Cited in this Database: (20) </a></span> </span>
    <div class="record-formats-wrapper externalLinks"><span><span class="custom-link"><a class="ils-link" href="/ehost/SmartLink/OpenIlsLink?sid=42487fcc-c655-469f-b8ed-2802260b3983@sessionmgr102&amp;vid=15&amp;sl=smartlink&amp;st=ilslink_new&amp;sv=sdbn%253Dbth%2526pbt%253DAcademic%2520Journal%2526issn%253D01708406%2526ttl%253DOrganization%252520Studies%2526stp%253DC%2526asi%253DY%2526ldc%253DCheck%252520full%252520text%252520availability%2526lna%253DFull%252520Text%252520Finder%252520%25252D%252520INSEAD%2526lca%253DfullText%2526lo%255Fan%253D26198405&amp;su=http%3A%2F%2Fresolver%2Eebscohost%2Ecom%2Fopenurl%3Fcustid%3Ds8362180%26group%3Dmain%26authtype%3Dip%2Cuid%26sid%3DEBSCO%3Abth%26genre%3Darticle%26issn%3D01708406%26ISBN%3D%26volume%3D28%26issue%3D8%26date%3D20070801%26spage%3D1223%26pages%3D1223%2D1241%26title%3DOrganization%20Studies%26atitle%3DToxicity%2520and%2520the%2520Unconscious%2520Experience%2520of%2520the%2520Body%2520at%2520the%2520Employee%2D%2DCustomer%2520Interface%2E%26aulast%3DStein%252C%2520Mark%26id%3DDOI%3A10%2E1177%2F0170840607079527" id="linkILSLink50_1" onblur="self.status='';return true" onfocus="self.status='check full text availability.';return true" onmouseout="self.status='';return true" onmouseover="self.status='check full text availability.';return true" target="_new" title="check full text availability."><img align="middle" alt="check full text availability." border="0" class="icon-image" data-defer-image="https://s3.amazonaws.com/libapps/customers/2023/images/logo-INSEAD_blanc-sur-vert_250.jpg" id="imgILSLink50_1" src="https://if.ebsco-content.com/interfacefiles/17.232.0.2749/blank.gif"/>Check full text availability</a></span></span>
    </div>
</div>

I need to get By: Stein, Mark. and Abstract: While the literature on front-line service work utilizes a variety of productive images, I argue that these images do not capture certain of the more problematic experiences of front-line service employees. Drawing on words used by these workers themselves, and using concepts from psychoanalysis and its application to organizational dynamics, I therefore propose a new image, that of toxicity. I argue that — especially when under severe pressure from customers — front-line workers may have the unconscious fantasy that they have been polluted by toxic substances. The unconscious experience of the entry of toxic material is likely to result in further <strong>contagion</strong> of relationships such as those among employees and between employees and customers. This may also result in workers retaliating against customers by exacting revenge on them. A downward spiralling of relationships may follow, with the result that large parts of the work environment are experienced as toxic. The implications for theory are explored. In conclusion, I argue that the theme of toxicity helps us connect the employee-customer interface with a deep reservoir of primordial human experience that links the body with emotions.

With soup.select(".display-info")[0].text I get the text of the whole block instead:

 Academic JournalBy: Stein, Mark. Organization Studies. 2007, Vol. 28 Issue 8, p1223-1241. 19p. Abstract: While the literature on front-line service work utilizes a variety of productive images, I argue that these images do not capture certain of the more problematic experiences of front-line service employees. Drawing on words used by these workers themselves, and using concepts from psychoanalysis and its application to organizational dynamics, I therefore propose a new image, that of toxicity. I argue that — especially when under severe pressure from customers — front-line workers may have the unconscious fantasy that they have been polluted by toxic substances. The unconscious experience of the entry of toxic material is likely to result in further contagion of relationships such as those among employees and between employees and customers. This may also result in workers retaliating against customers by exacting revenge on them. A downward spiralling of relationships may follow, with the result that large parts of the work environment are experienced as toxic. The implications for theory are explored. In conclusion, I argue that the theme of toxicity helps us connect the employee-customer interface with a deep reservoir of primordial human experience that links the body with emotions. [ABSTRACT FROM AUTHOR] DOI: 10.1177/0170840607079527. (AN: 26198405)Subjects:
    Industrial relations; Personnel management; Customer relations; Corporate image; Public relations; Consumer behavior; Sales personnel; Administration of Human Resource Programs (except Education, Public Health, and Veterans' Affairs Programs); Human Resources Consulting Services; Public Relations Agencies; Psychoanalysis; Social interactionAdd to folder Remove from folderCited References: (92) Times Cited in this Database: (20)  Check full text availability 

Answers [ 2 ]

1 vote
/ January 16, 2020

Use the following regular expressions.

from bs4 import BeautifulSoup
import re
html='''<div class="display-info">
    <div class="record-icon pubtype"><span class="pubtype-icon pt-academicJournal" title="Academic Journal"> </span>
        <p class="caption">Academic Journal</p>
    </div>By: Stein, Mark. <strong>Organization Studies</strong>. 2007, Vol. 28 Issue 8, p1223-1241. 19p. Abstract: While the literature on front-line service work utilizes a variety of productive images, I argue that these images do not capture certain of the more problematic experiences of front-line service employees. Drawing on words used by these workers themselves, and using concepts from psychoanalysis and its application to organizational dynamics, I therefore propose a new image, that of toxicity. I argue that — especially when under severe pressure from customers — front-line workers may have the unconscious fantasy that they have been polluted by toxic substances. The unconscious experience of the entry of toxic material is likely to result in further <strong>contagion</strong> of relationships such as those among employees and between employees and customers. This may also result in workers retaliating against customers by exacting revenge on them. A downward spiralling of relationships may follow, with the result that large parts of the work environment are experienced as toxic. The implications for theory are explored. In conclusion, I argue that the theme of toxicity helps us connect the employee-customer interface with a deep reservoir of primordial human experience that links the body with emotions. [ABSTRACT FROM AUTHOR] DOI: 10.1177/0170840607079527. (<cite>AN: 26198405</cite>)
    <p class="subjectResults"><strong>Subjects:
    </strong>Industrial relations; Personnel management; Customer relations; Corporate image; Public relations; Consumer behavior; Sales personnel; Administration of Human Resource Programs (except Education, Public Health, and Veterans' Affairs Programs); Human Resources Consulting Services; Public Relations Agencies; Psychoanalysis; Social interaction</p><span class="record-additional"><span class="item add-to-folder"><a class="folder-toggle item-not-in-folder" data-folder='{"db":"bth","uiTerm":"26198405","uiTag":"AN","ebookFormat":"false","abookFormat":"false","title":"Toxicity and the Unconscious Experience of the Body at the Employee--Customer Interface. ","resultID":"50","doid":"","segid":""}' data-isaddtofolder="true" data-itemid="50" href="#" id="add_50" name="addToFolder" title="To print, e-mail, or save multiple items">Add to folder</a> <a class="folder-toggle item-in-folder" data-folder='{"db":"bth","uiTerm":"26198405","uiTag":"AN","ebookFormat":"false","abookFormat":"false","title":"Toxicity and the Unconscious Experience of the Body at the Employee--Customer Interface. ","resultID":"50","doid":"","segid":""}' data-isaddtofolder="false" data-itemid="50" href="#" id="added_50" style="display: none;" title="Remove result from folder">Remove from folder</a></span><span class="result-list-cite-ref-label"><a data-title="Cited References" href="javascript:__doLinkPostBack('','sl~~ref||su~~50','_top');" id="references50" title="Cited References">Cited References: (92) </a></span><span class="result-list-cite-link"><a data-title="Times Cited in this Database" href="javascript:__doLinkPostBack('','sl~~cit||su~~50','_top');" id="citations50" title="Times Cited in this Database">Times Cited in this Database: (20) </a></span> </span>
    <div class="record-formats-wrapper externalLinks"><span><span class="custom-link"><a class="ils-link" href="/ehost/SmartLink/OpenIlsLink?sid=42487fcc-c655-469f-b8ed-2802260b3983@sessionmgr102&amp;vid=15&amp;sl=smartlink&amp;st=ilslink_new&amp;sv=sdbn%253Dbth%2526pbt%253DAcademic%2520Journal%2526issn%253D01708406%2526ttl%253DOrganization%252520Studies%2526stp%253DC%2526asi%253DY%2526ldc%253DCheck%252520full%252520text%252520availability%2526lna%253DFull%252520Text%252520Finder%252520%25252D%252520INSEAD%2526lca%253DfullText%2526lo%255Fan%253D26198405&amp;su=http%3A%2F%2Fresolver%2Eebscohost%2Ecom%2Fopenurl%3Fcustid%3Ds8362180%26group%3Dmain%26authtype%3Dip%2Cuid%26sid%3DEBSCO%3Abth%26genre%3Darticle%26issn%3D01708406%26ISBN%3D%26volume%3D28%26issue%3D8%26date%3D20070801%26spage%3D1223%26pages%3D1223%2D1241%26title%3DOrganization%20Studies%26atitle%3DToxicity%2520and%2520the%2520Unconscious%2520Experience%2520of%2520the%2520Body%2520at%2520the%2520Employee%2D%2DCustomer%2520Interface%2E%26aulast%3DStein%252C%2520Mark%26id%3DDOI%3A10%2E1177%2F0170840607079527" id="linkILSLink50_1" onblur="self.status='';return true" onfocus="self.status='check full text availability.';return true" onmouseout="self.status='';return true" onmouseover="self.status='check full text availability.';return true" target="_new" title="check full text availability."><img align="middle" alt="check full text availability." border="0" class="icon-image" data-defer-image="https://s3.amazonaws.com/libapps/customers/2023/images/logo-INSEAD_blanc-sur-vert_250.jpg" id="imgILSLink50_1" src="https://if.ebsco-content.com/interfacefiles/17.232.0.2749/blank.gif"/>Check full text availability</a></span></span>
    </div>
</div>'''

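soup = BeautifulSoup(html, 'html.parser')
divtext = soup.find('div', class_='display-info')
# Raw strings avoid invalid-escape warnings for \s; the final dot is escaped to match a literal period.
print(re.findall(r"By:?\s.*Mark\.", divtext.text)[0])
# Match up to the "[" of "[ABSTRACT FROM AUTHOR]", then drop that bracket.
print(re.findall(r"Abstract:?\s.*\[", divtext.text)[0][:-1])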

Output:

By: Stein, Mark.
Abstract: While the literature on front-line service work utilizes a variety of productive images, I argue that these images do not capture certain of the more problematic experiences of front-line service employees. Drawing on words used by these workers themselves, and using concepts from psychoanalysis and its application to organizational dynamics, I therefore propose a new image, that of toxicity. I argue that — especially when under severe pressure from customers — front-line workers may have the unconscious fantasy that they have been polluted by toxic substances. The unconscious experience of the entry of toxic material is likely to result in further contagion of relationships such as those among employees and between employees and customers. This may also result in workers retaliating against customers by exacting revenge on them. A downward spiralling of relationships may follow, with the result that large parts of the work environment are experienced as toxic. The implications for theory are explored. In conclusion, I argue that the theme of toxicity helps us connect the employee-customer interface with a deep reservoir of primordial human experience that links the body with emotions.
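
The pattern above hard-codes the author's surname ("Mark"), and findall(...)[0] raises IndexError when nothing matches. A minimal sketch of a more general variant follows. It assumes that the author is always the text node starting with "By:" that sits directly before the <strong> journal title, and that the abstract always ends at the "[ABSTRACT FROM AUTHOR]" marker. The helper name extract_author_and_abstract and the shortened SAMPLE_HTML are hypothetical, introduced only for illustration.

```python
import re
from bs4 import BeautifulSoup

# Abbreviated stand-in for the record in the question (abstract shortened).
SAMPLE_HTML = '''<div class="display-info">
    <div class="record-icon pubtype"><p class="caption">Academic Journal</p>
    </div>By: Stein, Mark. <strong>Organization Studies</strong>. 2007, Vol. 28 Issue 8, p1223-1241. 19p. Abstract: I propose a new image, that of toxicity. This may result in further <strong>contagion</strong> of relationships. [ABSTRACT FROM AUTHOR] DOI: 10.1177/0170840607079527. (<cite>AN: 26198405</cite>)
    <p class="subjectResults"><strong>Subjects:
    </strong>Industrial relations; Psychoanalysis</p>
</div>'''

# Matches a text node that begins with "By:" (leading whitespace allowed).
AUTHOR_RE = re.compile(r"^\s*By:")
# Non-greedy match that stops before the "[ABSTRACT FROM AUTHOR]" marker;
# DOTALL in case the abstract spans several lines.
ABSTRACT_RE = re.compile(r"Abstract:.*?(?=\s*\[ABSTRACT FROM AUTHOR\])", re.DOTALL)


def extract_author_and_abstract(html):
    """Hypothetical helper: return (author, abstract); either may be None."""
    soup = BeautifulSoup(html, "html.parser")
    block = soup.find("div", class_="display-info")
    if block is None:
        return None, None

    # The author text node ends where the <strong> journal title begins,
    # so no name has to be hard-coded in the pattern.
    author_node = block.find(string=AUTHOR_RE)
    author = author_node.strip() if author_node is not None else None

    # re.search returns None instead of raising when there is no match.
    match = ABSTRACT_RE.search(block.get_text())
    abstract = match.group(0) if match else None
    return author, abstract


author, abstract = extract_author_and_abstract(SAMPLE_HTML)
print(author)
print(abstract)
```

If a record lacks the "[ABSTRACT FROM AUTHOR]" marker, this sketch returns None for the abstract rather than guessing where it ends.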

1 vote
/ January 16, 2020

For this task it is best to use re and bs4 together.

If the variable txt contains the HTML from the question, then this script:

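import re
from textwrap import wrap

from bs4 import BeautifulSoup

soup = BeautifulSoup(txt, 'html.parser')

# Put every text node on its own line, so the author ends at a line break.
txt = soup.select_one('.display-info').get_text(strip=True, separator='\n')

# "." does not match newlines here, so this stops at the end of the author line.
author = re.findall(r'By:.*', txt)[0]
# re.S lets "." cross the line breaks introduced around <strong>contagion</strong>.
abstract = re.findall(r'Abstract:.*?(?=\[ABSTRACT FROM AUTHOR\])', txt, flags=re.S)[0]

print(author)
print(*wrap(abstract.replace('\n', ' ')), sep='\n')

# or, on Python 2, simply:
# print author
# print abstract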

Prints:

By: Stein, Mark.
Abstract: While the literature on front-line service work utilizes a
variety of productive images, I argue that these images do not capture
certain of the more problematic experiences of front-line service
employees. Drawing on words used by these workers themselves, and
using concepts from psychoanalysis and its application to
organizational dynamics, I therefore propose a new image, that of
toxicity. I argue that — especially when under severe pressure from
customers — front-line workers may have the unconscious fantasy that
they have been polluted by toxic substances. The unconscious
experience of the entry of toxic material is likely to result in
further contagion of relationships such as those among employees and
between employees and customers. This may also result in workers
retaliating against customers by exacting revenge on them. A downward
spiralling of relationships may follow, with the result that large
parts of the work environment are experienced as toxic. The
implications for theory are explored. In conclusion, I argue that the
theme of toxicity helps us connect the employee-customer interface
with a deep reservoir of primordial human experience that links the
body with emotions.
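
Both answers run their patterns over the text of the whole block, which also contains the "Subjects" paragraph and the link labels. As a possible alternative, the sketch below walks only the direct children of each .display-info block, keeps bare text plus inline tags, and splits the result with str.partition instead of a regex. The set INLINE_TAGS, the helper parse_record, and the shortened SAMPLE_HTML are assumptions made for illustration; the abstract is returned without its "Abstract:" label.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

# Abbreviated stand-in for the record in the question (abstract shortened).
SAMPLE_HTML = '''<div class="display-info">
    <div class="record-icon pubtype"><p class="caption">Academic Journal</p>
    </div>By: Stein, Mark. <strong>Organization Studies</strong>. 2007, Vol. 28 Issue 8, p1223-1241. 19p. Abstract: I propose a new image, that of toxicity. This may result in further <strong>contagion</strong> of relationships. [ABSTRACT FROM AUTHOR] DOI: 10.1177/0170840607079527. (<cite>AN: 26198405</cite>)
    <p class="subjectResults"><strong>Subjects:
    </strong>Industrial relations; Psychoanalysis</p>
</div>'''

# Assumption: only these inline tags occur inside the citation text itself.
INLINE_TAGS = {"strong", "em", "b", "i", "cite"}


def parse_record(block):
    """Hypothetical helper: pull author and abstract from one .display-info tag."""
    parts = []
    author = None
    for node in block.children:
        if isinstance(node, NavigableString):
            # The first non-blank bare text node is taken to be the author.
            if author is None and node.strip():
                author = node.strip()
            parts.append(str(node))
        elif isinstance(node, Tag) and node.name in INLINE_TAGS:
            parts.append(node.get_text())
        # Block-level children (div, p, span wrappers) are skipped.

    # Join the pieces and collapse runs of whitespace into single spaces.
    text = " ".join("".join(parts).split())
    _, found, rest = text.partition("Abstract:")
    # If the end marker is absent, everything after "Abstract:" is kept.
    abstract = rest.partition("[ABSTRACT FROM AUTHOR]")[0].strip() if found else None
    return {"author": author, "abstract": abstract}


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
records = [parse_record(block) for block in soup.select(".display-info")]
for record in records:
    print(record["author"])
    print(record["abstract"])
```

Because select() returns every matching block, the same loop would cover a page with several records, provided they all follow the layout shown in the question.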