How do I get rid of the bold tag in an XML document in Python 3 without removing the enclosed text?
0 votes
/ March 20, 2019

I am trying to remove the bold tag (<b> Some text in bold here </b>) from this XML document, but I want the text enclosed by the tags to stay intact. The bold tags surround the following labels: Objectives, Design, Setting, Participants, Interventions, Main outcome measures, Results, Conclusions, and Trial registration.

This is my Python code:

from urllib.request import urlopen
import xml.etree.ElementTree as etree

# Build the efetch URL for a single PubMed ID
urlHead = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&retmode=xml&rettype=abstract&id='
pmid = "28420629"
completeUrl = urlHead + pmid

# Download and parse the XML response
response = urlopen(completeUrl)
tree = etree.parse(response)

# Print the text of each <AbstractText> element
studyAbstractParts = tree.findall('.//AbstractText')
for studyAbstractPart in studyAbstractParts:
    print(studyAbstractPart.text)

The problem with this code is that it finds the text inside the "AbstractText" tag, but it stops at (or ignores) the bold text and everything after it. Basically, I need all the text between the <AbstractText> </AbstractText> tags, and the <b> </b> bold formatting is just an annoying obstacle.

1 Answer

1 vote
/ March 20, 2019

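You can use the itertext() method to get all the text in <AbstractText> and its subelements. The .text attribute holds only the text that precedes the first child element, which is why the original loop misses everything inside and after <b>.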
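
As a minimal sketch of the difference, here is how ElementTree splits text around a child element. The sample string is a made-up fragment shaped like the abstract in the question, not the real PubMed response.

```python
import xml.etree.ElementTree as etree

# Hypothetical fragment, shaped like one <AbstractText> from the question.
sample = "<AbstractText><b>Objectives</b> To determine the effect.</AbstractText>"
elem = etree.fromstring(sample)

print(elem.text)     # None: no text precedes the <b> child
print(elem[0].text)  # Objectives
print(elem[0].tail)  # text that follows </b> is stored on the child as .tail

# itertext() walks .text and .tail of every nested element in document order.
joined = "".join(elem.itertext())
print(joined)        # Objectives To determine the effect.
```

Applied to the tree from the question, the loop looks like this: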

studyAbstractParts = tree.findall('.//AbstractText')
for studyAbstractPart in studyAbstractParts:
    for t in studyAbstractPart.itertext():
        print(t)

Output:

Objectives
 To determine whether preoperative dexamethasone reduces postoperative vomiting in patients undergoing elective bowel surgery and whether it is associated with other measurable benefits during recovery from surgery, including quicker return to oral diet and reduced length of stay.
Design
 Pragmatic two arm parallel group randomised trial with blinded postoperative care and outcome assessment.
Setting
 45 UK hospitals.
Participants
 1350 patients aged 18 or over undergoing elective open or laparoscopic bowel surgery for malignant or benign pathology.
Interventions
 Addition of a single dose of 8 mg intravenous dexamethasone at induction of anaesthesia compared with standard care.
Main outcome measures
 Primary outcome: reported vomiting within 24 hours reported by patient or clinician.
vomiting with 72 and 120 hours reported by patient or clinician; use of antiemetics and postoperative nausea and vomiting at 24, 72, and 120 hours rated by patient; fatigue and quality of life at 120 hours or discharge and at 30 days; time to return to fluid and food intake; length of hospital stay; adverse events.
Results
 1350 participants were recruited and randomly allocated to additional dexamethasone (n=674) or standard care (n=676) at induction of anaesthesia. Vomiting within 24 hours of surgery occurred in 172 (25.5%) participants in the dexamethasone arm and 223 (33.0%) allocated standard care (number needed to treat (NNT) 13, 95% confidence interval 5 to 22; P=0.003). Additional postoperative antiemetics were given (on demand) to 265 (39.3%) participants allocated dexamethasone and 351 (51.9%) allocated standard care (NNT 8, 5 to 11; P<0.001). Reduction in on demand antiemetics remained up to 72 hours. There was no increase in complications.
Conclusions
 Addition of a single dose of 8 mg intravenous dexamethasone at induction of anaesthesia significantly reduces both the incidence of postoperative nausea and vomiting at 24 hours and the need for rescue antiemetics for up to 72 hours in patients undergoing large and small bowel surgery, with no increase in adverse events.
Trial registration
 EudraCT (2010-022894-32) and ISRCTN (ISRCTN21973627).
...
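
If the goal is to drop the <b> elements from the tree itself rather than just read the text, one possible approach is to fold each element's text and tail into its neighbour before removing it. The sketch below is an assumption, not part of the original answer: strip_tag is a hypothetical helper, the sample string is made up, and any markup nested inside a removed element is flattened to plain text.

```python
import xml.etree.ElementTree as etree


def strip_tag(root, tag):
    """Remove every <tag> element under root, keeping its text in place.

    Hypothetical helper: nested markup inside a removed element is
    flattened to plain text.
    """
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag != tag:
                continue
            # Text that must survive: the element's own text plus its tail.
            kept = "".join(child.itertext()) + (child.tail or "")
            index = list(parent).index(child)
            if index == 0:
                # No earlier sibling: the text belongs to the parent.
                parent.text = (parent.text or "") + kept
            else:
                # Otherwise it becomes part of the previous sibling's tail.
                previous = parent[index - 1]
                previous.tail = (previous.tail or "") + kept
            parent.remove(child)
    return root


# Made-up fragment shaped like the abstract in the question.
sample = "<AbstractText><b>Objectives</b> To determine the effect.</AbstractText>"
elem = strip_tag(etree.fromstring(sample), "b")
print(etree.tostring(elem, encoding="unicode"))
# <AbstractText>Objectives To determine the effect.</AbstractText>
```

After this, elem.text returns the full sentence, so the original loop from the question would work unchanged on the cleaned tree.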