Scraped text inside a div, but I don't know how to separate the items as list items so that each appears on a new line in the generated CSV - PullRequest
0 votes
/ March 30, 2020

I am trying to scrape the text inside <a class="speciality"></a> and the text inside <div class="links"> <a href=""></a> </div> from the block below.

I managed to scrape the text inside the elements mentioned above. The problem is with the elements inside <div class="links"> <a href=""></a> </div>: I don't know how to separate the items as list items. Please guide me. Below is the HTML I am trying to parse, followed by the code I used. If possible, please also post a solution that pulls them all out as one array.

<div class="column-block" id="hematology">
<h3 class="panel-title names strong">
<a class="speciality" rel="hematology" href="https://www.lyfboat.com/hospitals/hematology-hospitals-and-costs/">
Hematology </a>
</h3>
<div class="links">
<a target="_blank" href="https://www.lyfboat.com/procedures/allogenic/">Allogenic Bone Marrow Transplant</a><a target="_blank" href="https://www.lyfboat.com/aplastic-anemia-treatment-in-india/">Aplastic Anemia</a><a target="_blank" href="https://www.lyfboat.com/procedures/autologous-for-multiple-lymphomas/">Autologous Bone Marrow Transplant</a><a target="_blank" href="https://www.lyfboat.com/blood-cancer-treatment-hospitals-costs-in-india/">Blood Cancer Treatment</a><a target="_blank" href="https://www.lyfboat.com/bone-marrow-transplant-hospitals-costs-india/">Bone Marrow Transplant (BMT)</a><a target="_blank" href="https://www.lyfboat.com/fanconis-anemia-treatment-in-india/">Fanconi Anemia</a><a target="_blank" href="https://www.lyfboat.com/leukemia-treatment-cost-hospitals-surgeons-in-india/">Leukemia Treatment</a><a target="_blank" href="https://www.lyfboat.com/lymphoma-treatment-costs-hospitals-surgeons-in-india/">Lymphoma Treatment</a><a target="_blank" href="https://www.lyfboat.com/multiple-sclerosis-treatment-in-india/">Multiple Sclerosis</a><a target="_blank" href="https://www.lyfboat.com/hospitals/myeloma-blood-cancer-hospitals-and-costs/">Myeloma Treatment</a><a target="_blank" href="https://www.lyfboat.com/hospitals/pediatric-bone-marrow-transplant-hospitals-and-costs/">Pediatric Bone Marrow Transplant</a><a target="_blank" href="https://www.lyfboat.com/sickle-cell-anemia-treatment-in-india/">Sickle Cell Disease</a><a target="_blank" href="https://www.lyfboat.com/hospitals/thalassemia-transplant-hospitals-and-costs/">Thalassemia Transplant</a> </div>
</div>
<div class="column-block" id="pediatric-cardiology">
<h3 class="panel-title names strong">
<a class="speciality" rel="pediatric-cardiology" href="https://www.lyfboat.com/hospitals/pediatric-cardiology-hospitals-and-costs/">
Pediatric Cardiology </a>
</h3>
<div class="links">
<a target="_blank" href="https://www.lyfboat.com/hospitals/arterial-switch-operation-truncus-arteriosis-hospitals-and-costs/">Arterial switch operation/ Truncus arteriosis</a><a target="_blank" href="https://www.lyfboat.com/asd-closure-cost-surgeon-hospitals-in-india/">Atrial Septal Defect Closure (ASD)</a><a target="_blank" href="https://www.lyfboat.com/hospitals/atrioventricular-canal-defect-av-canal-hospitals-and-costs/">Atrioventricular Canal Defect</a><a target="_blank" href="https://www.lyfboat.com/hospitals/balloon-atrial-septostomy-hospitals-and-costs/">Balloon Atrial Septostomy</a><a target="_blank" href="https://www.lyfboat.com/hospitals/double-outlet-right-ventricle-dorv-hospitals-and-costs/">Double Outlet Right Ventricle</a><a target="_blank" href="https://www.lyfboat.com/hospitals/fontan-hospitals-and-costs/">Fontan</a><a target="_blank" href="https://www.lyfboat.com/hospitals/glenn-hospitals-and-costs/">Glenn Procedure</a><a target="_blank" href="https://www.lyfboat.com/procedures/patent-ductus-arteriosus-pda-device-closure/">Patent Ductus Arteriosus Device Closure Catheterization</a><a target="_blank" href="https://www.lyfboat.com/fallots-tetralogy-treatment-cost-hospitals-in-india/">Tetralogy of Fallot</a><a target="_blank" href="https://www.lyfboat.com/hospitals/total-anomalous-pulmonary-venous-connection-tapvc-hospitals-and-costs/">Total Anomalous Pulmonary Venous Connection</a><a target="_blank" href="https://www.lyfboat.com/hospitals/transposition-of-the-great-arteries-tga-hospitals-and-costs/">Transposition of the Great Arteries (TGA)</a><a target="_blank" href="https://www.lyfboat.com/hospitals/valvuplasty-hospitals-and-costs/">Valvuplasty</a> </div>
</div>

from bs4 import BeautifulSoup
import requests
import lxml
import pandas as pd

base_url = "https://www.lyfboat.com/procedures/"

page = requests.get(base_url)
if page.status_code == requests.codes.ok:
  bs = BeautifulSoup(page.text, 'lxml')

data = {
  "Department" : [],
  "Conditions" : []
}

containers = bs.findAll('div', class_='column-block')

for department in containers:
    if(department.find('a')):
      data['Department'].append(department.find('a', {'class': 'speciality'}).text)
      data['Conditions'].append(department.find('div', {'class': 'links'}).text)[0:]

print(data['Department'])
print(data['Conditions'])

1 Answer

1 vote
/ March 30, 2020

Replace:

for department in containers:
    if(department.find('a')):
      data['Department'].append(department.find('a', {'class': 'speciality'}).text)
      data['Conditions'].append(department.find('div', {'class': 'links'}).text)[0:]

With:

for department in containers:
    if department.find('a'):
        data['Department'].append(department.find('a', {'class': 'speciality'}).text)
        links = department.find('div', {'class': 'links'})
        # Append each link's text separately instead of one concatenated string
        for link in links.find_all("a"):
            data['Conditions'].append(link.get_text())
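
One thing to watch with the loop above: data['Department'] gets one entry per block while data['Conditions'] gets one entry per link, so the two lists end up with different lengths and no longer line up as CSV columns. A possible way around this, sketched below, is to build one (department, condition) row per link. This is only an illustration under stated assumptions: SAMPLE_HTML is a trimmed copy of the markup from the question, extract_rows and rows_to_csv are hypothetical helper names, and the built-in html.parser is used instead of lxml so the sketch runs without the extra dependency.

```python
import csv
import io

from bs4 import BeautifulSoup

# Trimmed-down sample of the markup from the question (two links per block).
SAMPLE_HTML = """
<div class="column-block" id="hematology">
<h3 class="panel-title names strong">
<a class="speciality" rel="hematology" href="https://www.lyfboat.com/hospitals/hematology-hospitals-and-costs/">
Hematology </a>
</h3>
<div class="links">
<a target="_blank" href="https://www.lyfboat.com/procedures/allogenic/">Allogenic Bone Marrow Transplant</a><a target="_blank" href="https://www.lyfboat.com/aplastic-anemia-treatment-in-india/">Aplastic Anemia</a> </div>
</div>
<div class="column-block" id="pediatric-cardiology">
<h3 class="panel-title names strong">
<a class="speciality" rel="pediatric-cardiology" href="https://www.lyfboat.com/hospitals/pediatric-cardiology-hospitals-and-costs/">
Pediatric Cardiology </a>
</h3>
<div class="links">
<a target="_blank" href="https://www.lyfboat.com/hospitals/fontan-hospitals-and-costs/">Fontan</a><a target="_blank" href="https://www.lyfboat.com/hospitals/glenn-hospitals-and-costs/">Glenn Procedure</a> </div>
</div>
"""


def extract_rows(html):
    """Return a list of (department, condition) tuples, one per link."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for block in soup.find_all("div", class_="column-block"):
        speciality = block.find("a", class_="speciality")
        links = block.find("div", class_="links")
        if speciality is None or links is None:
            continue  # skip blocks that lack either part
        # strip=True removes the newline and trailing space around the name
        department = speciality.get_text(strip=True)
        for link in links.find_all("a"):
            rows.append((department, link.get_text(strip=True)))
    return rows


def rows_to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Department", "Conditions"])
    writer.writerows(rows)
    return buffer.getvalue()


rows = extract_rows(SAMPLE_HTML)
csv_text = rows_to_csv(rows)
print(csv_text)
```

To get all conditions as one flat array, as asked in the question, something like [condition for _, condition in rows] should work. Since every row has the same two fields, the same list could probably also be passed to pd.DataFrame(rows, columns=["Department", "Conditions"]) and saved with to_csv.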