In the snippet below I have two different records.
In the HEADLINE field we can see that one is an earnings call and the other is an acquisition announcement.
<?xml version="1.0" encoding="UTF-8"?>
<Response>
<Record key="18AD026E657696BE1A7AE7C0D1CE94EF321EFD4C203B31A1F87DD27DEF345872" req_sym="OUT1V-FI">
<Fields>
<Field id="7000" name="HEADLINE" value="CORRECTED TRANSCRIPT: Outokumpu Oyj(OUT1V-FI), Q3 2019 Earnings Call, 31-October-2019 9:00 AM ET" />
<Field id="7001" name="SOURCE" value="FCST" />
<Field id="7003" name="ALL_IDS" value="OUT1V-FI" />
<Field id="7046" name="PRIMARY_IDS" value="OUT1V-FI" />
<Field id="7004" name="STORY_DATE" value="20191101" />
<Field id="7005" name="STORY_TIME" value="041606" />
<Field id="7007" name="CATEGORIES" value="CN:FI,DT:EARN,DT:ERNS,DT:ER_GEN,DT:EVTS,DT:EV_ME,DT:FILNS_TS_TR,IN:METAL,LN:EN,RN:EU,RN:NE,SB:ERNS,SB:ER_GEN,SB:EVTS,SB:EV_ME" />
<Field id="7002" name="SEARCH_IDS" value="OUT1V-FI" />
<Field id="7011" name="LINK1" value="https://datadirect.factset.com/services/docretrieval?report=feed&key=U2FsdGVkX1%2fEHwXn0zpAkqjR%2bJOkauoxw0LQ2BhLtraPMDZwyAwoN9WuYQ8PMM4ZKNAXx8VpWFsDe2T%2fZ7WNdQ%3d%3d&timezone=America/New_York" />
<Field id="7039" name="FILING_SIZE" value="NULL" />
<Field id="8000" name="EVENT_IDS" value="1201149455" />
<Field id="8001" name="REPORT_IDS" value="2314010" />
<Field id="8002" name="EVENTDATE-REPORTID-TRANSCRIPTTYPE" value="20191031-2314010-C" />
<Field id="8003" name="EVENT" value="E" />
<Field id="8004" name="UPLOAD_DATE_TIME" value="2019-11-01 22:36:48" />
<Field id="8005" name="VERSION_ID" value="4379596" />
</Fields>
</Record>
<Record key="0BB357A317B871E3ED0FD0ECBD210D771E8331097964E1D9223C9BEE844E68F2" req_sym="SUBC-NO">
<Fields>
<Field id="7000" name="HEADLINE" value="CORRECTED TRANSCRIPT: Subsea 7 SA(SUBC-NO), Acquisition of McDermott International,Inc by Subsea 7 S.A Call, 23-April-2018 9:00 AM ET" />
<Field id="7001" name="SOURCE" value="FCST" />
<Field id="7003" name="ALL_IDS" value="SUBC-NO" />
<Field id="7046" name="PRIMARY_IDS" value="SUBC-NO" />
<Field id="7004" name="STORY_DATE" value="20180423" />
<Field id="7005" name="STORY_TIME" value="142404" />
<Field id="7007" name="CATEGORIES" value="CN:GB,DT:CA_MNA_GEN,DT:CORPS,DT:FILNS_TS_TR,DT:MANDA,IN:OIL,LN:EN,RN:EU,SB:EVTS,SB:MANDA" />
<Field id="7002" name="SEARCH_IDS" value="SUBC-NO" />
<Field id="7011" name="LINK1" value="https://datadirect.factset.com/services/docretrieval?report=feed&key=U2FsdGVkX1%2bJsxYfwGoI5ggt7BF%2bBr8ttuTeQZmMIWBDSxPjFIksm%2bjEDqkK5hq4NDxszCncdCgA18qo3qN5SQ%3d%3d&timezone=America/New_York" />
<Field id="7039" name="FILING_SIZE" value="NULL" />
<Field id="8000" name="EVENT_IDS" value="6235691" />
<Field id="8001" name="REPORT_IDS" value="2081721" />
<Field id="8002" name="EVENTDATE-REPORTID-TRANSCRIPTTYPE" value="20180423-2081721-C" />
<Field id="8003" name="EVENT" value="SS" />
<Field id="8004" name="UPLOAD_DATE_TIME" value="2018-04-26 22:35:20" />
<Field id="8005" name="VERSION_ID" value="3453250" />
</Fields>
</Record>
</Response>
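Each `<Record>` carries its own `key` attribute and its own `<Fields>` block, so one way to keep the two records apart is to turn every `<Record>` into a separate dictionary. The sketch below uses the standard-library `xml.etree.ElementTree` and assumes well-formed input: every `<Record>` closed by `</Record>`, and every bare `&` inside attribute values (for example in the LINK1 URLs) escaped as `&amp;`, because a strict XML parser rejects both. The helper name `records_from_xml` is hypothetical.

```python
import xml.etree.ElementTree as ET


def records_from_xml(xml_text):
    """Hypothetical helper: return one dict per <Record>.

    Each dict holds the record's key and req_sym attributes plus every
    Field name -> value pair. Assumes the input is well-formed XML.
    """
    root = ET.fromstring(xml_text)
    records = []
    # iter() walks the whole tree, so each <Record> is visited once
    for record in root.iter("Record"):
        data = {"key": record.get("key"), "req_sym": record.get("req_sym")}
        # Only the <Field> elements inside this record's own <Fields> block
        for field in record.findall("./Fields/Field"):
            data[field.get("name")] = field.get("value")
        records.append(data)
    return records
```

With the snippet above made well-formed, the first dict would have `"EVENT": "E"` and the second `"EVENT": "SS"`, so each headline stays tied to the record it came from.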
Right now my code cannot distinguish between the two records.
from bs4 import BeautifulSoup
import pandas as pd
import xml.etree.ElementTree as ET
import glob
import os

path = "/Users/User/Downloads/Thesis papers/links/"

# Read and parse every file in the folder as XML
for filename in glob.glob(os.path.join(path, "*")):
    with open(filename) as open_file:
        content = open_file.read()
    bs = BeautifulSoup(content, "xml")
    for individual_xml in bs.find_all("Response"):
        for link in individual_xml.find_all("Fields"):
            # Field id 7000 holds the HEADLINE
            for fields in link.find_all("Field", {"id": "7000"}):
                # The headline text is stored in the "value" attribute
                print(fields["value"])
How can I specify that I only want to keep a record if its headline contains the word "Earnings", like the first record in the XML snippet?
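One way to keep only the earnings records is to look up the HEADLINE field (id 7000) inside each `<Record>` and skip the record when the word is absent. A minimal sketch with the standard-library `xml.etree.ElementTree`, assuming well-formed input (each `<Record>` closed, and each bare `&` in the LINK1 URLs escaped as `&amp;`, since a strict parser rejects both); the function name `earnings_records` is hypothetical, and the match is case-sensitive.

```python
import xml.etree.ElementTree as ET


def earnings_records(xml_text, word="Earnings"):
    """Hypothetical helper: return (key, headline) for every <Record>
    whose HEADLINE contains `word`. Assumes well-formed XML."""
    root = ET.fromstring(xml_text)
    matches = []
    for record in root.iter("Record"):
        # Field id 7000 holds the HEADLINE in this feed
        headline_field = record.find("./Fields/Field[@id='7000']")
        if headline_field is None:
            continue  # no headline in this record: skip it
        headline = headline_field.get("value", "")
        if word in headline:
            matches.append((record.get("key"), headline))
    return matches
```

Inside the existing `glob` loop this would be called as `earnings_records(content)`; on a well-formed version of the snippet it keeps only the Outokumpu Q3 2019 record. If headline wording turns out to be unreliable, the CATEGORIES field is another candidate: in the snippet the earnings record carries `DT:EARN` and the acquisition record does not, though whether that holds across the whole feed is an assumption.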