I thought about this a bit more and came up with a better solution. Rather than crowding my original answer, I decided to add it here as a second answer:
So, after thinking it over again and following my logic of splitting the HTML by headings (essentially breaking it wherever we find <strong> tags), I chose to convert each section to a string with .prettify(), split on those specific strings/tags, and then read the pieces back into BeautifulSoup to pull out the text. From what I can see, it doesn't appear to have missed anything, but you'll have to look through the dataframe to double check:
import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'http://www.intermediary.natwest.com/intermediary-solutions/lending-criteria.html'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36'}

# Send the browser-like User-Agent along with the request
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, 'html.parser')
sections = soup.find_all('div', {'class': 'accordion-section-content'})

results = {}
for section in sections:
    # Split the section's HTML wherever a heading (<strong> tag) starts
    splits = section.prettify().split('<strong>')
    for each in splits:
        parts = each.split('</strong>')
        if len(parts) < 2:
            # No closing tag, so this piece has no heading; skip it
            continue
        # Parse both halves again to strip the remaining tags
        headline = BeautifulSoup(parts[0], 'html.parser').text.strip()
        content = BeautifulSoup(parts[1], 'html.parser').text.strip()
        # Collapse the multi-line text into a single line
        content = ' '.join(text.strip() for text in content.split('\n') if text.strip())
        # Note: a repeated heading overwrites the earlier entry
        results[headline] = content

df = pd.DataFrame(list(results.items()), columns=['Headings', 'Content'])
df.to_csv('C:/test.csv', index=False)
Output:
print(df)
Headings Content
0 Age requirements Applicants must be at least 18 years old at th...
1 Affordability Our affordability calculator is the same one u...
2 Agricultural restriction The only acceptable agricultural tie is where ...
3 Annual percentage rate of charge (APRC) The APRC is all fees associated with the mortg...
4 Adverse credit We consult credit reference agencies to look a...
5 Applicants (number of) The maximum number of applicants is two.
6 Armed Forces personnel Unsecured personal loans are only acceptable f...
7 Back to back Back to back is typically where the vendor has...
8 Customer funded purchase: when the customer has funded the purchase usin...
9 Bridging: residential mortgage applications where the cu...
10 Inherited: a recently inherited property where the benefi...
11 Porting: where a fixed/discounted rate was ported to a ...
12 Repossessed property: where the vendor is the mortgage lender in pos...
13 Part exchange: where the vendor is a large national house bui...
14 Bank statements We accept internet bank statements in paper fo...
15 Bonus For guaranteed bonuses we will consider an ave...
16 British National working overseas Applicants must be resident in the UK. Applica...
17 Builder's Incentives The maximum amount of acceptable incentive is ...
18 Buy-to-let (purpose) A buy-to-let mortgage can be used for: Purcha...
19 Capital Raising - Acceptable purposes permanent home improvem...
20 Buy-to-let (affordability) Buy to Let affordability must be assessed usin...
21 Buy-to-let (eligibility criteria) The property must be in England, Scotland, Wal...
22 Definition of a portfolio landlord We define a portfolio landlord as a customer w...
23 Carer's Allowance Carer's Allowance is paid to people aged 16 or...
24 Cashback Where a mortgage product includes a cashback f...
25 Casual employment Contract/agency workers with income paid throu...
26 Certification of documents When submitting copies of documents, please en...
27 Child Benefit We can accept up to 100% of working tax credit...
28 Childcare costs We use the actual amount the customer has decl...
29 When should childcare costs not be included? There are a number of situations where childca...
.. ... ...
108 Shared equity We lend on the Government-backed shared equity...
109 Shared ownership We do not lend against Shared Ownership proper...
110 Solicitors' fees We have a panel of solicitors for our fees ass...
111 Source of deposit We reserve the right to ask for proof of depos...
112 Sole trader/partnerships We will take an average of the last two years'...
113 Standard variable rate A standard variable rate (SVR) is a type of v...
114 Student loans Repayment of student loans is dependent on rec...
115 Tenure Acceptable property tenure: Feuhold, Freehold,...
116 Term Minimum term is 3 years Residential - Maximum...
117 Unacceptable income types The following forms of income are classed as u...
118 Bereavement allowance: paid to widows, widowers or surviving civil pa...
119 Employee benefit trusts (EBT): this is a tax mitigation scheme used in conjun...
120 Expenses: not acceptable as they're paid to reimburse pe...
121 Housing Benefit: payment of full or partial contribution to cla...
122 Income Support: payment for people on low incomes, working les...
123 Job Seeker's Allowance: paid to people who are unemployed or working 1...
124 Stipend: a form of salary paid for internship/apprentic...
125 Third Party Income: earned by a spouse, partner, parent who are no...
126 Universal Credit: only certain elements of the Universal Credit ...
127 Universal Credit The Standard Allowance element, which is the n...
128 Valuations: day one instruction We are now instructing valuations on day one f...
129 Valuation instruction A valuation will be automatically instructed w...
130 Valuation fees A valuation will always be obtained using a pa...
131 Please note: W hen upgrading the free valuation for a home...
132 Adding fees to the loan Product fees are the only fees which can be ad...
133 Product fee This fee is paid when the mortgage is arranged...
134 Working abroad Previously, we required applicants to be empl...
135 Acceptable - We may consider applications from people who: ...
136 Not acceptable - We will not consider applications from people...
137 Working and Family Tax Credits We can accept up to 100% of Working Tax Credit...
[138 rows x 2 columns]
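If you want to see how the split behaves without hitting the live page, here is a minimal sketch on a made-up HTML fragment. The fragment, the `split_by_strong` helper and the `sample_results` name are my own assumptions for illustration, not the real page or part of the code above. I also collapse whitespace with `str.split()` here as a simplification. The last lines are one rough way to do the double check mentioned earlier: compare the number of <strong> tags with the number of headings captured (a mismatch can also just mean a repeated heading).

```python
from bs4 import BeautifulSoup

# Hypothetical fragment mimicking one accordion section (not the real page)
SAMPLE_HTML = """
<div class="accordion-section-content">
  <p><strong>Age requirements</strong></p>
  <p>Applicants must be at least 18 years old.</p>
  <p><strong>Term</strong></p>
  <ul>
    <li>Minimum term is 3 years</li>
    <li>Maximum term is 35 years</li>
  </ul>
</div>
"""


def split_by_strong(html):
    """Return {heading: text} by splitting prettified HTML on <strong> tags."""
    soup = BeautifulSoup(html, 'html.parser')
    results = {}
    for piece in soup.prettify().split('<strong>'):
        parts = piece.split('</strong>')
        if len(parts) < 2:
            # Text before the first heading has no closing tag; skip it
            continue
        heading = BeautifulSoup(parts[0], 'html.parser').get_text().strip()
        body = BeautifulSoup(parts[1], 'html.parser').get_text()
        # Collapse all runs of whitespace into single spaces
        results[heading] = ' '.join(body.split())
    return results


sample_results = split_by_strong(SAMPLE_HTML)
for heading, body in sample_results.items():
    print(heading, '->', body)

# Rough sanity check: one captured heading per <strong> tag
strong_count = len(BeautifulSoup(SAMPLE_HTML, 'html.parser').find_all('strong'))
print(strong_count == len(sample_results))
```

On the real page you could run the same comparison against `soup` and `results` from the code above before trusting the CSV.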