I would suggest storing each entry in a dictionary; you can then easily pull out just the fields you need at the end (you don't seem to want 2011?):
from bs4 import BeautifulSoup
import re
html = """
<h4>Production Capacity (year)</h4>
<div class="profile-area">
Vehicle 1,140,000 units /year
</div>
<h4>Output</h4>
<div class="profile-area">
Vehicle 809,000 units ( 2016 )
</div>
<div class="profile-area">
Vehicle 815,000 units ( 2015 )
</div>
<div class="profile-area">
Vehicle 836,000 units ( 2014 )
</div>
<div class="profile-area">
Vehicle 807,000 units ( 2013 )
</div>
<div class="profile-area">
Vehicle 760,000 units ( 2012 )
</div>
<div class="profile-area">
Vehicle 805,000 units ( 2011 )
</div>
"""
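soup = BeautifulSoup(html, 'lxml')
units = {}

# Walk the headings and data blocks in document order.
for item in soup.find_all(['h4', 'div']):
    if item.name == 'h4':
        # Remember the current section: after the break, h4 holds the matching keyword.
        for h4 in ['capacity', 'output', 'models']:
            if h4 in item.text.lower():
                break
    elif item.get('class', [''])[0] == 'profile-area':
        vehicle = item.get_text(strip=True)

        if h4 == 'output':
            # Output entries are keyed by the year given in parentheses.
            re_year = re.search(r'\( (\d+) \)', vehicle)
            if re_year:
                year = re_year.group(1)
            else:
                year = 'unknown'
            units[year] = vehicle
        else:
            # Entries in other sections are keyed by the section keyword.
            units[h4] = vehicle

# Keep only the required fields; missing ones become empty strings.
req_fields = ['models', 'capacity', '2012', '2013', '2014', '2015', '2016']
print(';'.join([units.get(field, '') for field in req_fields]))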
This will display:
;Vehicle 1,140,000 units /year;Vehicle 760,000 units ( 2012 );Vehicle 807,000 units ( 2013 );Vehicle 836,000 units ( 2014 );Vehicle 815,000 units ( 2015 );Vehicle 809,000 units ( 2016 )
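The output starts with an empty field because the sample HTML has no models section, so `units.get('models', '')` falls back to an empty string. The 2011 entry is stored in the dictionary but dropped, because it is not listed in the required fields. A minimal sketch of that selection step, using a small hand-written dictionary (the values are copied from the sample, and the shortened field list is only for illustration):

```python
# A hand-built stand-in for the `units` dictionary produced by the scraper.
units = {
    'capacity': 'Vehicle 1,140,000 units /year',
    '2016': 'Vehicle 809,000 units ( 2016 )',
    '2011': 'Vehicle 805,000 units ( 2011 )',
}

# Shortened field list for illustration; 'models' is deliberately missing from units.
req_fields = ['models', 'capacity', '2016']

# dict.get returns '' for missing keys, so every row has the same number of fields.
row = ';'.join(units.get(field, '') for field in req_fields)
print(row)  # ;Vehicle 1,140,000 units /year;Vehicle 809,000 units ( 2016 )
```

Because absent fields still take up a slot, the columns stay aligned when several pages are written one row each.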
A regular expression is used to extract the year from each vehicle entry. That year is then used as the key in the dictionary.
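To see the year extraction on its own, here is a small sketch that wraps the same pattern in a helper. The name `extract_year` is hypothetical and not part of the script above; it only isolates the regex step.

```python
import re

def extract_year(entry):
    """Return the digits found inside '( ... )', or 'unknown' if there are none."""
    # Same pattern as in the script: a literal '(', a space, digits, a space, ')'.
    match = re.search(r'\( (\d+) \)', entry)
    return match.group(1) if match else 'unknown'

print(extract_year('Vehicle 809,000 units ( 2016 )'))  # 2016
print(extract_year('Vehicle 1,140,000 units /year'))   # unknown
```

Note that the pattern expects exactly one space on each side of the digits, as in the sample HTML; pages that format the year differently would fall into the `'unknown'` key.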
For the HTML in the pastebin, the script gives:
Volkswagen Golf, Golf Variant(Estate), Golf Plus, CrossGolf (2006-), e-Golf (2014-)Volkswagen Touran, CrossTouran (2007-), Tiguan (2007-);I.D. electric vehicles based on MEB (planning);SEAT new SUV MQB-A2 platform (2018- planning);Components:press shop, chassis, plastics technology;Vehicle 1,140,000 units /year;Vehicle 760,000 units ( 2012 );Vehicle 807,000 units ( 2013 );Vehicle 836,000 units ( 2014 );Vehicle 815,000 units ( 2015 );Vehicle 809,000 units ( 2016 )