How do I correctly format strings in Python? - PullRequest
0 votes
/ 14 July 2020

I have some data, obtained from a website, stored in a single string like the one below:

myDatastr = (
    "United States3.43M+57,9421M138K+282Brazil1.88M+20,2861.21M72,833+733India907K+28,701571K23,727+500Russia734K+6,537504K11,439+104Peru330K+3,797221K12,054+184Chile318K287K7,024Mexico304K+4,685189K35,491+485United Kingdom290K+65044,830+21South Africa288K138K4,172Iran260K+2,349223K13,032+203Spain256K+2,045150K28,406+3Pakistan254K+2,753171K5,320+69Italy243K+169195K34,967+13Saudi Arabia235K+2,852170K2,704+20Turkey214K+1,008196K5,382+19Germany200K+159185K9,138+1Bangladesh187K+3,09998,3172,391+39France172K78,59730,029Colombia154K+3,83265,8095,455+148Canada108K71,8418,790Qatar104K101K149Argentina103K+3,0991,903+58Mainland China85,568+3Egypt83,00124,9753,935Iraq79,73546,9883,250Indonesia76,981+1,28236,6893,656+50Sweden75,826+315,536+0Ecuador68,459+5895,9005,063+16Belarus65,114+18255,492468+4Belgium62,707Kazakhstan59,899+1,64634,190375+0Oman58,179+1,31837,257259+9Philippines57,006+74720,3711,599+65Kuwait55,50845,356393United Arab Emirates55,19845,513334Ukraine54,13326,5031,398Netherlands51,308+1016,156+0Bolivia49,25015,2941,866Panama47,17323,919932Portugal46,81831,0651,662Singapore46,283+32242,54126+0Dominican Republic45,50622,441903Israel40,632+1,33619,395365+1Poland38,190+29926,0481,576+5Afghanistan34,45521,2541,012Nigeria33,513+59513,671744+4Bahrain33,47628,425104Romania32,94821,6921,901Switzerland32,94629,6001,686Armenia32,151+18219,865573+8Guatemala29,7424,3211,244Honduras28,579+489789+15Ireland25,638+1023,3491,746+0Ghana24,98821,067139Azerbaijan24,570+52015,640313+8Japan21,868Algeria19,689+49414,0191,018+7Moldova19,439+17412,793649+2Austria18,94817,000708Serbia18,360Nepal16,945+1443,65238+0Morocco15,93612,934255Cameroon15,173+25711,928359+0Uzbekistan13,591+4878,03063+3South Korea13,512+3312,282289+0Czechia13,2388,373353Denmark13,147609Côte d'Ivoire12,766Kyrgyzstan11,538+488149+15Kenya10,2942,946197Sudan10,250+0+0Australia9,980+1837,769108+0El Salvador9,9785,755267Venezuela9,707+2422,67193+4Norway8,984+08,138253+0Malaysia8,725+78,520122Senegal8,198+1215,514150+3North "
    "Macedonia8,1974,326385Democratic Republic of the Congo8,075+623,620190+0Costa Rica8,0362,30431Ethiopia7,7662,430128Bulgaria7,525Finland7,2956,800329Palestine7,037Bosnia and Herzegovina6,981+4823,179226+5Haiti6,727+732,924139+4Tajikistan6,551+46+0French Guiana6,170+22129+3Guinea6,141+974,86237+0Gabon5,942+03,00446+0Mauritania5,275+149+3Kosovo5,118+1872,370108+6Djibouti4,972+4+0Luxembourg4,956+314,183111+0Madagascar4,867+289+1Central African Republic4,288+0+0Hungary4,247+133,073595+0Greece3,826+311,374193+0Croatia3,775+532,514119+0Albania3,5712,01495Thailand3,220+33,09058+0Equatorial Guinea3,071+084251+0Somalia3,059+81,30693+1Paraguay2,9801,29325Nicaragua2,846+01,75091+0Maldives2,762+672,29013+0Mayotte2,711+0+0Sri Lanka2,646+1061,98111+0Malawi2,43074739Cuba2,428+62,26887+0Mali2,411Lebanon2,334+166+0South Sudan2,148+9+0Republic of the Congo2,103+0+0Estonia2,0141,89569Slovakia1,90228Iceland1,90010Zambia1,895+01,34842+0Lithuania1,869Guinea-Bissau1,842+077326+0Slovenia1,841+14+0Cape Verde1,698+75+0Sierra Leone1,642+71,17563+0New Zealand1,545+022+0Hong Kong1,522+521,2178+1Yemen1,498+33424+7Libya1,433Benin1,378+0+0Eswatini1,351Rwanda1,337+38+1Tunisia1,263Montenegro1,221Jordan1,179+3+0Latvia1,173+01,01930+0Mozambique1,157+22+0Niger1,099+097868+0Burkina Faso1,033+13Uganda1,025Cyprus1,021+7+0Liberia1,010+12+4Georgia995+9Uruguay98731Zimbabwe985+3+0Chad880+6+1Namibia861+72281+0Andorra85580352Suriname780+3952618Jamaica759+110+0São Tomé and Príncipe727+228414+0Togo720San Marino716+3+0Malta674+06589+0Réunion593+16+0Tanzania509+0+0Angola506Taiwan4514387Syria417+2319+3Botswana399+85381+0Vietnam372+2Mauritius342+033010+0Isle of Man336+031224+0Myanmar (Burma)331+1+0Jersey329+431+0Comoros317+32967+0Guyana30015517Burundi269+82071+0Martinique255+0+0Guernsey25223813Guernsey252+0+0Lesotho245+49333+1Eritrea232+0Mongolia230+3Cayman Islands201+01941+0Guadeloupe190+0+0Faroe Islands188+01880Gibraltar180+0Cambodia1651330Bermuda150+01379+0Brunei141+01383+0Trinidad and Tobago133+01178+0Northern "
    "Cyprus1131044The Bahamas111+3+0Monaco1094Aruba105+0993Barbados103+5+0Seychelles100+0110Turkmenistan10000Liechtenstein85+0+0Bhutan84760Sint Maarten78+06315+0Antigua and Barbuda74+0573+0Turks and Caicos Islands72122The Gambia64+0343+0French Polynesia62+0600Macao46450Saint Martin44+0+0Belize37+0202+0Saint Vincent and the Grenadines35+0296Fiji26+0180Curaçao25+0241+0Timor-Leste24+0240Grenada23+0230Saint Lucia22+0190New Caledonia21+0210Laos19+0190Åland Islands19Dominica18+0180Saint Kitts and Nevis17150Falkland Islands (Islas Malvinas)13+0130Greenland13+0130Montserrat12101Vatican City12+0120Papua New Guinea1180British Virgin Islands8+071+0Caribbean Netherlands7+0Saint Barthélemy6+0Anguilla3+030Saint Pierre and Miquelon2+010Western Sahara"
)

I want to get the data in this form:

[United States,3.43M,+57,942,1M,138K,+282]

[Brazil,1.88M,+20,286,1.21M,72,833,+733]

I've tried various things, but none of them work.

1 Answer

1 vote
/ 14 July 2020

I took a subset of your string and split it across several lines for readability.

First, collect the country names into a list, then use that list to extract all the values between one country and the next.
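Building one regex per pair of neighbouring countries needs re.escape() for names such as "Myanmar (Burma)", and re.search() returns the first occurrence, which is the wrong one when a name repeats or is part of another name (the full string contains "Guernsey" twice, and "Niger" also occurs inside "Nigeria"). The same step can be done in a single pass with re.split() and a capturing group. This is a minimal sketch, assuming a country name starts with at least two letters and runs until the next digit or '+'; the helper name split_countries is hypothetical:

```python
import re

# Assumption: a country name starts with at least two letters and runs until the
# next digit or '+'. Requiring two letters keeps a lone K or M suffix (as in
# "138K" or "1M") out of the names, as long as no letter follows it directly.
NAME = re.compile(r"([^\W\d_]{2}[^\d+]*)")


def split_countries(text):
    """Return (country, raw_figures) pairs in the order they appear in text."""
    # With one capturing group, re.split() keeps the separators, so the result is
    # [prefix, name1, figures1, name2, figures2, ...]
    parts = NAME.split(text)
    names = parts[1::2]
    figures = parts[2::2]
    return [(name.strip(), figs.strip()) for name, figs in zip(names, figures)]


if __name__ == "__main__":
    sample = "Nigeria33,513+59513,671744+4Niger1,099+097868+0"
    for pair in split_countries(sample):
        print(pair)
    # ('Nigeria', '33,513+59513,671744+4')
    # ('Niger', '1,099+097868+0')
```

Because the pairs come from positions in the string rather than from searching for names, repeated names and names that contain other names are handled in order.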

Finally, output the result as a list of lists:

import re

# A subset of your string, wrapped over several lines for readability
text = """
United States3.43M+57,9421M138K+282Brazil1.88M+20,2861.21M72,833+733
India907K+28,701571K23,727+500Russia734K+6,537504K11,439+104
Peru330K+3,797221K12,054+184Chile318K287K7,024Mexico304K+4,685189K35,491+485
United Kingdom290K+65044,830+21South Africa288K138K4,172
Iran260K+2,349223K13,032+203Spain256K+2,045150K28,406+3
Pakistan254K+2,753171K5,320+69Italy243K+169195K34,967+13
Saudi Arabia235K+2,852170K2,704+20Turkey214K+1,008196K5,382+19
Germany200K+159185K9,138+1
"""

# Country names: a run of 3+ letters, optionally followed by a space and 2+ more letters
regex = re.compile(r"([a-z]{3,}(?: [a-z]{2,})?)", re.I)
countries = [name.rstrip('K') for name in regex.findall(text)]

out = []
for idx, country in enumerate(countries):
    if idx < len(countries) - 1:
        # Everything between this country and the next one;
        # re.escape() protects names that contain regex metacharacters
        pattern = fr'{re.escape(country)}(.*){re.escape(countries[idx + 1])}'
        result = re.search(pattern, text, re.I | re.DOTALL).group(1).strip()
    else:
        # The last country: everything after its name
        result = text.split(country)[-1].strip()
    # Put a comma in front of every '+', then split on commas
    out.append([country] + result.replace('+', ',+').split(','))

for row in out:
    print(row)

Output:

['United States', '3.43M', '+57', '9421M138K', '+282']
['Brazil', '1.88M', '+20', '2861.21M72', '833', '+733']
['India', '907K', '+28', '701571K23', '727', '+500']
['Russia', '734K', '+6', '537504K11', '439', '+104']
['Peru', '330K', '+3', '797221K12', '054', '+184']
['Chile', '318K287K7', '024']
['Mexico', '304K', '+4', '685189K35', '491', '+485']
['United Kingdom', '290K', '+65044', '830', '+21']
['South Africa', '288K138K4', '172']
['Iran', '260K', '+2', '349223K13', '032', '+203']
['Spain', '256K', '+2', '045150K28', '406', '+3']
['Pakistan', '254K', '+2', '753171K5', '320', '+69']
['Italy', '243K', '+169195K34', '967', '+13']
['Saudi Arabia', '235K', '+2', '852170K2', '704', '+20']
['Turkey', '214K', '+1', '008196K5', '382', '+19']
['Germany', '200K', '+159185K9', '138', '+1']
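The output above still merges neighbouring fields (for example '9421M138K' and '+159185K9'), because only '+' and ',' are used as separators while the commas inside the data are thousands separators. A minimal sketch of a different technique tokenises each country's figures with a regular expression instead of splitting them. It assumes that abbreviated values such as 3.43M or 138K have at most three digits before the decimal point, that every other value uses ',' as a thousands separator, and that any value may start with '+'. The helper name split_figures is hypothetical:

```python
import re

# Assumptions (not from the original answer):
#  - abbreviated values such as 3.43M or 138K have at most 3 digits before the
#    decimal point and at most 2 after it;
#  - all other values use ',' as a thousands separator, e.g. 57,942;
#  - any value may carry a leading '+'.
# The abbreviated form is tried first; otherwise a 1-3 digit number followed by
# as many ",ddd" groups as possible is taken.
FIELD = re.compile(r"\+?(?:\d{1,3}(?:\.\d{1,2})?[KM]|\d{1,3}(?:,\d{3})*)")


def split_figures(figures):
    """Split one country's run-together figures into separate fields."""
    # FIELD has no capturing groups, so findall() returns the whole matches
    return FIELD.findall(figures)


if __name__ == "__main__":
    for figures in ("3.43M+57,9421M138K+282", "200K+159185K9,138+1"):
        print(split_figures(figures))
    # ['3.43M', '+57,942', '1M', '138K', '+282']
    # ['200K', '+159', '185K', '9,138', '+1']
```

To use it in the answer's code, replace the result.replace('+', ',+').split(',') call with split_figures(result). Some runs still cannot be split unambiguously without the page's column boundaries: Sweden's "+315,536" might equally be "+31" followed by "5,536", and the pattern always takes the longest thousands group. Rows with fewer figures, such as United Kingdom, also give shorter lists, so the columns do not line up.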