Любые советы по рефакторингу следующего Pandas кода? - PullRequest
0 голосов
/ 24 февраля 2020

Я делаю Pandas проект по потреблению алкоголя. Для вашей информации в наборе данных есть следующие столбцы:

| Континент | Страна | Пиво | Дух | Вина |

Вот мой код:

# Separating data by continent
# ----------------------------
data_asia   = data[data['Continent'] == 'Asia']
data_africa = data[data['Continent'] == 'Africa']
data_europe = data[data['Continent'] == 'Europe']
data_north  = data[data['Continent'] == 'North America']
data_south  = data[data['Continent'] == 'South America']
data_ocean  = data[data['Continent'] == 'Oceania']

# Calculating n-largest for each category of drink
# ------------------------------------------------
top_5_asia_beer = data.nlargest(5, ['Beer Servings'])[['Country', 'Beer Servings']]
top_5_asia_spir = data.nlargest(5, ['Spirit Servings'])[['Country', 'Spirit Servings']]
top_5_asia_wine = data.nlargest(5, ['Wine Servings'])[['Country', 'Wine Servings']]
top_5_asia_pure = data.nlargest(5, ['Total Litres of Pure Alcohol'])[['Country', 'Total Litres of Pure Alcohol']]

top_5_africa_beer = data.nlargest(5, ['Beer Servings'])[['Country', 'Beer Servings']]
top_5_africa_spir = data.nlargest(5, ['Spirit Servings'])[['Country', 'Spirit Servings']]
top_5_africa_wine = data.nlargest(5, ['Wine Servings'])[['Country', 'Wine Servings']]
top_5_africa_pure = data.nlargest(5, ['Total Litres of Pure Alcohol'])[['Country', 'Total Litres of Pure Alcohol']]

top_5_europe_beer = data.nlargest(5, ['Beer Servings'])[['Country', 'Beer Servings']]
top_5_europe_spir = data.nlargest(5, ['Spirit Servings'])[['Country', 'Spirit Servings']]
top_5_europe_wine = data.nlargest(5, ['Wine Servings'])[['Country', 'Wine Servings']]
top_5_europe_pure = data.nlargest(5, ['Total Litres of Pure Alcohol'])[['Country', 'Total Litres of Pure Alcohol']]

top_5_north_beer = data.nlargest(5, ['Beer Servings'])[['Country', 'Beer Servings']]
top_5_north_spir = data.nlargest(5, ['Spirit Servings'])[['Country', 'Spirit Servings']]
top_5_north_wine = data.nlargest(5, ['Wine Servings'])[['Country', 'Wine Servings']]
top_5_north_pure = data.nlargest(5, ['Total Litres of Pure Alcohol'])[['Country', 'Total Litres of Pure Alcohol']]

top_5_south_beer = data.nlargest(5, ['Beer Servings'])[['Country', 'Beer Servings']]
top_5_south_spir = data.nlargest(5, ['Spirit Servings'])[['Country', 'Spirit Servings']]
top_5_south_wine = data.nlargest(5, ['Wine Servings'])[['Country', 'Wine Servings']]
top_5_south_pure = data.nlargest(5, ['Total Litres of Pure Alcohol'])[['Country', 'Total Litres of Pure Alcohol']]

top_5_ocean_beer = data.nlargest(5, ['Beer Servings'])[['Country', 'Beer Servings']]
top_5_ocean_spir = data.nlargest(5, ['Spirit Servings'])[['Country', 'Spirit Servings']]
top_5_ocean_wine = data.nlargest(5, ['Wine Servings'])[['Country', 'Wine Servings']]
top_5_ocean_pure = data.nlargest(5, ['Total Litres of Pure Alcohol'])[['Country', 'Total Litres of Pure Alcohol']]

Я понимаю нелепость моего кода с точки зрения дублирования и повторяемости. Кто-нибудь может поделиться советами и рекомендациями по рефакторингу кода?

1 Ответ

1 голос
/ 24 февраля 2020

Один подход (среди прочих) будет:

import pandas as pd
import numpy as np

conts = ["Asia", "Europe"]
ctrys = list("ABCFGHI")
N = 15

d = pd.DataFrame({"continent": np.random.choice(conts, N),
                  "country": np.random.choice(ctrys, N),
                  "beer": np.random.uniform(10, 20, N),
                  "wine": np.random.uniform(10, 30, N),
                  "spirits": np.random.uniform(5, 10, N)})

d = d.groupby(["continent", "country"]).sum().reset_index() # remove duplicates
d = pd.melt(d, id_vars = ["continent", "country"], value_vars = ["beer", "wine", "spirits"],
        var_name = "drink", value_name = "quantity")

d = pd.merge(d.groupby(["continent", "drink"]).quantity.nlargest(3).reset_index(),
             d, how = "left", on = ["continent", "drink", "quantity"])

#   continent    drink  level_2   quantity country
#0       Asia     beer        1  45.260909       C
#1       Asia     beer        2  32.040498       F
#2       Asia     beer        3  27.659633       G
#3       Asia  spirits       19  20.170853       C
#4       Asia  spirits       21  16.649856       G
#5       Asia  spirits       20  15.710173       F
#6       Asia     wine       10  69.767011       C
#7       Asia     wine       11  31.997030       F
#8       Asia     wine       12  27.898864       G
#9     Europe     beer        7  31.116611       F
#10    Europe     beer        8  29.101469       G
#11    Europe     beer        6  19.580028       C
#12    Europe  spirits       25  14.449807       F
#13    Europe  spirits       26  14.127248       G
#14    Europe  spirits       23   7.169853       B
#15    Europe     wine       16  53.906949       F
#16    Europe     wine       17  44.906396       G
#17    Europe     wine       14  20.608847       B
Добро пожаловать на сайт PullRequest, где вы можете задавать вопросы и получать ответы от других членов сообщества.
...