Сначала агрегируйте sum
по годам и product
, а затем создайте новый столбец для подсчетов по месяцам по DataFrame.insert
и Series.map
:
df1 =(df.groupby(['product',df['sale_date'].dt.year], sort=False).sum().add_prefix('total_')
.reset_index())
df1.insert(2,'number_of_months', df1['sale_date'].map({2016:1, 2017:12, 2018:11}))
print (df1)
product sale_date number_of_months total_price total_discount
0 A 2016 1 50 5
1 A 2017 12 250 25
2 B 2016 1 200 10
3 B 2017 12 1000 110
4 A 2018 11 100 18
5 B 2018 11 900 130
Если хотите динамический c словарь по минимальному и максимальному времени, используйте:
s = pd.date_range(df['sale_date'].min(), df['sale_date'].max(), freq='MS')
d = s.year.value_counts().to_dict()
print (d)
{2017: 12, 2018: 11, 2016: 1}
df1 = (df.groupby(['product',df['sale_date'].dt.year], sort=False).sum().add_prefix('total_')
.reset_index())
df1.insert(2,'number_of_months', df1['sale_date'].map(d))
print (df1)
product sale_date number_of_months total_price total_discount
0 A 2016 1 50 5
1 A 2017 12 250 25
2 B 2016 1 200 10
3 B 2017 12 1000 110
4 A 2018 11 100 18
5 B 2018 11 900 130
Для построения графика используется DataFrame.set_index
с DataFrame.unstack
:
df2 = (df1.set_index(['sale_date','product'])[['total_price','total_discount']]
.unstack(fill_value=0))
df2.columns = df2.columns.map('_'.join)
print (df2)
total_price_A total_price_B total_discount_A total_discount_B
sale_date
2016 50 200 5 10
2017 250 1000 25 110
2018 100 900 18 130
df2.plot()
ИЗМЕНИТЬ:
df1 = (df.groupby(['product',df['sale_date'].dt.year], sort=False)
.agg( total_price=('price','sum'),
total_discount=('discount','sum'),
number_of_sales=('discount','size'))
.reset_index())
df1.insert(2,'number_of_months', df1['sale_date'].map({2016:1, 2017:12, 2018:11}))
print (df1)
product sale_date number_of_months total_price total_discount \
0 A 2016 NaN 50 5
1 A 2017 NaN 250 25
2 B 2016 NaN 200 10
3 B 2017 NaN 1000 110
4 A 2018 NaN 100 18
5 B 2018 NaN 900 130
number_of_sales
0 1
1 5
2 1
3 5
4 2
5 3