You can use Series.reindex with every possible index combination, built with MultiIndex.from_product:
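import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Sample data: 2015 covers months 03-12, 2018 covers months 01-08
np.random.seed(123)
train = pd.DataFrame({'Year': ['2015'] * 10 + ['2018'] * 8,
                      'Month': list(range(3, 13)) + list(range(1, 9)),
                      'SalesValue': np.random.randint(1000, size=18)})
# Zero-pad the months so they sort correctly as strings
train['Month'] = train['Month'].astype(str).str.zfill(2)
print(train)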
    Year Month  SalesValue
0   2015    03         510
1   2015    04         365
2   2015    05         382
3   2015    06         322
4   2015    07         988
5   2015    08          98
6   2015    09         742
7   2015    10          17
8   2015    11         595
9   2015    12         106
10  2018    01         123
11  2018    02         569
12  2018    03         214
13  2018    04         737
14  2018    05          96
15  2018    06         113
16  2018    07         638
17  2018    08          47
# Total sales per (Year, Month), in thousands
total_sales = train.groupby(['Year', 'Month'])['SalesValue'].sum() / 1000
# Every year/month combination that should appear in the result
years = np.arange(2015, 2019).astype(str)
months = pd.Series(np.arange(1, 13, 1)).astype(str).str.zfill(2)
mux = pd.MultiIndex.from_product([years, months], names=total_sales.index.names)
# Combinations missing from the data become NaN
total_sales = total_sales.reindex(mux)
print(total_sales)
Year  Month
2015  01         NaN
      02         NaN
      03       0.510
      04       0.365
      05       0.382
      06       0.322
      07       0.988
      08       0.098
      09       0.742
      10       0.017
      11       0.595
      12       0.106
2016  01         NaN
      02         NaN
      03         NaN
      04         NaN
      05         NaN
      06         NaN
      07         NaN
      08         NaN
      09         NaN
      10         NaN
      11         NaN
      12         NaN
2017  01         NaN
      02         NaN
      03         NaN
      04         NaN
      05         NaN
      06         NaN
      07         NaN
      08         NaN
      09         NaN
      10         NaN
      11         NaN
      12         NaN
2018  01       0.123
      02       0.569
      03       0.214
      04       0.737
      05       0.096
      06       0.113
      07       0.638
      08       0.047
      09         NaN
      10         NaN
      11         NaN
      12         NaN
Name: SalesValue, dtype: float64
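The reindexing step can also be tried on a much smaller example. The sketch below uses made-up values and hypothetical variable names (`sales`, `full_index`, `with_gaps`, `with_zeros`) that are not part of the code above. It also shows the `fill_value` argument of `Series.reindex`, which may be useful if missing months should count as zero instead of leaving gaps; whether that is appropriate depends on your data.

```python
import pandas as pd

# Toy data (hypothetical values): sales exist for only three (Year, Month) pairs.
sales = pd.Series(
    [1.5, 2.0, 0.7],
    index=pd.MultiIndex.from_tuples(
        [("2015", "03"), ("2015", "04"), ("2016", "01")],
        names=["Year", "Month"],
    ),
    name="SalesValue",
)

# Build every year/month combination that should be present.
years = ["2015", "2016"]
months = [f"{m:02d}" for m in range(1, 13)]
full_index = pd.MultiIndex.from_product([years, months], names=sales.index.names)

# Default behaviour: combinations missing from the data become NaN.
with_gaps = sales.reindex(full_index)

# Alternative: fill the missing combinations with 0 instead of NaN.
with_zeros = sales.reindex(full_index, fill_value=0)

print(len(with_gaps), int(with_gaps.isna().sum()))
print(with_zeros.loc["2016"].head(3))
```

With NaN the plotted line simply breaks at the missing months, while zeros draw the line down to the axis, so the choice changes how the chart reads.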
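# x is not defined in this answer; it is assumed to hold the 12 x-axis values (one per month).
# After reindexing, each year's slice also has 12 values, so the lengths match.
plt.plot(x, total_sales.loc['2015'], label="2015")
plt.plot(x, total_sales.loc['2016'], label="2016")
plt.plot(x, total_sales.loc['2017'], label="2017")
plt.plot(x, total_sales.loc['2018'], label="2018")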
If the x-axis values are the months, use Series.unstack with DataFrame.plot:
# DataFrame.plot opens its own figure when no ax is given, so pass figsize to it directly
total_sales.unstack(level=0).plot(figsize=(16, 8))
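To see what `unstack(level=0)` hands to `DataFrame.plot`, it can help to inspect the reshaped frame before plotting. This is a minimal sketch with made-up values (0 to 23) and hypothetical names (`sales`, `wide`), assuming the same two-level `Year`/`Month` index as above: the first level moves to the columns, so each year becomes one column (one line in the plot) and the months form the row index (the x-axis).

```python
import numpy as np
import pandas as pd

years = ["2015", "2016"]
months = [f"{m:02d}" for m in range(1, 13)]
index = pd.MultiIndex.from_product([years, months], names=["Year", "Month"])

# Hypothetical values 0..23, chosen only to make the layout easy to follow.
sales = pd.Series(np.arange(24, dtype=float), index=index, name="SalesValue")

# Level 0 (Year) moves to the columns; Month stays as the row index.
wide = sales.unstack(level=0)

print(wide.shape)            # rows = months, columns = years
print(list(wide.columns))
print(wide.loc["03", "2016"])
```

Calling `wide.plot()` on a frame shaped like this draws one line per column, which is why no explicit loop over the years is needed.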