First sort with DataFrame.sort_values, then take the second row of each group (the second-latest Date) with GroupBy.nth:
# sample data changed to contain 2 groups by the Cust, Mat columns and different Date values
print(df)
       Cust       Mat        Date MaxPurchaseDate
0  90050416  23007545  2018-06-01      2018-01-29
1  90050416  23007545  2018-02-01      2019-02-27
2  90050416  30476395  2018-03-01      2018-10-01
3  90050416  30476395  2018-01-01      2018-06-18
4  90050416  30476395  2018-04-01      2018-09-17
df['Date'] = pd.to_datetime(df['Date'])
# sort Date descending within each (Cust, Mat) group, then take the row at position 1
df = (df.sort_values(['Cust', 'Mat', 'Date'], ascending=[True, True, False])
        .groupby(['Cust', 'Mat'])
        .nth(1)
        .reset_index())
print(df)
       Cust       Mat        Date MaxPurchaseDate
0  90050416  23007545  2018-02-01      2019-02-27
1  90050416  30476395  2018-03-01      2018-10-01
Or use GroupBy.cumcount and keep the second row of each group by comparing with Series.eq in boolean indexing:
df['Date'] = pd.to_datetime(df['Date'])
df = df.sort_values(['Cust', 'Mat', 'Date'], ascending=[True, True, False])
# cumcount numbers the rows 0, 1, 2, ... within each group; 1 is the second row
df = df[df.groupby(['Cust', 'Mat']).cumcount().eq(1)]
print(df)
       Cust       Mat        Date MaxPurchaseDate
1  90050416  23007545  2018-02-01      2019-02-27
2  90050416  30476395  2018-03-01      2018-10-01
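
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the cumcount approach. The DataFrame is rebuilt from the printed sample above, and the helper name second_latest and its parameters are made up for illustration, not part of pandas. It uses cumcount rather than nth because, as far as I know, GroupBy.nth changed in pandas 2.0 to behave as a filter that keeps the original index, so the reset_index() step can give a different layout depending on the version.

```python
import pandas as pd

# Sample data rebuilt from the printed frame above.
df = pd.DataFrame({
    "Cust": [90050416] * 5,
    "Mat": [23007545, 23007545, 30476395, 30476395, 30476395],
    "Date": ["2018-06-01", "2018-02-01", "2018-03-01",
             "2018-01-01", "2018-04-01"],
    "MaxPurchaseDate": ["2018-01-29", "2019-02-27", "2018-10-01",
                        "2018-06-18", "2018-09-17"],
})
df["Date"] = pd.to_datetime(df["Date"])


def second_latest(frame, keys=("Cust", "Mat"), date_col="Date"):
    """Hypothetical helper: row with the second-latest date per group."""
    keys = list(keys)
    # Keys ascending, dates descending, so position 1 is the second-latest date.
    ordered = frame.sort_values(
        keys + [date_col], ascending=[True] * len(keys) + [False]
    )
    # cumcount numbers rows 0, 1, 2, ... inside each group in sorted order.
    position = ordered.groupby(keys).cumcount()
    return ordered[position.eq(1)]


result = second_latest(df)
print(result)
```

Groups that contain only one row have no position 1, so they simply do not appear in the result.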