Use boolean indexing to select the 'work' rows, keep the last 3 with tail, and take the mean:
a = activity_df.loc[activity_df['activities']=='work', 'duration'].tail(3).mean()
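To make the three steps easier to follow, here is a minimal sketch that splits the one-liner apart. The sample data is hypothetical (the real activity_df comes from the question), and the intermediate names mask, work_durations and last_three are introduced only for illustration.

```python
import pandas as pd

# Hypothetical sample data, not the data from the question.
activity_df = pd.DataFrame({
    'activities': ['work', 'home', 'work', 'home', 'work', 'work'],
    'duration':   [1, 2, 3, 4, 5, 7],
})

# Step 1: boolean mask, True on the rows where the activity is 'work'.
mask = activity_df['activities'] == 'work'

# Step 2: select the 'duration' column for those rows only.
work_durations = activity_df.loc[mask, 'duration']

# Step 3: keep the last 3 values and average them.
last_three = work_durations.tail(3)
a = last_three.mean()
print(a)
```

If fewer than 3 matching rows exist, tail(3) simply returns what is there, so the mean is taken over fewer values.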
A more general solution is to compute the mean of the last 3 rows of every activity with GroupBy.tail:
# mean(level=0) was removed in pandas 2.0; groupby(level=0).mean() is the equivalent
s = activity_df.set_index('activities').groupby('activities').tail(3).groupby(level=0).mean()
print(s)
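A self-contained sketch of the same idea on hypothetical data follows. It uses a second groupby for the per-activity mean, so it does not rely on the level argument of mean, which was removed in pandas 2.0. The name last_rows is introduced only for illustration.

```python
import pandas as pd

# Hypothetical sample data, not the data from the question.
activity_df = pd.DataFrame({
    'activities': ['work', 'home', 'work', 'home', 'work', 'home', 'work', 'home'],
    'duration':   [1, 2, 3, 4, 5, 6, 7, 8],
})

# Keep the last 3 rows of every activity (original row order is preserved).
last_rows = activity_df.groupby('activities').tail(3)

# Average 'duration' within each activity; the result is a Series indexed by activity.
s = last_rows.groupby('activities')['duration'].mean()
print(s)
```

Here s['work'] is the mean of 3, 5, 7 and s['home'] is the mean of 4, 6, 8.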
EDIT:
import numpy as np
import pandas as pd

np.random.seed(1256)
duration = np.random.randint(4, size=30)
# 30 alternating labels: 'work', 'home', 'work', 'home', ...
activities = ['work', 'home'] * 15
For your new output, you need groupby with rolling and the mean aggregation:
activity_df = pd.DataFrame({'activities':activities, 'duration':duration})
activity_df['roll'] = (activity_df.groupby('activities')['duration']
                                  .rolling(3)
                                  .mean()
                                  .reset_index(level=0, drop=True))
print(activity_df)
    activities  duration      roll
0         work         1       NaN
1         home         2       NaN
2         work         1       NaN
3         home         3       NaN
4         work         0  0.666667
5         home         1  2.000000
6         work         3  1.333333
7         home         0  1.333333
8         work         1  1.333333
9         home         3  1.333333
10        work         1  1.666667
11        home         1  1.333333
12        work         3  1.666667
13        home         2  2.000000
14        work         2  2.000000
15        home         3  2.000000
16        work         0  1.666667
17        home         2  2.333333
18        work         3  1.666667
19        home         0  1.666667
20        work         3  2.000000
21        home         0  0.666667
22        work         1  2.333333
23        home         3  1.000000
24        work         1  1.666667
25        home         2  1.666667
26        work         1  1.000000
27        home         2  2.333333
28        work         2  1.333333
29        home         1  1.666667
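As a possible alternative, not the approach used above, the same rolling mean can be sketched with GroupBy.transform. Because transform returns a result aligned to the original index, the reset_index step is not needed. The data below is hypothetical and kept small so the values can be checked by hand.

```python
import pandas as pd

# Hypothetical sample data, not the data from the question.
activity_df = pd.DataFrame({
    'activities': ['work', 'home'] * 4,
    'duration':   [1, 2, 3, 4, 5, 6, 7, 8],
})

# Rolling mean over a window of 3, computed separately inside each activity.
# The first two rows of every activity have fewer than 3 values, so they are NaN.
activity_df['roll'] = (activity_df.groupby('activities')['duration']
                                  .transform(lambda x: x.rolling(3).mean()))
print(activity_df)
```

The last roll value of each activity equals the mean of its last 3 durations, which ties this back to the tail(3) solutions.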