I suggest using DataFrameGroupBy.agg with a list of (name, function) tuples, together with the built-in first and last aggregations:
tick_features = [('volume', lambda x: x.abs().sum()),
                 ('num_trades', 'count'),
                 ('first_trade', 'first'),
                 ('last_trade', 'last')]
tick = tick.groupby(pd.Grouper(freq='5S'))['size'].agg(tick_features)
print(tick)
                       volume  num_trades  first_trade  last_trade
INDEX
2018-05-07 21:53:10  0.365127           1     0.365127    0.365127
2018-05-07 21:53:15  2.241434           8     0.666127    0.006560
2018-05-07 21:53:20  1.332152           2     0.666076    0.666076
2018-05-07 21:53:25  0.195509           3     0.100000   -0.053518
2018-05-07 21:53:30  0.000000           0          NaN         NaN
2018-05-07 21:53:35  0.000000           0          NaN         NaN
2018-05-07 21:53:40  0.000000           0          NaN         NaN
2018-05-07 21:53:45  0.146302           2    -0.046302    0.100000
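To try the agg approach without the original data, here is a minimal sketch on made-up ticks. The timestamps, sizes, and the name bars are hypothetical and only serve as illustration. It uses the lowercase '5s' alias, which newer pandas releases prefer over '5S'.

```python
import pandas as pd

# Hypothetical tick data (not from the question): signed trade sizes
# indexed by timestamp, with a gap so that two 5-second bins are empty.
idx = pd.to_datetime([
    "2018-05-07 21:53:11",
    "2018-05-07 21:53:16",
    "2018-05-07 21:53:18",
    "2018-05-07 21:53:31",
])
tick = pd.DataFrame({"size": [0.5, -0.25, 1.0, 0.1]}, index=idx)

# Each tuple is (output column name, aggregation).
tick_features = [
    ("volume", lambda x: x.abs().sum()),
    ("num_trades", "count"),
    ("first_trade", "first"),
    ("last_trade", "last"),
]

# Group the 'size' column into 5-second bins and aggregate.
bars = tick.groupby(pd.Grouper(freq="5s"))["size"].agg(tick_features)
print(bars)
```

The empty bins should show a count of 0 and NaN for first_trade and last_trade, with no special handling needed.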
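A solution with apply is also possible, but it needs an if-else statement, because .iloc[0] and .iloc[-1] raise an IndexError on the empty 5-second bins: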
def tick_features(x):
    volume = np.abs(x['size']).sum()
    num_trades = x['size'].count()
    # Empty bins have no rows, so fall back to NaN for first/last.
    if not x.empty:
        f = x['size'].iloc[0]
        l = x['size'].iloc[-1]
    else:
        f = np.nan
        l = np.nan
    return pd.Series([volume, num_trades, f, l],
                     index=['volume', 'num_trades', 'first_trade', 'last_trade'])

tick = tick.groupby(pd.Grouper(freq='5S')).apply(tick_features)
print(tick)
                       volume  num_trades  first_trade  last_trade
INDEX
2018-05-07 21:53:10  0.365127         1.0     0.365127    0.365127
2018-05-07 21:53:15  2.241434         8.0     0.666127    0.006560
2018-05-07 21:53:20  1.332152         2.0     0.666076    0.666076
2018-05-07 21:53:25  0.195509         3.0     0.100000   -0.053518
2018-05-07 21:53:30  0.000000         0.0          NaN         NaN
2018-05-07 21:53:35  0.000000         0.0          NaN         NaN
2018-05-07 21:53:40  0.000000         0.0          NaN         NaN
2018-05-07 21:53:45  0.146302         2.0    -0.046302    0.100000
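Note that with apply the num_trades column comes back as float (1.0 instead of 1): each group returns a single pd.Series that mixes an integer count with float values, and pandas stores that as float64. A minimal sketch on hypothetical data, assuming you want the count back as an integer. The sample ticks and the names bars and raw_dtype are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical tick data (not from the question), with two empty 5-second bins.
idx = pd.to_datetime([
    "2018-05-07 21:53:11",
    "2018-05-07 21:53:16",
    "2018-05-07 21:53:18",
    "2018-05-07 21:53:31",
])
tick = pd.DataFrame({"size": [0.5, -0.25, 1.0, 0.1]}, index=idx)


def tick_features(x):
    volume = np.abs(x["size"]).sum()
    num_trades = x["size"].count()
    # Empty bins have no rows, so .iloc[0] would fail; use NaN instead.
    if not x.empty:
        first = x["size"].iloc[0]
        last = x["size"].iloc[-1]
    else:
        first = np.nan
        last = np.nan
    return pd.Series(
        [volume, num_trades, first, last],
        index=["volume", "num_trades", "first_trade", "last_trade"],
    )


bars = tick.groupby(pd.Grouper(freq="5s")).apply(tick_features)

# The count was upcast to float along with the other values.
raw_dtype = bars["num_trades"].dtype

# The count never contains NaN, so it can be cast back safely.
bars["num_trades"] = bars["num_trades"].astype("int64")
print(bars)
```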