A solution using resample:
import pandas as pd

data = [
    ('$AAPL', '2018-01-01', "Blah blah $AAPL"),
    ('$AAPL', '2018-01-05', "Blah blah $AAPL"),
    ('$AAPL', '2019-01-08', "Blah blah $AAPL"),
    ('$AAPL', '2019-02-09', "Blah blah $AAPL"),
    ('$AAPL', '2019-02-10', "Blah blah $AAPL"),
    ('$AAPL', '2019-03-01', "Blah blah $AAPL"),
    ('$FB', '2018-01-03', "Blah blah $FB"),
    ('$FB', '2018-02-10', "Blah blah $FB"),
]
df = pd.DataFrame.from_records(data=data, columns=['Cashtag', 'Date', 'Message'])
df['Date'] = pd.to_datetime(df['Date'])

# Count messages per cashtag per calendar month.
# Note: pandas 2.2+ deprecates the 'M' offset alias in resample in favor of 'ME'.
df = (df
      .set_index(pd.DatetimeIndex(df['Date']))
      .groupby('Cashtag')
      .resample('M')['Message']
      .count()
      .reset_index()
      .query('Message > 0')      # drop the empty months that resample fills in
      .reset_index(drop=True)
)
# Show the month as a period (e.g. 2018-01) instead of a month-end timestamp.
df['Date'] = df['Date'].dt.to_period('M')
Output:
  Cashtag     Date  Message
0   $AAPL  2018-01        2
1   $AAPL  2019-01        1
2   $AAPL  2019-02        2
3   $AAPL  2019-03        1
4     $FB  2018-01        1
5     $FB  2018-02        1
Or an even simpler solution, starting from the original df (before the aggregation above):
df['Date'] = pd.to_datetime(df['Date']).dt.to_period('M')
df = df.groupby(['Cashtag', 'Date'])['Message'].count().reset_index()
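For convenience, here is a minimal self-contained sketch of the simpler approach that can be run on its own and checked against the output shown above. The helper name monthly_message_counts is hypothetical (it is not part of pandas), and the sample data is the same as in the first snippet.

```python
import pandas as pd

data = [
    ('$AAPL', '2018-01-01', "Blah blah $AAPL"),
    ('$AAPL', '2018-01-05', "Blah blah $AAPL"),
    ('$AAPL', '2019-01-08', "Blah blah $AAPL"),
    ('$AAPL', '2019-02-09', "Blah blah $AAPL"),
    ('$AAPL', '2019-02-10', "Blah blah $AAPL"),
    ('$AAPL', '2019-03-01', "Blah blah $AAPL"),
    ('$FB', '2018-01-03', "Blah blah $FB"),
    ('$FB', '2018-02-10', "Blah blah $FB"),
]


def monthly_message_counts(frame):
    """Hypothetical helper: count messages per cashtag per calendar month."""
    out = frame.copy()  # leave the caller's DataFrame untouched
    # Collapse each date to its month, e.g. 2018-01-05 -> 2018-01.
    out['Date'] = pd.to_datetime(out['Date']).dt.to_period('M')
    # Only (Cashtag, month) pairs present in the data form groups,
    # so no empty months need to be filtered out afterwards.
    return out.groupby(['Cashtag', 'Date'])['Message'].count().reset_index()


raw = pd.DataFrame.from_records(data=data, columns=['Cashtag', 'Date', 'Message'])
counts = monthly_message_counts(raw)
print(counts)
```

Note that count() skips rows where Message is missing; if every row should be counted regardless, size() can be used in its place (the resulting column then needs to be named explicitly, e.g. with reset_index(name='Message')).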