In my opinion, you need to define all possible dates with date_range and use that range to reindex the pivoted DataFrame. NaNs are first replaced by forward filling, and the leading NaNs that remain (dates before a device's first record) are set to OFF with fillna:
print(df)
      Date  Device Status
0  1990/01      50     ON
1  1990/01      20     ON
2  1990/03      25     ON
3  1990/05      50    OFF
4  1990/05      20    OFF   <- changed for smaller output df
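import pandas as pd

df['Date'] = pd.to_datetime(df['Date'])
rng = pd.date_range('1989-10-01', '1991-01-01', freq='MS')
# pivot accepts only keyword arguments in pandas 2.0 and later
df = df.pivot(index='Date', columns='Device', values='Status').reindex(rng).ffill().fillna('OFF')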
print(df)
Device       20   25   50
1989-10-01  OFF  OFF  OFF
1989-11-01  OFF  OFF  OFF
1989-12-01  OFF  OFF  OFF
1990-01-01   ON  OFF   ON
1990-02-01   ON  OFF   ON
1990-03-01   ON   ON   ON
1990-04-01   ON   ON   ON
1990-05-01  OFF   ON  OFF
1990-06-01  OFF   ON  OFF
1990-07-01  OFF   ON  OFF
1990-08-01  OFF   ON  OFF
1990-09-01  OFF   ON  OFF
1990-10-01  OFF   ON  OFF
1990-11-01  OFF   ON  OFF
1990-12-01  OFF   ON  OFF
1991-01-01  OFF   ON  OFF
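To make the order of the two fill steps concrete, here is a minimal sketch on a single column. The Series `s` and its values are made up for illustration and are not part of the data above; only `ffill` and `fillna` come from the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical single-device column: no record for the first two months,
# then ON, a gap, OFF, and another gap.
idx = pd.date_range('1990-01-01', periods=6, freq='MS')
s = pd.Series([np.nan, np.nan, 'ON', np.nan, 'OFF', np.nan], index=idx)

# ffill() carries the last known status forward; the two leading NaNs
# have nothing before them, so they stay NaN.
forward = s.ffill()

# fillna('OFF') then touches only those leading gaps.
filled = forward.fillna('OFF')

print(filled.tolist())
# ['OFF', 'OFF', 'ON', 'ON', 'OFF', 'OFF']
```

Calling fillna('OFF') first would instead overwrite every gap with OFF, including the months where the device was still ON.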
Finally, if the original date format is needed, add strftime:
df.index = df.index.strftime('%Y/%m')
print(df)
Device    20   25   50
1989/10  OFF  OFF  OFF
1989/11  OFF  OFF  OFF
1989/12  OFF  OFF  OFF
1990/01   ON  OFF   ON
1990/02   ON  OFF   ON
1990/03   ON   ON   ON
1990/04   ON   ON   ON
1990/05  OFF   ON  OFF
1990/06  OFF   ON  OFF
1990/07  OFF   ON  OFF
1990/08  OFF   ON  OFF
1990/09  OFF   ON  OFF
1990/10  OFF   ON  OFF
1990/11  OFF   ON  OFF
1990/12  OFF   ON  OFF
1991/01  OFF   ON  OFF
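One thing to keep in mind: after strftime the index holds plain strings rather than timestamps, so lookups have to use the same 'YYYY/MM' text. A small sketch, where the frame `demo` is a hypothetical two-month stand-in for the result above:

```python
import pandas as pd

# Hypothetical two-month frame standing in for the reindexed result.
idx = pd.date_range('1990-01-01', periods=2, freq='MS')
demo = pd.DataFrame({20: ['ON', 'ON'], 50: ['ON', 'OFF']}, index=idx)

# With a DatetimeIndex, a Timestamp works as a row label.
before = demo.loc[pd.Timestamp('1990-02-01'), 50]

# After strftime the labels are strings in the original format.
demo.index = demo.index.strftime('%Y/%m')
after = demo.loc['1990/02', 50]

print(before, after)
```

If date arithmetic or range slicing is still needed later, it may be simpler to keep the DatetimeIndex and format the dates only when displaying.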
EDIT:
A more general solution:
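def get_status(df, device, check_date):
    # df is the original long-format frame with Date, Device and Status columns
    check_date = pd.to_datetime(check_date)
    # assign() returns a copy, so the caller's frame is left unchanged
    df = df.assign(Date=pd.to_datetime(df['Date']))
    rng = pd.date_range(df['Date'].min(), df['Date'].max(), freq='MS')
    # pivot accepts only keyword arguments in pandas 2.0 and later
    df = (df.pivot(index='Date', columns='Device', values='Status')
            .reindex(rng).ffill().fillna('OFF'))
    #print(df)
    if check_date < df.index.min():
        # before the first record nothing has been switched on yet
        return 'OFF'
    elif check_date > df.index.max():
        # after the last record the most recent status still holds
        return df.loc[df.index[-1], device]
    else:
        return df.loc[check_date, device]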
# df here is the original frame with Date, Device and Status columns
print(get_status(df, 50, '1990/01'))
# ON
print(get_status(df, 50, '1990/02'))
# ON
print(get_status(df, 50, '1990/05'))
# OFF
print(get_status(df, 50, '1990/09'))
# OFF
print(get_status(df, 50, '1900/01'))
# OFF
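For comparison, the same lookup can probably be done without building the full monthly grid, by taking the latest record on or before the requested date. This is an alternative sketch, not the approach above: the function name `last_known_status`, its `default` parameter and the frame `raw` are hypothetical, and it assumes the frame has a unique index and dates in 'YYYY/MM' form.

```python
import pandas as pd

def last_known_status(df, device, check_date, default='OFF'):
    """Most recent Status recorded for `device` on or before `check_date`."""
    dates = pd.to_datetime(df['Date'], format='%Y/%m')
    check = pd.to_datetime(check_date, format='%Y/%m')
    mask = (df['Device'] == device) & (dates <= check)
    if not mask.any():
        # no record yet for this device: fall back to the default
        return default
    # idxmax() gives the row label of the latest matching date
    return df.loc[dates[mask].idxmax(), 'Status']

# Same sample data as in the answer, rebuilt so the sketch is self-contained.
raw = pd.DataFrame({
    'Date': ['1990/01', '1990/01', '1990/03', '1990/05', '1990/05'],
    'Device': [50, 20, 25, 50, 20],
    'Status': ['ON', 'ON', 'ON', 'OFF', 'OFF'],
})

print(last_known_status(raw, 50, '1990/02'))  # ON
print(last_known_status(raw, 50, '1990/09'))  # OFF
print(last_known_status(raw, 25, '1990/02'))  # OFF
```

The pivot-based version is more convenient when the whole status table is needed; this variant only answers one query at a time.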