I want to compute four fields: start_count, new_count, completed_count, and end_count, where
count = number of Item_ids
start_count = previous day's end_count
new_count = number of the current day's Item_ids that are not present on the previous day
completed_count = start_count - number of old Item_ids (those already present on the previous day)
end_count = start_count + new_count - completed_count
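To make the definitions concrete, here is a minimal sketch that applies them literally, day by day. The function name `roll_counts` and the `initial_start` parameter are hypothetical, and the first day's start_count is assumed to be 0 because there is no previous day to take it from.

```python
def roll_counts(new_old_per_day, initial_start=0):
    """Apply the four definitions to each day in date order.

    new_old_per_day: list of (new_count, old_count) tuples, one per day.
    initial_start: start_count of the first day (assumed 0, hypothetical).
    """
    rows = []
    start = initial_start
    for new, old in new_old_per_day:
        completed = start - old          # completed_count = start_count - old
        end = start + new - completed    # end_count as defined above
        rows.append({
            "start_count": start,
            "new_count": new,
            "completed_count": completed,
            "end_count": end,
        })
        start = end                      # next day's start_count
    return rows


# Per-day (new, old) row counts read off the sample data.
result = roll_counts([(3, 0), (1, 2), (1, 2)])
```

With these inputs the sketch yields end_count 3 on every day, because substituting completed_count into the last formula gives end_count = new_count + number of old Item_ids, which is simply the current day's count.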
df:
Date Item_id Count
01/01/2020 100 1
01/01/2020 101 1
01/01/2020 100 1
02/01/2020 102 1
02/01/2020 101 1
02/01/2020 100 1
03/01/2020 101 1
03/01/2020 102 1
03/01/2020 103 1
df_result:
Date Start_count new_count completed_count end_count
01/01/2020 3 3 0 3
02/01/2020 3 1 1 4
03/01/2020 4 1 2 3
I tried:
df2 = pd.DataFrame()
for d in dates:
    if df.Item_id[d] in df.Item_id[d-1]:
        df_calc['old'] = df.groupby('Date').agg({'Count':'sum'}).reset_index()
    if not df.Item_id[d] in df.Item_id[d-1]:
        df_calc['new'] = df.groupby('Date').agg({'Count':'sum'}).reset_index()
    else:
        df_calc['old'] = 0
    df2.append(df_calc)
and got this error: datetime.date(2020,1,1)
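The bare datetime.date(2020,1,1) is most likely the message of a KeyError: df.Item_id[d] looks d up among the index labels, and the index presumably holds row positions rather than dates. Below is a minimal sketch of one way to get per-date new and old counts by grouping on Date instead. It assumes the dates are day/month/year, that "previous day" means the previous date present in the data, and that rows (the Count column) are summed rather than unique ids. The names `per_day`, `prev_ids`, and `is_old` are illustrative only.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Date": ["01/01/2020"] * 3 + ["02/01/2020"] * 3 + ["03/01/2020"] * 3,
    "Item_id": [100, 101, 100, 102, 101, 100, 101, 102, 103],
    "Count": [1] * 9,
})
# Assumed day/month/year format, so the dates sort chronologically.
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")

rows = []
prev_ids = set()  # nothing precedes the first date
for date, group in df.groupby("Date"):  # groups arrive in sorted date order
    # True for rows whose Item_id also appeared on the previous date.
    is_old = group["Item_id"].isin(prev_ids)
    rows.append({
        "Date": date,
        "new": int(group.loc[~is_old, "Count"].sum()),
        "old": int(group.loc[is_old, "Count"].sum()),
    })
    prev_ids = set(group["Item_id"])

per_day = pd.DataFrame(rows)
```

Here `per_day` has one row per date with the new and old counts, from which the four fields can then be derived.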