I have a sample DataFrame:
import numpy as np
import pandas as pd

# 10 rows of random integers in [1, 20) for columns A, B, C
sample_df_train = pd.DataFrame(np.random.randint(1, 20, size=(10, 3)), columns=list('ABC'))
sample_df_train["date"] = ["2020-02-01", "2020-02-01", "2020-02-01", "2020-02-01", "2020-02-01",
                           "2020-02-02", "2020-02-02", "2020-02-02", "2020-02-02", "2020-02-02"]
sample_df_train["date"] = pd.to_datetime(sample_df_train["date"])
# Move "date" from a column to the index
sample_df_train = sample_df_train.set_index("date")
sample_df_train["A_cat"] = ["ind", "sa", "sa", "sa", "ind", "ind", "sa", "sa", "ind", "sa"]
sample_df_train["B_cat"] = ["sa", "ind", "ind", "sa", "sa", "sa", "ind", "sa", "ind", "sa"]
sample_df_train
Output:
             A   B   C A_cat B_cat
date
2020-02-01  13  13  14   ind    sa
2020-02-01   3   2  10    sa   ind
2020-02-01   2   6   6    sa   ind
2020-02-01  11   6   8    sa    sa
2020-02-01   4   9   1   ind    sa
2020-02-02   3   3  18   ind    sa
2020-02-02  17   3  17    sa   ind
2020-02-02   1   5  17    sa    sa
2020-02-02  13  15   9   ind   ind
2020-02-02  12  16  19    sa    sa
I am trying to transform this DataFrame based on two conditions:
1. GroupBy based on Index and some columns
2. Transform (add 100 to column value) selected columns in the DF based on the GroupBy op.
Inputs:
group_by_cols = ['date', 'A_cat']  # note that "date" is the index
selected_columns = ["A"]
Code:
sample_df_train[selected_columns] = sample_df_train.reset_index().groupby(group_by_cols)[selected_columns].apply(lambda x: x+100)
Output:
              A   B   C A_cat B_cat
date
2020-02-01  NaN  17  11   ind    sa
2020-02-01  NaN  10   9    sa   ind
2020-02-01  NaN   2  11    sa   ind
2020-02-01  NaN   3  16    sa    sa
2020-02-01  NaN   7   3   ind    sa
2020-02-02  NaN   6   5   ind    sa
2020-02-02  NaN  19   3    sa   ind
2020-02-02  NaN   4  15    sa    sa
2020-02-02  NaN  11   8   ind   ind
2020-02-02  NaN  14  14    sa    sa
Expected output:
The entire DF with 100 added to values in Column A.
I don't understand why I am getting NaNs. Any suggestions would be great.
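My current guess, which may well be wrong: reset_index() replaces the date index with integer labels 0..9, so the groupby result no longer shares any index labels with the original frame, and the assignment aligns on the index and fills column A with NaN. Below is a minimal sketch of what I expect to work instead, assuming groupby accepts the index level name "date" directly and using transform, which returns rows in the original order. The name df and the fixed seed are only for illustration and are not part of my real code.

```python
import numpy as np
import pandas as pd

# Illustrative frame with the same shape as the one in the question
rng = np.random.default_rng(0)  # fixed seed, assumption for reproducibility
df = pd.DataFrame(rng.integers(1, 20, size=(10, 3)), columns=list("ABC"))
df.index = pd.DatetimeIndex(["2020-02-01"] * 5 + ["2020-02-02"] * 5, name="date")
df["A_cat"] = ["ind", "sa", "sa", "sa", "ind", "ind", "sa", "sa", "ind", "sa"]
df["B_cat"] = ["sa", "ind", "ind", "sa", "sa", "sa", "ind", "sa", "ind", "sa"]

group_by_cols = ["date", "A_cat"]  # "date" is an index level, "A_cat" a column
selected_columns = ["A"]

original_a = df["A"].copy()

# After reset_index() the labels are plain integers, not dates
uses_range_index = isinstance(df.reset_index().index, pd.RangeIndex)

# Group without reset_index(); transform keeps one output row per input row,
# in the same order, so the values can be written back positionally.
transformed = df.groupby(group_by_cols)[selected_columns].transform(lambda x: x + 100)
df[selected_columns] = transformed.to_numpy()

print(df)
```

Writing the result back with to_numpy() sidesteps index alignment entirely, which seems safe here only because transform preserves the row order.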