NOTE: I'm using Python 3.7.1 and pandas 0.23.4. I came up with something pretty dirty; I'm sure there is a neater and more efficient way to do this.
```python
import numpy as np
import pandas as pd

### Create sample data
# 6-hourly timestamps from 2018-01-01 06:00 to 2018-01-20 00:00. This is the same
# range the original closed="right" call produced; `closed` was removed in pandas 2.x.
date_range = pd.date_range(start="2018-01-01 06:00", end="2018-01-20", freq="6h")
target1 = np.random.uniform(10, 30, len(date_range))
# one "close" value per block of 4 rows, repeated 4 times
close = [[i] * 4 for i in np.random.uniform(10, 30, len(date_range) // 4)]
close_flat = np.array([item for sublist in close for item in sublist])
# building from a dict keeps "target" and "close" as floats instead of objects
df = pd.DataFrame({"date": date_range.date, "target": target1, "close": close_flat})

### Create the column you need
# Iterate over blocks of 4 rows (one day's worth of 6-hourly data) and find the first
# later row where the difference between the block's "close" and "target" is lower
# than 0.25 OR the "target" value is greater than the "close" value.
thresh = 0.25
date_diff_arr = np.zeros(len(df))
for i in range(0, len(df), 4):
    close_i = df["close"].iloc[i]
    diff_lt_thresh = df[((df["target"] - close_i).abs() < thresh) | (df["target"] > close_i)]
    # only keep the findings from the next block onwards
    diff_lt_thresh = diff_lt_thresh.loc[i + 4:]
    if not diff_lt_thresh.empty:
        # compute the day difference only if a matching row was found
        days_diff = (diff_lt_thresh["date"].iloc[0] - df["date"].iloc[i]).days
    else:
        # otherwise write it as NaN
        days_diff = np.nan
    # fill in the array that will be written to the DataFrame
    date_diff_arr[i:i + 4] = days_diff
df["date_diff"] = date_diff_arr
```
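For a positive threshold, the two conditions joined with OR in the loop collapse into a single comparison. In this short derivation, t stands for a "target" value, c for the block's "close" value and θ for `thresh`; these symbols are introduced here only for illustration:

```latex
\begin{aligned}
t > c &:\quad \text{both sides hold, since } c - \theta < c < t,\\
t \le c &:\quad |t - c| = c - t, \text{ so } |t - c| < \theta \iff t > c - \theta,\\
\text{hence}\quad |t - c| < \theta \,\lor\, t > c &\iff t > c - \theta \qquad (\theta > 0).
\end{aligned}
```

So the filter could probably be written as `df["target"] > close_i - thresh`, which should select the same rows up to floating-point rounding exactly at the boundary.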
Sample output (the data is random, so your values will differ):

```
          date   target    close  date_diff
0   2018-01-01    21.64  26.7319        2.0
1   2018-01-01  22.9047  26.7319        2.0
2   2018-01-01  26.0945  26.7319        2.0
3   2018-01-02  10.2155  26.7319        2.0
4   2018-01-02  17.5602  11.0507        1.0
5   2018-01-02  12.0368  11.0507        1.0
6   2018-01-02  19.5923  11.0507        1.0
7   2018-01-03  21.8168  11.0507        1.0
8   2018-01-03  11.5433  16.8862        1.0
9   2018-01-03  27.3739  16.8862        1.0
10  2018-01-03  26.9073  16.8862        1.0
11  2018-01-04  19.6677  16.8862        1.0
12  2018-01-04  25.3599  27.3373        1.0
13  2018-01-04  22.7479  27.3373        1.0
14  2018-01-04  18.7246  27.3373        1.0
15  2018-01-05  25.4122  27.3373        1.0
16  2018-01-05  28.3294  23.8469        1.0
```
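A possibly faster alternative to the Python loop is to evaluate the condition for every block at once with NumPy broadcasting. This is only a sketch under assumptions: the same layout as the sample data (a default RangeIndex, `len(df)` a multiple of 4, one "close" value per block of 4 rows, a "date" column of `datetime.date` values), and the function name `date_diff_vectorized` is hypothetical. It builds a boolean matrix of shape (number of blocks × number of rows), so memory grows quadratically with the length of the frame.

```python
import numpy as np
import pandas as pd


def date_diff_vectorized(df, thresh=0.25, block=4):
    """Hypothetical vectorised version of the loop.

    For each block of `block` rows, return the number of days until the first row
    (from the next block onwards) whose "target" is within `thresh` of, or above,
    the block's "close"; NaN if there is none. One value per row is returned.
    Assumes a default RangeIndex and len(df) divisible by `block`.
    """
    n = len(df)
    target = np.asarray(df["target"], dtype=float)
    close = np.asarray(df["close"], dtype=float)
    dates = pd.to_datetime(df["date"]).values   # datetime64[ns], midnight of each date
    starts = np.arange(0, n, block)             # first row of every block
    close_b = close[starts]                     # one "close" value per block

    # hit[k, j] is True when row j meets the condition for block k
    diff = target[None, :] - close_b[:, None]
    hit = (np.abs(diff) < thresh) | (target[None, :] > close_b[:, None])
    # only keep rows from the next block onwards (j >= start + block)
    hit &= np.arange(n)[None, :] >= (starts + block)[:, None]

    found = hit.any(axis=1)
    first = hit.argmax(axis=1)                  # index of the first True (0 if none)
    days = (dates[first] - dates[starts]) / np.timedelta64(1, "D")
    days = np.where(found, days, np.nan)        # NaN where no row matched
    return np.repeat(days, block)               # same shape as date_diff_arr
```

With the sample data it would be used as `df["date_diff"] = date_diff_vectorized(df)`. The matrix approach trades memory for speed; for very long frames the loop (or a chunked version of this sketch) may be the safer choice.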