df:
Id timestamp data sig events1 Start Peak Timediff Datadiff
104513 104754 2012-03-21 16:23:21.923 19.5 1.0 0.0 1.0 0.0 28732.920 0.5
104514 104755 2012-03-21 16:23:22.023 20.0 -1.0 0.0 0.0 1.0 0.100 0.5
104623 104864 2012-03-22 04:27:04.550 19.5 0.0 0.0 0.0 0.0 43423.127 -0.5
104630 104871 2012-03-22 04:27:11.670 19.5 -1.0 0.0 0.0 0.0 7.120 0.0
105147 105388 2012-03-23 06:12:24.523 19.0 -1.0 0.0 1.0 0.0 92712.853 -0.5
105148 105389 2012-03-23 06:12:24.623 18.5 1.0 1.0 0.0 0.0 0.100 -0.5
I want to find the time interval between rows with `Peak == 1`, where the start timestamp is the first row in that interval with `Start == 1` and the end timestamp is the row with `Peak == 1`. Each interval has exactly one row with `Peak == 1`, but it can contain several rows with `Start == 1`.
I tried grouping with `df['group'] = df['Peak'].cumsum()` and then using `agg`, something like `df = df.groupby('group').agg({'Start': 'first', 'timestamp': 'first' ...})`, but I can't work out how to get both the start and the end timestamp of each group.
Expected result:
timestamp1 (i.e. Start == 1)   timestamp2 (i.e. Peak == 1)   TimeInterval
2012-03-21 16:23:21.923 2012-03-21 16:23:22.023 0.1
...
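A minimal sketch of how the `df['Peak'].cumsum()` idea could be made to work, assuming the frame is sorted by `timestamp`: shifting `Peak` down by one row before the cumulative sum makes each `Peak == 1` row the last row of its own group instead of the first row of the next one. The helper columns `start_ts` and `peak_ts` are hypothetical names introduced here, and the data are only the relevant columns of the first sample.

```python
import pandas as pd

# Relevant columns of the first sample above.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2012-03-21 16:23:21.923", "2012-03-21 16:23:22.023",
        "2012-03-22 04:27:04.550", "2012-03-22 04:27:11.670",
        "2012-03-23 06:12:24.523", "2012-03-23 06:12:24.623",
    ]),
    "Start": [1.0, 0.0, 0.0, 0.0, 1.0, 0.0],
    "Peak": [0.0, 1.0, 0.0, 0.0, 0.0, 0.0],
})

# Shift before cumsum so each Peak == 1 row closes its own group
# rather than opening the next one.
df["group"] = df["Peak"].shift(fill_value=0).cumsum()

# Hypothetical helper columns: keep only the timestamps of interest, NaT elsewhere.
df["start_ts"] = df["timestamp"].where(df["Start"].eq(1))
df["peak_ts"] = df["timestamp"].where(df["Peak"].eq(1))

# GroupBy 'first' skips NaT, so this picks the first Start == 1 row
# and the single Peak == 1 row of each group.
out = (df.groupby("group")
         .agg(timestamp1=("start_ts", "first"), timestamp2=("peak_ts", "first"))
         .dropna()  # drop groups missing a Start or a closing Peak
         .reset_index(drop=True))
out["TimeInterval"] = out["timestamp2"] - out["timestamp1"]
print(out)
```

Groups without a `Start == 1` row or without a closing `Peak == 1` row (here the rows from 2012-03-22 onward) end up with `NaT` and are removed by `dropna()`.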
Edit:
Reproducible example:
Id timestamp Start Peak
51253 51494 2012-01-27 06:22:08.330 NaN 1.0 # Time intervals are divided by these rows where `Peak==1`.
51254 51495 2012-01-27 06:22:08.430 0.0 0.0
51255 51496 2012-01-27 07:19:06.297 1.0* 0.0
51256 51497 2012-01-27 07:19:06.397 0.0 0.0
51259 51500 2012-01-27 07:32:19.587 0.0 0.0
51260 51501 2012-01-27 07:32:19.687 0.0 1.0 # Time intervals are divided by these rows where `Peak==1`.
51261 51502 2012-01-27 07:32:37.607 0.0 0.0
51262 51503 2012-01-27 07:32:37.707 0.0 0.0
51325 51566 2012-01-27 09:00:23.053 1.0* 0.0
51326 51567 2012-01-27 09:00:23.153 0.0 0.0
51327 51568 2012-01-27 09:00:28.047 0.0 0.0
51328 51569 2012-01-27 09:00:28.147 0.0 1.0 # Time intervals are divided by these rows where `Peak==1`.
51349 51590 2012-01-27 09:06:23.110 0.0 0.0
51350 51591 2012-01-27 09:06:23.210 0.0 0.0
51351 51592 2012-01-27 09:06:33.113 0.0 0.0
51352 51593 2012-01-27 09:06:33.213 0.0 0.0
51389 51630 2012-01-27 10:00:32.037 1.0* 0.0
51390 51631 2012-01-27 10:00:32.137 0.0 0.0
51393 51634 2012-01-27 10:06:00.187 0.0 0.0
51394 51635 2012-01-27 10:06:00.287 0.0 0.0
51535 51776 2012-01-27 10:40:48.693 0.0 0.0 # From here onward is the additional data where the issue occurs.
51536 51777 2012-01-27 10:40:48.793 0.0 0.0
51537 51778 2012-01-27 10:40:51.697 0.0 0.0
51538 51779 2012-01-27 10:40:51.797 0.0 0.0
51539 51780 2012-01-27 10:40:53.697 0.0 0.0
51540 51781 2012-01-27 10:40:53.797 1.0* 0.0
51541 51782 2012-01-27 10:40:55.700 0.0 0.0
51542 51783 2012-01-27 10:40:55.800 1.0* 0.0
51543 51784 2012-01-27 10:40:56.703 0.0 0.0
51544 51785 2012-01-27 10:40:56.803 1.0* 0.0
51545 51786 2012-01-27 10:40:58.707 0.0 0.0
51546 51787 2012-01-27 10:40:58.807 0.0 0.0
51547 51788 2012-01-27 10:41:01.770 0.0 0.0
51548 51789 2012-01-27 10:41:01.870 0.0 0.0
51549 51790 2012-01-27 10:41:03.673 0.0 0.0
51550 51791 2012-01-27 10:41:03.773 0.0 0.0
51551 51792 2012-01-27 10:41:05.777 0.0 0.0
51552 51793 2012-01-27 10:41:05.877 1.0* 0.0
51553 51794 2012-01-27 10:41:08.780 0.0 0.0
51554 51795 2012-01-27 10:41:08.880 0.0 0.0
51555 51796 2012-01-27 10:41:09.783 0.0 0.0
51556 51797 2012-01-27 10:41:09.883 1.0* 0.0
51557 51798 2012-01-27 10:41:12.687 0.0 0.0
51558 51799 2012-01-27 10:41:12.787 0.0 0.0
51559 51800 2012-01-27 10:41:15.690 0.0 0.0
51560 51801 2012-01-27 10:41:15.790 0.0 0.0
51561 51802 2012-01-27 10:41:17.693 0.0 0.0
51562 51803 2012-01-27 10:41:17.793 0.0 1.0 # Time intervals are divided by these rows where `Peak==1`.
51567 51808 2012-01-27 10:42:47.810 0.0 0.0
* marks the candidate start timestamps (`Start == 1`); the first one in each time interval is that interval's start.
So the expected result would be:
timestamp1 (i.e. Start == 1)   timestamp2 (i.e. Peak == 1)   TimeInterval
2012-01-27 07:19:06.297 2012-01-27 07:32:19.687 00:13:13.390000 (timestamp2 - timestamp1)
2012-01-27 09:00:23.053 2012-01-27 09:00:28.147 00:00:05.094000 (timestamp2 - timestamp1)
...
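A different technique, `pandas.merge_asof`, can express the rule directly: for each `Start == 1` row, look forward to the next `Peak == 1` row, then keep only the earliest start mapped to each peak. This is a sketch under the assumption that `timestamp` is sorted ascending (which `merge_asof` requires) and unique. The data keep only the `Start == 1` and `Peak == 1` rows of the reproducible example plus two all-zero rows; the omitted all-zero rows do not change the result. The names `starts`, `peaks` and `pairs` are illustrative.

```python
import numpy as np
import pandas as pd

rows = [  # (timestamp, Start, Peak), trimmed from the reproducible example
    ("2012-01-27 06:22:08.330", np.nan, 1.0),
    ("2012-01-27 06:22:08.430", 0.0, 0.0),
    ("2012-01-27 07:19:06.297", 1.0, 0.0),
    ("2012-01-27 07:32:19.687", 0.0, 1.0),
    ("2012-01-27 09:00:23.053", 1.0, 0.0),
    ("2012-01-27 09:00:28.147", 0.0, 1.0),
    ("2012-01-27 10:00:32.037", 1.0, 0.0),
    ("2012-01-27 10:40:53.797", 1.0, 0.0),
    ("2012-01-27 10:40:55.800", 1.0, 0.0),
    ("2012-01-27 10:40:56.803", 1.0, 0.0),
    ("2012-01-27 10:41:05.877", 1.0, 0.0),
    ("2012-01-27 10:41:09.883", 1.0, 0.0),
    ("2012-01-27 10:41:17.793", 0.0, 1.0),
    ("2012-01-27 10:42:47.810", 0.0, 0.0),
]
df = pd.DataFrame(rows, columns=["timestamp", "Start", "Peak"])
df["timestamp"] = pd.to_datetime(df["timestamp"])

# Candidate start rows and interval-closing peak rows (NaN in Start compares as False).
starts = df.loc[df["Start"].eq(1), ["timestamp"]].rename(columns={"timestamp": "timestamp1"})
peaks = df.loc[df["Peak"].eq(1), ["timestamp"]].rename(columns={"timestamp": "timestamp2"})

# For every start, find the nearest peak at or after it.
pairs = pd.merge_asof(starts, peaks, left_on="timestamp1",
                      right_on="timestamp2", direction="forward")

# Several starts can map to the same peak; keep only the earliest one.
# Starts with no later peak get NaT and are dropped.
out = (pairs.dropna(subset=["timestamp2"])
            .drop_duplicates(subset="timestamp2", keep="first")
            .reset_index(drop=True))
out["TimeInterval"] = out["timestamp2"] - out["timestamp1"]
print(out)
```

On these rows the first two intervals match the expected result above, and the third runs from 10:00:32.037 to 10:41:17.793, which is what the update's output is missing.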
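Update:
Using:
import pandas as pd

df['timestamp'] = pd.to_datetime(df['timestamp'])
# A new group starts at every row with Start == 1
df['group'] = df['Start'].cumsum()
# Reverse cumulative count of peaks: the rows after one Peak == 1 row,
# up to and including the next Peak == 1 row, share a value
df['group1'] = df['Peak'].iloc[::-1].cumsum()
df
# Keep each group's rows from its start up to the Peak == 1 row that closes that segment
mask = df['group1'].eq(df.groupby('group')['group1'].transform('first'))
df1 = df[mask & df['group'].gt(0) & df['group1'].gt(0)]
df1
# First and last timestamp of each remaining group
df2 = (df1.groupby('group').agg(timestamp1=('timestamp', 'first'),
                                timestamp2=('timestamp', 'last'))
          .reset_index(drop=True))
df2['TimeInterval'] = df2['timestamp2'].sub(df2['timestamp1'])
df2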
Returned:
timestamp1 timestamp2 TimeInterval
0 2012-01-27 07:19:06.297 2012-01-27 07:32:19.687 00:13:13.390000
1 2012-01-27 09:00:23.053 2012-01-27 09:00:28.147 00:00:05.094000
2 2012-01-27 10:00:32.037 2012-01-27 10:40:53.697 00:40:21.660000 # Should be from `10:00:32.037` to `10:41:17.793`.
3 2012-01-27 10:40:53.797 2012-01-27 10:40:55.700 00:00:01.903000
4 2012-01-27 10:40:55.800 2012-01-27 10:40:56.703 00:00:00.903000
5 2012-01-27 10:40:56.803 2012-01-27 10:41:05.777 00:00:08.974000
6 2012-01-27 10:41:05.877 2012-01-27 10:41:09.783 00:00:03.906000
7 2012-01-27 10:41:09.883 2012-01-27 10:41:17.793 00:00:07.910000
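The update splits the last interval because `df['Start'].cumsum()` opens a new group at every `Start == 1` row, and the additional data contain several such rows before the `Peak == 1` row at 10:41:17.793. A minimal fix, sketched here on trimmed rows of the reproducible example and otherwise keeping the update's code, is to open a group only at the first `Start == 1` row inside each `group1` segment. `is_start` and `first_start` are hypothetical helper names.

```python
import numpy as np
import pandas as pd

rows = [  # (timestamp, Start, Peak), trimmed from the reproducible example
    ("2012-01-27 06:22:08.330", np.nan, 1.0),
    ("2012-01-27 06:22:08.430", 0.0, 0.0),
    ("2012-01-27 07:19:06.297", 1.0, 0.0),
    ("2012-01-27 07:32:19.687", 0.0, 1.0),
    ("2012-01-27 09:00:23.053", 1.0, 0.0),
    ("2012-01-27 09:00:28.147", 0.0, 1.0),
    ("2012-01-27 10:00:32.037", 1.0, 0.0),
    ("2012-01-27 10:40:53.797", 1.0, 0.0),
    ("2012-01-27 10:40:55.800", 1.0, 0.0),
    ("2012-01-27 10:40:56.803", 1.0, 0.0),
    ("2012-01-27 10:41:05.877", 1.0, 0.0),
    ("2012-01-27 10:41:09.883", 1.0, 0.0),
    ("2012-01-27 10:41:17.793", 0.0, 1.0),
    ("2012-01-27 10:42:47.810", 0.0, 0.0),
]
df = pd.DataFrame(rows, columns=["timestamp", "Start", "Peak"])
df["timestamp"] = pd.to_datetime(df["timestamp"])

# Segment id as in the update: each segment ends with its Peak == 1 row.
df["group1"] = df["Peak"].iloc[::-1].cumsum()

# True only for the first Start == 1 row inside each segment.
is_start = df["Start"].eq(1)
first_start = is_start & is_start.astype(int).groupby(df["group1"]).cumsum().eq(1)

# Open a new group only at those first starts, not at every Start == 1 row.
df["group"] = first_start.cumsum()

# The rest is unchanged from the update.
mask = df["group1"].eq(df.groupby("group")["group1"].transform("first"))
df1 = df[mask & df["group"].gt(0) & df["group1"].gt(0)]
df2 = (df1.groupby("group").agg(timestamp1=("timestamp", "first"),
                                timestamp2=("timestamp", "last"))
          .reset_index(drop=True))
df2["TimeInterval"] = df2["timestamp2"].sub(df2["timestamp1"])
print(df2)
```

With this change, the extra starred rows in the last interval no longer open groups of their own, so that interval is reported once, from 10:00:32.037 to 10:41:17.793.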