Найти записи, где длина последовательных значений меньше порога - PullRequest
2 голосов
/ 28 февраля 2020

Вот таблица

           timestamp           | tracker_id | position 
-------------------------------+------------+----------
 2020-02-01 16:23:45.571429+00 | 15         |        1
 2020-02-01 16:23:45.857143+00 | 11         |        1
 2020-02-01 16:23:46.428571+00 | 15         |        1
 2020-02-01 16:23:46.714286+00 | 11         |        2
 2020-02-01 16:23:54.714288+00 | 15         |        2
 2020-02-01 16:23:55+00        | 15         |        1
 2020-02-01 16:23:55.285714+00 | 11         |        1
 2020-02-01 16:23:55.571429+00 | 15         |        1
 2020-02-01 16:23:55.857143+00 | 15         |        1
 2020-02-01 16:23:56.428571+00 | 11         |        1
 2020-02-01 16:23:56.714286+00 | 15         |        1
 2020-02-01 16:23:57+00        | 11         |        2
 2020-02-01 16:23:58.142857+00 | 11         |        2
 2020-02-01 16:23:58.428571+00 | 15         |        1
 2020-02-01 16:23:58.714286+00 | 11         |        2
 2020-02-01 16:23:59+00        | 11         |        1
 2020-02-01 16:23:59.285714+00 | 15         |        1
 2020-02-01 16:23:59.295714+00 | 10         |        1
 2020-02-01 16:23:59.305714+00 | 10         |        2
 2020-02-01 16:23:59.385714+00 | 10         |        2
 2020-02-01 16:23:59.485714+00 | 10         |        3

Threshold = 3

Здесь
position из tracker_id: 15 отличается от 1 -> 1 -> 2 -> 1 -> 1 -> 1 -> 2 -> 2 -> 1

position из tracker_id: 11 изменяется с 1 -> 2 -> 1 -> 1 -> 2 -> 2 -> 2 -> 1

position из tracker_id: 10 изменяется с 1 -> 2 -> 2 -> 3

Для tracker_id: 15 Максимальная длина последовательных 2 между 1 составляет < threshold

Для tracker_id: 11 Максимальная длина последовательных 2 между 1 = threshold

Для tracker_id: 10 подряд 2 не заключены между 1

Выходное значение должно быть tracker_id: 15, поскольку длина последовательных 2 в position меньше, чем threshold

Как это сделать с помощью запроса?

Ответы [ 3 ]

1 голос
/ 28 февраля 2020

Нет необходимости рассматривать это как проблему пробелов и островков. Просто используйте функции окна:

select tracker_id
from (select t.*,
             min(position) over (partition by tracker_id
                                 order by timestamp
                                 rows between 2 preceding and current row
                                ) as min_pos_3,
             max(position) over (partition by tracker_id
                                 order by timestamp
                                 rows between 2 preceding and current row
                                ) as max_pos_3
      from t
     ) t
group by tracker_id
having count(*) filter (where min_pos_3 = max_pos_3) = 0

Это просто смотрит на минимальные и максимальные значения более трех 3 для каждого трекера. Он возвращает только те строки, значения которых всегда отличаются.

1 голос
/ 28 февраля 2020

Я немного изменил вашу входную таблицу (добавлен tracker_id = 9 для тестирования).

Оконные функции могут решить эту проблему, например: row_number (), lead

select x.* 
into #temp1
from
(
    select ' 2020-02-01 16:23:45.571429+00 ' as time_stamp, 9 as tracker_id, 1 as position UNION ALL 
    select ' 2020-02-01 16:23:45.857143+00 ' as time_stamp, 9 as tracker_id, 3 as position UNION ALL 
    select ' 2020-02-01 16:23:46.428571+00 ' as time_stamp, 9 as tracker_id, 1 as position UNION ALL 
    select ' 2020-02-01 16:24:45.571429+00 ' as time_stamp, 9 as tracker_id, 2 as position UNION ALL 
    select ' 2020-02-01 16:25:45.857143+00 ' as time_stamp, 9 as tracker_id, 2 as position UNION ALL
    select ' 2020-02-01 16:26:45.857143+00 ' as time_stamp, 9 as tracker_id, 3 as position UNION ALL
    select ' 2020-02-01 16:27:45.857143+00 ' as time_stamp, 9 as tracker_id, 3 as position UNION ALL
    select ' 2020-02-01 16:28:46.428571+00 ' as time_stamp, 9 as tracker_id, 1 as position UNION ALL
    select ' 2020-02-01 16:23:45.571429+00 ' as time_stamp, 15 as tracker_id, 1 as position UNION ALL 
    select ' 2020-02-01 16:23:45.857143+00 ' as time_stamp, 11 as tracker_id, 1 as position UNION ALL 
    select ' 2020-02-01 16:23:46.428571+00 ' as time_stamp, 15 as tracker_id, 1 as position UNION ALL 
    select ' 2020-02-01 16:23:46.714286+00 ' as time_stamp, 11 as tracker_id, 2 as position UNION ALL 
    select ' 2020-02-01 16:23:54.714288+00 ' as time_stamp, 15 as tracker_id, 2 as position UNION ALL 
    select ' 2020-02-01 16:23:55+00        ' as time_stamp, 15 as tracker_id, 1 as position UNION ALL 
    select ' 2020-02-01 16:23:55.285714+00 ' as time_stamp, 11 as tracker_id, 1 as position UNION ALL 
    select ' 2020-02-01 16:23:55.571429+00 ' as time_stamp, 15 as tracker_id, 1 as position UNION ALL 
    select ' 2020-02-01 16:23:55.857143+00 ' as time_stamp, 15 as tracker_id, 1 as position UNION ALL 
    select ' 2020-02-01 16:23:56.428571+00 ' as time_stamp, 11 as tracker_id, 1 as position UNION ALL 
    select ' 2020-02-01 16:23:56.714286+00 ' as time_stamp, 15 as tracker_id, 1 as position UNION ALL 
    select ' 2020-02-01 16:23:57+00        ' as time_stamp, 11 as tracker_id, 2 as position UNION ALL 
    select ' 2020-02-01 16:23:58.142857+00 ' as time_stamp, 11 as tracker_id, 2 as position UNION ALL 
    select ' 2020-02-01 16:23:58.428571+00 ' as time_stamp, 15 as tracker_id, 1 as position UNION ALL 
    select ' 2020-02-01 16:23:58.714286+00 ' as time_stamp, 11 as tracker_id, 2 as position UNION ALL 
    select ' 2020-02-01 16:23:59+00        ' as time_stamp, 11 as tracker_id, 1 as position UNION ALL 
    select ' 2020-02-01 16:23:59.285714+00 ' as time_stamp, 15 as tracker_id, 1 as position UNION ALL 
    select ' 2020-02-01 16:23:59.295714+00 ' as time_stamp, 10 as tracker_id, 1 as position UNION ALL 
    select ' 2020-02-01 16:23:59.305714+00 ' as time_stamp, 10 as tracker_id, 2 as position UNION ALL 
    select ' 2020-02-01 16:23:59.385714+00 ' as time_stamp, 10 as tracker_id, 2 as position UNION ALL 
    select ' 2020-02-01 16:23:59.485714+00 ' as time_stamp, 10 as tracker_id, 3 as position) x
;

select 
    *,
    ROW_NUMBER() OVER(PARTITION BY tracker_id ORDER BY time_stamp) tracker_id_rownumber,
    case when position=1 then 1 else 0 end is_pos0_equals_1, --is current row position=1?
    case when (LEAD(position, 1) OVER (PARTITION BY tracker_id ORDER BY time_stamp))=2 then 1 else 0 end is_pos1_equals_2, --is next row position=2?
    case when (LEAD(position, 2) OVER (PARTITION BY tracker_id ORDER BY time_stamp))=2 then 1 else 0 end is_pos2_equals_2, --next next row..
    case when (LEAD(position, 3) OVER (PARTITION BY tracker_id ORDER BY time_stamp))=2 then 1 else 0 end is_pos3_equals_2  --next next next row..
into #temp2
from #temp1
;

--leave only trackers with intervals of type {1, ... ,1}
select a.tracker_id, a.tracker_id_rownumber interval_start, min(b.tracker_id_rownumber) interval_end
into #temp3
from #temp2 a
inner join #temp2 b on (a.tracker_id=b.tracker_id and a.tracker_id_rownumber<b.tracker_id_rownumber)
where a.position=1 and b.position=1
group by a.tracker_id, a.tracker_id_rownumber

--check each 3-elements subset (are there any triples of consecutive '2'?) and mark triples of consecutive '2'
select a.*,b.tracker_id tracker_id_,
    case when b.interval_end - b.interval_start>=4 then
        case when (a.is_pos1_equals_2=1 and a.is_pos2_equals_2=1 and a.is_pos3_equals_2=1) then 0 else 1 end
    else
        1
    end 'is_less_than_threshold'
into #temp4
from #temp2 a 
inner join #temp3 b on a.tracker_id=b.tracker_id and a.tracker_id_rownumber between b.interval_start and b.interval_end-1

--output trackers
select a.tracker_id, min(a.is_less_than_threshold) is_ok
from #temp4 a
group by a.tracker_id
having min(a.is_less_than_threshold)=1

Вывод

tracker_id | is_ok

9 | 1

15 | 1

1 голос
/ 28 февраля 2020

Это проблема пробелов и островков.

Вы можете начать с создания групп смежных записей, используя разницу между номерами строк. Затем вы можете объединить каждую группу и использовать отставание и опережение для восстановления позиции окружающих групп. Последний шаг - применить логи фильтрации c.

select tracker_id
from (
    select
        tracker_id,
        position,
        count(*) cnt,
        lag(position) over(partition by tracker_id order by max(timestamp)) lag_position,
        lead(position) over(partition by tracker_id order by max(timestamp)) lead_position
    from (
        select
            t.*,
            row_number() over(partition by tracker_id order by timestamp) rn1,
            row_number() over(partition by tracker_id, position order by timestamp) rn2
        from mytable t
    ) t
    group by tracker_id, position, rn1 - rn2
) t
where
    position = 2
    and lag_position = 1
    and lead_position = 1
group by tracker_id
having max(cnt) < 3

Эта демонстрация на DB Fiddle с вашими образцами данных дает:

| tracker_id |
| ---------: |
|         15 |
...