Summarizing temporal changes in ID occurrence - PullRequest
0 votes
/ 17 October 2019

I have a table with job IDs (varchar) per date and user ID.

User    Date        Job
mid1    2019-10-10  jid1
mid1    2019-10-10  jid2
mid1    2019-10-10  jid3
mid1    2019-10-10  jid4
mid1    2019-10-10  jid5
mid1    2019-10-11  jid3
mid1    2019-10-11  jid5
mid1    2019-10-11  jid6
mid1    2019-10-11  jid7
mid1    2019-10-11  jid8
mid1    2019-10-11  jid9
mid1    2019-10-12  jid3
mid1    2019-10-12  jid9
mid1    2019-10-12  jid10
mid2    2019-10-10  jid100
mid2    2019-10-10  jid101
mid2    2019-10-10  jid102
...

Now I need a table with the number of new ("Incoming") and finished ("Outgoing") jobs over the time series, per user.

User    Date       Jobs  Incoming  Outgoing
mid1    2019-10-10   5     5           0
mid1    2019-10-11   6     4           3
mid1    2019-10-12   3     1           4
mid2    ...

It would also be nice if it counted only unique job IDs (there are duplicates), but otherwise I can remove them beforehand.
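The expected output implies that a job counts as Incoming on a date when it was absent on the previous date, and as Outgoing when it was present on the previous date but is gone on this one. Below is a minimal Python sketch of that reading, assuming "previous date" means the previous calendar day and deduplicating job IDs with sets. The names `SAMPLE` and `summarize_by_sets` are hypothetical; the data is only the rows shown above, plus one deliberately duplicated row to exercise deduplication.

```python
from collections import defaultdict
from datetime import date, timedelta

# Rows shown in the question as (user, date, job); the second ("mid1", "2019-10-11", "jid3")
# is a deliberately duplicated row added to exercise deduplication.
SAMPLE = (
    [("mid1", "2019-10-10", f"jid{n}") for n in (1, 2, 3, 4, 5)]
    + [("mid1", "2019-10-11", f"jid{n}") for n in (3, 5, 6, 7, 8, 9)]
    + [("mid1", "2019-10-11", "jid3")]
    + [("mid1", "2019-10-12", f"jid{n}") for n in (3, 9, 10)]
    + [("mid2", "2019-10-10", f"jid{n}") for n in (100, 101, 102)]
)


def summarize_by_sets(rows):
    """Map (user, date) -> (jobs, incoming, outgoing) using distinct job IDs.

    Assumption: "previous date" means the previous calendar day.
    """
    jobs = defaultdict(set)  # (user, date) -> distinct job IDs on that day
    for user, day, job in rows:
        jobs[(user, date.fromisoformat(day))].add(job)

    result = {}
    for user, day in sorted(jobs):
        today = jobs[(user, day)]
        # .get() does not add a key to the defaultdict
        yesterday = jobs.get((user, day - timedelta(days=1)), set())
        result[(user, day.isoformat())] = (
            len(today),              # Jobs
            len(today - yesterday),  # Incoming: new since the previous day
            len(yesterday - today),  # Outgoing: gone since the previous day
        )
    return result


for key, counts in summarize_by_sets(SAMPLE).items():
    print(*key, *counts)
```

For the mid1 rows this prints the same Jobs/Incoming/Outgoing values as the expected table (5/5/0, 6/4/3, 3/1/4), and the duplicated row does not change the counts.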

Can this be done with Teradata SQL?

1 Answer

0 votes
/ 17 October 2019
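-- USER and DATE are reserved words in Teradata: if your columns really are named User and Date, quote them ("User", "Date"); otherwise substitute your actual column names
SELECT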
   User
  ,Date
  ,Count(*) AS Jobs
   -- new jobs today
  ,Sum(firstdate) AS Incoming
   -- finished jobs today
  ,Sum(lastdate)
   -- finished jobs the day before
  ,Lag(Sum(lastdate),1,0) Over (PARTITION BY User ORDER BY Date) AS Outgoing
FROM
 (
   SELECT
      User
     ,Job
     ,Date 
      -- flag indicating job is present on the current day but absent the day before
     ,CASE WHEN Date =  Lag(Date) Over (PARTITION BY User, job ORDER BY Date) + 1 THEN 0 ELSE 1 END AS firstdate
      -- flag indicating job is present on the current day but absent the day after
     ,CASE WHEN Date = Lead(Date) Over (PARTITION BY User, job ORDER BY Date) - 1 THEN 0 ELSE 1 END AS lastdate
   FROM your_table
   -- to remove duplicate rows add
   -- GROUP BY 1,2,3
 ) AS dt
GROUP BY 1,2
ORDER BY 1,2
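To check the logic of the query above without a Teradata system, here is a Python sketch that mirrors it step by step: distinct (User, Job, Date) rows as with the optional GROUP BY 1,2,3, the firstdate/lastdate flags from LAG/LEAD over PARTITION BY User, Job, then per-day sums and LAG(Sum(lastdate),1,0) per user. It is an illustration of the technique, not how Teradata evaluates it; `SAMPLE` and `summarize_like_query` are hypothetical names, and the data is only the rows shown in the question plus one deliberate duplicate.

```python
from collections import defaultdict
from datetime import date, timedelta
from itertools import groupby

# Rows shown in the question as (user, date, job), plus one deliberate duplicate of jid3.
SAMPLE = (
    [("mid1", "2019-10-10", f"jid{n}") for n in (1, 2, 3, 4, 5)]
    + [("mid1", "2019-10-11", f"jid{n}") for n in (3, 5, 6, 7, 8, 9)]
    + [("mid1", "2019-10-11", "jid3")]
    + [("mid1", "2019-10-12", f"jid{n}") for n in (3, 9, 10)]
    + [("mid2", "2019-10-10", f"jid{n}") for n in (100, 101, 102)]
)


def summarize_like_query(rows):
    """Map (user, date) -> (Jobs, Incoming, Outgoing), following the SQL answer."""
    # Derived table dt: distinct rows, like the optional GROUP BY 1,2,3
    distinct = sorted({(u, j, date.fromisoformat(d)) for u, d, j in rows})
    one_day = timedelta(days=1)
    per_day = defaultdict(lambda: [0, 0, 0])  # [Count(*), Sum(firstdate), Sum(lastdate)]

    # PARTITION BY User, Job ORDER BY Date
    for (user, _job), grp in groupby(distinct, key=lambda r: (r[0], r[1])):
        dates = [r[2] for r in grp]
        for i, d in enumerate(dates):
            prev_d = dates[i - 1] if i > 0 else None               # Lag(Date)
            next_d = dates[i + 1] if i + 1 < len(dates) else None  # Lead(Date)
            firstdate = 0 if prev_d is not None and d == prev_d + one_day else 1
            lastdate = 0 if next_d is not None and d == next_d - one_day else 1
            acc = per_day[(user, d)]
            acc[0] += 1
            acc[1] += firstdate
            acc[2] += lastdate

    # Outer query: GROUP BY User, Date plus Lag(Sum(lastdate),1,0) per user
    result = {}
    prev_user, prev_last = None, 0
    for user, d in sorted(per_day):
        jobs, incoming, last = per_day[(user, d)]
        outgoing = prev_last if user == prev_user else 0
        result[(user, d.isoformat())] = (jobs, incoming, outgoing)
        prev_user, prev_last = user, last
    return result


for key, counts in summarize_like_query(SAMPLE).items():
    print(*key, *counts)
```

One design detail carries over from the SQL: Outgoing takes the previous row of the same user, whatever its date, while firstdate compares against the previous calendar day, so the two only diverge when a user has dates with no rows at all.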

If your Teradata version does not support LAG/LEAD (i.e., before 16.10), you have to rewrite them with equivalent window frames:

SELECT
   User
  ,Date
  ,Count(*) AS Jobs
   -- new jobs today
  ,Sum(firstdate) AS Incoming
   -- finished jobs today
  ,Sum(lastdate)
   -- finished jobs the day before
  ,Coalesce(Min(Sum(lastdate)) Over (PARTITION BY User ORDER BY Date ROWS BETWEEN 1 Preceding AND 1 Preceding), 0) AS Outgoing
FROM
 (
   SELECT
      User
     ,Job
     ,Date 
      -- flag indicating job is present on the current day but absent the day before
     ,CASE WHEN Date = Min(Date) Over (PARTITION BY User, job ORDER BY Date ROWS BETWEEN 1 Preceding AND 1 Preceding) + 1 THEN 0 ELSE 1 END AS firstdate
      -- flag indicating job is present on the current day but absent the day after
     ,CASE WHEN Date = Min(Date) Over (PARTITION BY User, job ORDER BY Date ROWS BETWEEN 1 Following AND 1 Following ) - 1 THEN 0 ELSE 1 END AS lastdate
   FROM your_table
   -- to remove duplicate rows add
   -- GROUP BY 1,2,3
 ) AS dt
GROUP BY 1,2
ORDER BY 1,2
...
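The pre-16.10 version relies on the fact that the frame ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING contains exactly the previous row, so MIN over it returns that row's value (or NULL on the first row), and COALESCE supplies LAG's default; the FOLLOWING variant gives LEAD the same way. A small Python model of such a frame makes the equivalence concrete; `min_over_frame`, `lag_via_frame`, `lead_via_frame` and `lag` are hypothetical helpers, with None standing in for SQL NULL.

```python
def min_over_frame(values, i, lo, hi):
    """MIN over ROWS BETWEEN offsets lo..hi relative to row i.

    Negative offsets are PRECEDING, positive are FOLLOWING.
    Returns None (SQL NULL) when the frame falls outside the partition.
    """
    start = max(i + lo, 0)
    end = min(i + hi, len(values) - 1)
    if start > end:
        return None
    return min(values[start:end + 1])


def lag_via_frame(values, default=0):
    # COALESCE(MIN(x) OVER (... ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING), default)
    result = []
    for i in range(len(values)):
        v = min_over_frame(values, i, -1, -1)
        result.append(default if v is None else v)
    return result


def lead_via_frame(values):
    # MIN(x) OVER (... ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING), NULL after the last row
    return [min_over_frame(values, i, 1, 1) for i in range(len(values))]


def lag(values, default=0):
    # Reference LAG(x, 1, default) for comparison
    return [values[i - 1] if i > 0 else default for i in range(len(values))]


# Per-day Sum(lastdate) for mid1 on 2019-10-10, -11 and -12 in the sample data
daily_lastdate = [3, 4, 3]
print(lag_via_frame(daily_lastdate))  # [0, 3, 4], the Outgoing column
```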