I'm just starting out with BigQuery. I have a table that can be reproduced with the following query:
WITH T AS (
SELECT 0 AS id, 'red' AS colour, ARRAY<STRUCT<count INT64, shape STRING>>[(2, "dot"), (2, "dot"), (1, "string")] AS arr, DATE(2020,01,31) AS date UNION ALL
SELECT 0 AS id, 'red' AS colour, ARRAY<STRUCT<count INT64, shape STRING>>[(2, "dot"), (2, "dot"), (1, "string")] AS arr, DATE(2020,01,31) AS date UNION ALL
SELECT 0 AS id, 'red' AS colour, ARRAY<STRUCT<count INT64, shape STRING>>[(20, "dot"), (20, "dot"), (1, "string")] AS arr, DATE(2020,01,30) AS date UNION ALL
SELECT 0 AS id, 'black' AS colour, ARRAY<STRUCT<count INT64, shape STRING>>[(296, "dot"), (212, "plane"), (156, "cube")] AS arr, DATE(2020,01,31) AS date UNION ALL
SELECT 0 AS id, 'black' AS colour, ARRAY<STRUCT<count INT64, shape STRING>>[(296, "dot"), (212, "plane"), (156, "cube")] AS arr, DATE(2020,01,31) AS date UNION ALL
SELECT 0 AS id, 'black' AS colour, ARRAY<STRUCT<count INT64, shape STRING>>[(296, "dot"), (21, "plane"), (156, "cube")] AS arr, DATE(2020,01,30) AS date UNION ALL
SELECT 0 AS id, 'black' AS colour, ARRAY<STRUCT<count INT64, shape STRING>>[(296, "dot"), (2, "plane"), (156, "cube")] AS arr, DATE(2020,01,30) AS date UNION ALL
SELECT 1 AS id, 'blue' AS colour, ARRAY<STRUCT<count INT64, shape STRING>>[(4, "cube"), (4, "cube"), (4, "cube")], DATE(2020, 01, 31) AS date UNION ALL
SELECT 2 AS id, 'orange' AS colour, ARRAY<STRUCT<count INT64, shape STRING>>[(5, "string")], DATE(2020,01,31) AS date UNION ALL
SELECT 2 AS id, 'orange' AS colour, ARRAY<STRUCT<count INT64, shape STRING>>[(5, "string")], DATE(2020,01,30) AS date
)
SELECT *
FROM T;
For each distinct date, I want every shape with its maximum count, computed separately for each id and colour. For example, on 2020-01-31 the result for id 0, red would be 2 dot and 1 string. On 2020-01-30 the result for id 0, black would be 296 dot, 21 plane and 156 cube. The data can contain repeats: whole rows may be duplicated, dates repeat, and shapes can repeat within the array of structs.
More precisely, I'd like the query to return the result produced by:
WITH T AS (
SELECT DATE(2020,01,31) AS date, ARRAY<STRUCT<count INT64, shape STRING, id INT64, colour STRING>>[(2, "dot", 0, "red"), (1, "string", 0, "red"), (296, "dot", 0, "black"), (212, "plane", 0, "black"), (156, "cube", 0, "black"), (4, "cube", 1, "blue"), (5, "string", 2, "orange")] AS res UNION ALL
SELECT DATE(2020,01,30) AS date, ARRAY<STRUCT<count INT64, shape STRING, id INT64, colour STRING>>[(20, "dot", 0, "red"), (1, "string", 0, "red"), (296, "dot", 0, "black"), (21, "plane", 0, "black"), (156, "cube", 0, "black"), (5, "string", 2, "orange")] AS res
)
SELECT *
FROM T;
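In relational terms, the rule described above amounts to flattening `arr` with `UNNEST` and taking `MAX(count)` for each combination of `date`, `id`, `colour` and `shape`. A minimal sketch of just that step, assuming only the two 2020-01-30 `black` rows from the sample data as input (they differ only in their `plane` count):

```sql
-- Sketch: flatten arr, then keep the largest count per (date, id, colour, shape).
-- Input is a subset of the sample data: the two 2020-01-30, id 0, 'black' rows.
WITH T AS (
  SELECT 0 AS id, 'black' AS colour,
         ARRAY<STRUCT<count INT64, shape STRING>>[(296, "dot"), (21, "plane"), (156, "cube")] AS arr,
         DATE(2020, 1, 30) AS date
  UNION ALL
  SELECT 0, 'black',
         ARRAY<STRUCT<count INT64, shape STRING>>[(296, "dot"), (2, "plane"), (156, "cube")],
         DATE(2020, 1, 30)
)
SELECT date, id, colour, a.shape AS shape, MAX(a.count) AS count
FROM T, UNNEST(arr) AS a          -- one output row per array element
GROUP BY date, id, colour, a.shape
ORDER BY shape;
-- Expected rows: (2020-01-30, 0, black, cube, 156),
--                (2020-01-30, 0, black, dot, 296),
--                (2020-01-30, 0, black, plane, 21)
```

Because the grouping happens after flattening, repeated rows and repeated array elements collapse in the same `GROUP BY`.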
I'm struggling with two problems: removing the duplicates, and attaching the id and colour to each element of the resulting array. For example, the query
SELECT date, ARRAY_CONCAT_AGG(ARRAY((SELECT AS STRUCT MAX(count), shape FROM UNNEST(arr) GROUP BY shape)))
FROM T
GROUP BY date
returns duplicates, and on top of that I would still need to attach an id and a colour to each nested row. Any suggestions would be greatly appreciated.
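One possible way around both problems, sketched under the assumption that "keep the maximum count" is the only deduplication needed: compute the per-shape maxima first, then re-nest them per date with `ARRAY_AGG(STRUCT(...))`, which carries `id` and `colour` into every element. The CTE name `maxima` is illustrative, and the input here is only the `red` rows from the sample, including the duplicated 2020-01-31 row:

```sql
WITH T AS (
  SELECT 0 AS id, 'red' AS colour,
         ARRAY<STRUCT<count INT64, shape STRING>>[(2, "dot"), (2, "dot"), (1, "string")] AS arr,
         DATE(2020, 1, 31) AS date
  UNION ALL
  SELECT 0, 'red', ARRAY<STRUCT<count INT64, shape STRING>>[(2, "dot"), (2, "dot"), (1, "string")], DATE(2020, 1, 31)
  UNION ALL
  SELECT 0, 'red', ARRAY<STRUCT<count INT64, shape STRING>>[(20, "dot"), (20, "dot"), (1, "string")], DATE(2020, 1, 30)
),
-- Step 1: one row per (date, id, colour, shape) holding the largest count;
-- this removes duplicated rows and duplicated array elements alike.
maxima AS (
  SELECT date, id, colour, a.shape AS shape, MAX(a.count) AS count
  FROM T, UNNEST(arr) AS a
  GROUP BY date, id, colour, a.shape
)
-- Step 2: re-nest per date; STRUCT field names come from the column names.
SELECT date,
       ARRAY_AGG(STRUCT(count, shape, id, colour) ORDER BY id, colour, shape) AS res
FROM maxima
GROUP BY date
ORDER BY date DESC;
-- Expected: 2020-01-31 -> [(2, dot, 0, red), (1, string, 0, red)]
--           2020-01-30 -> [(20, dot, 0, red), (1, string, 0, red)]
```

Run against the full `T` from the question, the same two steps should yield the arrays shown in the desired result, although the element order inside each array follows the `ORDER BY` in `ARRAY_AGG` rather than the order listed there.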
Thanks!