BigQuery: aggregating into separate repeated fields - PullRequest
0 votes
/ 04 February 2020

How can I aggregate into several separate repeated fields?

Imagine this data:

WITH data as (
 select '5a' as room_id, 'george' as name_student, 13 as age_student, 'Mr. Smith' as name_teacher, 43 as id_teacher
union all 
 select '5a' as room_id, 'george' as name_student, 13 as age_student, 'Mr. Climp' as name_teacher, 38 as id_teacher
union all 
 select '5a' as room_id, 'jane' as name_student, 14 as age_student , 'Mr. Smith' as name_teacher, 43 as id_teacher
union all 
 select '5a' as room_id,  'jane' as name_student, 14 as age_student, 'Mr. Climp' as name_teacher, 38 as id_teacher
)

I would like to get the room id and two sets of repeated fields: students and teachers. But when I run the query below, every array contains 4 entries, and any attempt to add DISTINCT returns an error.

SELECT room_id, 
        struct(array_agg(name_student) as name, array_agg(age_student) as age) as students,
        struct(array_agg(name_teacher) as name, array_agg(id_teacher) as id) as teachers,

from data
group by 1
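Here is why each array ends up with four entries. This is a small plain-Python model of the semantics, not BigQuery itself. With `GROUP BY room_id` and a bare `ARRAY_AGG`, every input row contributes one element, and the `data` CTE has one row per student/teacher combination (2 × 2 = 4). The names `DATA` and `naive_aggregate` are made up for this sketch. Python keeps input order, but BigQuery's `ARRAY_AGG` order is unspecified without `ORDER BY`.

```python
# Rows of the question's `data` CTE:
# (room_id, name_student, age_student, name_teacher, id_teacher)
DATA = [
    ("5a", "george", 13, "Mr. Smith", 43),
    ("5a", "george", 13, "Mr. Climp", 38),
    ("5a", "jane", 14, "Mr. Smith", 43),
    ("5a", "jane", 14, "Mr. Climp", 38),
]


def naive_aggregate(rows):
    """Model of GROUP BY room_id with ARRAY_AGG and no DISTINCT:
    every input row appends one element to every array."""
    out = {}
    for room, student, age, teacher, tid in rows:
        group = out.setdefault(room, {
            "students": {"name": [], "age": []},
            "teachers": {"name": [], "id": []},
        })
        group["students"]["name"].append(student)
        group["students"]["age"].append(age)
        group["teachers"]["name"].append(teacher)
        group["teachers"]["id"].append(tid)
    return out


result = naive_aggregate(DATA)
print(result["5a"]["students"]["name"])  # ['george', 'george', 'jane', 'jane']
print(result["5a"]["teachers"]["name"])  # ['Mr. Smith', 'Mr. Climp', 'Mr. Smith', 'Mr. Climp']
```

Each student is repeated once per teacher and vice versa. The duplicates come from the shape of the joined data, not from `ARRAY_AGG` itself.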

How can I get arrays of unique students and unique teachers?

The output should look like this: [image]

Thanks!

Answers [ 2 ]

2 votes
/ 04 February 2020

This answer is a bit more verbose, but it should work for your needs. I prefer ARRAY_AGG(STRUCT()) over STRUCT(ARRAY_AGG(), ARRAY_AGG()) because it keeps the "george - 13" and "jane - 14" pairings intact. Imagine you added a 14-year-old george to your list: with two separate arrays, how would you tell which george is which?

WITH data as (
 select '5a' as room_id, 'george' as name_student, 13 as age_student, 'Mr. Smith' as name_teacher, 43 as id_teacher
union all 
 select '5a' as room_id, 'george' as name_student, 13 as age_student, 'Mr. Climp' as name_teacher, 38 as id_teacher
union all 
 select '5a' as room_id, 'jane' as name_student, 14 as age_student , 'Mr. Smith' as name_teacher, 43 as id_teacher
union all 
 select '5a' as room_id,  'jane' as name_student, 14 as age_student, 'Mr. Climp' as name_teacher, 38 as id_teacher
),
students_distinct as (
  select distinct room_id, name_student as name, age_student as age from data
),
students_agg as (
  select room_id,array_agg(struct(name,age)) as student from students_distinct group by 1
),
teachers_distinct as (
  select distinct room_id, name_teacher as name, id_teacher as id from data
),
teachers_agg as (
  select room_id,array_agg(struct(name,id)) as teacher from teachers_distinct group by 1
)
select room_id, s.student, t.teacher
from students_agg s
inner join teachers_agg t using(room_id)
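The CTE chain above can be modelled in plain Python to show why it keeps each name paired with its age. `SELECT DISTINCT` deduplicates whole (room_id, name, age) rows before they are packed into structs. This is a sketch of the semantics only, not BigQuery. `agg_distinct_structs`, `rooms` and `EXTRA` are hypothetical names. The extra 20-year-old george row is the one suggested in the other answer. BigQuery's array element order is unspecified, whereas this model keeps first-seen order.

```python
from collections import defaultdict

# Rows of the question's `data` CTE.
DATA = [
    {"room_id": "5a", "name_student": "george", "age_student": 13, "name_teacher": "Mr. Smith", "id_teacher": 43},
    {"room_id": "5a", "name_student": "george", "age_student": 13, "name_teacher": "Mr. Climp", "id_teacher": 38},
    {"room_id": "5a", "name_student": "jane", "age_student": 14, "name_teacher": "Mr. Smith", "id_teacher": 43},
    {"room_id": "5a", "name_student": "jane", "age_student": 14, "name_teacher": "Mr. Climp", "id_teacher": 38},
]


def agg_distinct_structs(rows, fields):
    """Model of `SELECT DISTINCT room_id, <cols>` followed by
    `ARRAY_AGG(STRUCT(...)) ... GROUP BY room_id`.
    `fields` maps struct field name -> source column name."""
    seen = set()
    grouped = defaultdict(list)
    for row in rows:
        values = tuple(row[src] for src in fields.values())
        key = (row["room_id"],) + values
        if key in seen:
            continue  # duplicate tuple: removed by SELECT DISTINCT
        seen.add(key)
        grouped[row["room_id"]].append(dict(zip(fields.keys(), values)))
    return grouped


def rooms(rows):
    students = agg_distinct_structs(rows, {"name": "name_student", "age": "age_student"})
    teachers = agg_distinct_structs(rows, {"name": "name_teacher", "id": "id_teacher"})
    # INNER JOIN ... USING (room_id): keep only rooms present on both sides.
    return {room: {"student": students[room], "teacher": teachers[room]}
            for room in students if room in teachers}


# Hypothetical extra row: a second, 20-year-old george.
EXTRA = DATA + [{"room_id": "5a", "name_student": "george", "age_student": 20,
                 "name_teacher": "Mr. Climp", "id_teacher": 38}]

print(rooms(DATA)["5a"]["teacher"])
# [{'name': 'Mr. Smith', 'id': 43}, {'name': 'Mr. Climp', 'id': 38}]
print(rooms(EXTRA)["5a"]["student"])
# [{'name': 'george', 'age': 13}, {'name': 'jane', 'age': 14}, {'name': 'george', 'age': 20}]
```

Because deduplication happens on the whole tuple, both georges survive as separate structs, each with its own age.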

0 votes
/ 04 February 2020

I ran your query with distinct added to every array_agg call, and it works fine.

WITH data as (
 select '5a' as room_id, 'george' as name_student, 13 as age_student, 'Mr. Smith' as name_teacher, 43 as id_teacher
union all 
 select '5a' as room_id, 'george' as name_student, 13 as age_student, 'Mr. Climp' as name_teacher, 38 as id_teacher
union all 
 select '5a' as room_id, 'jane' as name_student, 14 as age_student , 'Mr. Smith' as name_teacher, 43 as id_teacher
union all 
 select '5a' as room_id,  'jane' as name_student, 14 as age_student, 'Mr. Climp' as name_teacher, 38 as id_teacher
)
SELECT room_id, 
        struct(array_agg(distinct name_student) as name, array_agg(distinct  age_student) as age) as students,
        struct(array_agg(distinct name_teacher) as name, array_agg(distinct  id_teacher) as id) as teachers
from data
group by 1

That said, I'm not sure this will work correctly on a real dataset if what you want is a list of students with their ages and a list of teachers with their IDs. For example, adding the row select '5a' as room_id, 'george' as name_student, 20 as age_student, 'Mr. Climp' as name_teacher, 38 as id_teacher to the data table shows the problem. Each column is deduplicated independently, so the (george, 20) pairing is lost.
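A plain-Python model (not BigQuery) makes the loss concrete. `ARRAY_AGG(DISTINCT col)` deduplicates each column on its own. As a result, the name array and the age array can end up with different lengths, and position i no longer links a name to an age. The helper `distinct_in_order` and the variable names are hypothetical. BigQuery does not guarantee element order, while this sketch keeps first-seen order.

```python
def distinct_in_order(values):
    """Model of ARRAY_AGG(DISTINCT ...): keep each value once, in first-seen order."""
    seen, out = set(), []
    for v in values:
        if v not in seen:
            seen.add(v)
            out.append(v)
    return out


# (name_student, age_student) for room 5a, including the extra george/20 row.
ROWS = [("george", 13), ("george", 13), ("jane", 14), ("jane", 14), ("george", 20)]

names = distinct_in_order(n for n, _ in ROWS)  # ['george', 'jane']
ages = distinct_in_order(a for _, a in ROWS)   # [13, 14, 20]

# Zipping the parallel arrays drops age 20: ('george', 20) can't be recovered.
print(list(zip(names, ages)))  # [('george', 13), ('jane', 14)]

# Deduplicating whole (name, age) pairs instead keeps every combination.
pairs = distinct_in_order(ROWS)
print(pairs)  # [('george', 13), ('jane', 14), ('george', 20)]
```

Deduplicating whole pairs is what the ARRAY_AGG(STRUCT()) approach in the other answer does.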
