Ordering one array column by another array column in BigQuery
0 votes
/ October 1, 2018

I have the table below in BigQuery:

WITH results AS
  (SELECT 1 as customerid, ["apples", "bananas", "grapes","orange"] as fruit_array, [0.1,0.4,0.3,0.2] as probability
  UNION ALL
  SELECT 2 as customerid, ["apples", "bananas", "grapes","orange"] as fruit_array, [0.2,0.1,0.6,0.1] as probability
  UNION ALL
  SELECT 3 as customerid, ["apples", "bananas", "grapes","orange"] as fruit_array, [0.5,0.05,0.35,0.1] as probability
  )
 select * from results

Here each customer has a certain probability of buying each fruit. I would like to pick the top 2 fruits for each customer, together with their corresponding purchase probabilities.

I would like the result to look something like this:

customerid, fruits, probability
1, bananas, 0.4
1, grapes, 0.3
..

In the final result above, for customerid 1 I select only bananas and grapes, because these 2 fruits have the highest purchase probability (out of [0.1,0.4,0.3,0.2]).

Is there a function in BigQuery that I can use to achieve this?

1 Answer

0 votes
/ October 1, 2018

Below is for BigQuery Standard SQL

#standardSQL
WITH results AS (
  SELECT 1 AS customerid, ["apples", "bananas", "grapes","orange"] AS fruit_array, [0.1,0.4,0.3,0.2] AS probability   UNION ALL
  SELECT 2 AS customerid, ["apples", "bananas", "grapes","orange"] AS fruit_array, [0.2,0.1,0.6,0.1] AS probability   UNION ALL
  SELECT 3 AS customerid, ["apples", "bananas", "grapes","orange"] AS fruit_array, [0.5,0.05,0.35,0.1] AS probability
)
SELECT customerid, fruit, probability
FROM (
  SELECT customerid, ARRAY_AGG(STRUCT(fruit, probability) ORDER BY probability DESC LIMIT 2) top
  FROM results, 
    UNNEST(probability) probability WITH OFFSET off1
    JOIN UNNEST(fruit_array) fruit WITH OFFSET off2
    ON off1 = off2
  GROUP BY customerid
), UNNEST(top)  

with the result

Row customerid  fruit   probability  
1   1           bananas 0.4  
2   1           grapes  0.3  
3   2           grapes  0.6  
4   2           apples  0.2  
5   3           apples  0.5  
6   3           grapes  0.35     
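
To make the pairing step in the query above easier to follow, here is a minimal sketch that runs only the flattening part, without the aggregation, on a single customer. Each UNNEST ... WITH OFFSET yields the element plus its zero-based position, and joining on equal positions lines up each fruit with its probability. The aliases p and idx are introduced here for illustration and are not part of the original answer.

```sql
#standardSQL
-- Illustration only: turn the two parallel arrays into (fruit, probability) rows.
WITH results AS (
  SELECT 1 AS customerid,
    ["apples", "bananas", "grapes", "orange"] AS fruit_array,
    [0.1, 0.4, 0.3, 0.2] AS probability
)
SELECT customerid, fruit, p AS probability, off1 AS idx
FROM results,
  UNNEST(probability) AS p WITH OFFSET off1      -- element and its zero-based position
  JOIN UNNEST(fruit_array) AS fruit WITH OFFSET off2
  ON off1 = off2                                 -- keep only pairs at the same position
ORDER BY customerid, idx
```

This should produce one row per array position (four rows here). ARRAY_AGG(... ORDER BY probability DESC LIMIT 2) in the full query then keeps the two highest-probability rows per customer.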

or, perhaps a slightly better option

#standardSQL
WITH results AS (
  SELECT 1 AS customerid, ["apples", "bananas", "grapes","orange"] AS fruit_array, [0.1,0.4,0.3,0.2] AS probability   UNION ALL
  SELECT 2 AS customerid, ["apples", "bananas", "grapes","orange"] AS fruit_array, [0.2,0.1,0.6,0.1] AS probability   UNION ALL
  SELECT 3 AS customerid, ["apples", "bananas", "grapes","orange"] AS fruit_array, [0.5,0.05,0.35,0.1] AS probability
)
SELECT customerid, fruit, probability
FROM (
  SELECT customerid, 
    (
      SELECT ARRAY_AGG(STRUCT(fruit, probability) ORDER BY probability DESC LIMIT 2) 
      FROM   UNNEST(probability) probability WITH OFFSET off1
      JOIN UNNEST(fruit_array) fruit WITH OFFSET off2
      ON off1 = off2
    ) top
  FROM results
), UNNEST(top)

with the same result
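
A possible variation, offered as a sketch rather than as part of the original answer: because the two arrays are positionally aligned, a single UNNEST may be enough, with the fruit looked up by index through fruit_array[OFFSET(off)]. The aliases p, off, and t are assumptions made for this example. The extra sort key off is added so that ties in probability resolve deterministically in favor of the earlier array element.

```sql
#standardSQL
WITH results AS (
  SELECT 1 AS customerid, ["apples", "bananas", "grapes", "orange"] AS fruit_array, [0.1, 0.4, 0.3, 0.2] AS probability UNION ALL
  SELECT 2 AS customerid, ["apples", "bananas", "grapes", "orange"] AS fruit_array, [0.2, 0.1, 0.6, 0.1] AS probability UNION ALL
  SELECT 3 AS customerid, ["apples", "bananas", "grapes", "orange"] AS fruit_array, [0.5, 0.05, 0.35, 0.1] AS probability
)
SELECT customerid, t.fruit, t.probability
FROM (
  SELECT customerid,
    (
      -- Unnest only the probabilities; fetch the matching fruit by position.
      SELECT ARRAY_AGG(
        STRUCT(fruit_array[OFFSET(off)] AS fruit, p AS probability)
        ORDER BY p DESC, off   -- highest probability first, ties broken by position
        LIMIT 2
      )
      FROM UNNEST(probability) AS p WITH OFFSET off
    ) AS top
  FROM results
), UNNEST(top) AS t
ORDER BY customerid, t.probability DESC
```

One difference worth noting: if fruit_array is shorter than probability, OFFSET raises an out-of-bounds error, whereas the JOIN on offsets silently drops the unmatched elements. SAFE_OFFSET could be used instead if a NULL fruit is preferable to an error.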
