How do I convert BigQuery legacy SQL to standard SQL? - PullRequest
0 votes
/ September 27, 2018

I have been trying to convert a BigQuery legacy SQL query to standard SQL, but I keep getting a lot of errors.

Here is the original legacy SQL:

    SELECT t.page_path,
        t.second_page_path,
        t.third_page_path,
        t.fourth_page_path,
        CONCAT(t.page_path,IF(t.second_page_path IS NULL,"","-"),
        IFNULL(t.second_page_path,""),IF(t.third_page_path IS NULL,"","-"),
        IFNULL(t.third_page_path,""),IF(t.fourth_page_path IS NULL,"","-"),
        IFNULL(t.fourth_page_path,"")) AS full_page_journey,
        count(sessionId) AS total_sessions

    FROM (
        SELECT
            CONCAT(fullVisitorId,"-",STRING(visitStartTime)) AS sessionId,
            hits.hitNumber,
            hits.page.pagePath AS page_path,
            LEAD(hits.page.pagePath) OVER (PARTITION BY fullVisitorId, visitStartTime ORDER BY hits.hitNumber) AS second_page_path,
            LEAD(hits.page.pagePath,2) OVER (PARTITION BY fullVisitorId, visitStartTime ORDER BY hits.hitNumber) AS third_page_path,
            LEAD(hits.page.pagePath,3) OVER (PARTITION BY fullVisitorId, visitStartTime ORDER BY hits.hitNumber) AS fourth_page_path
        FROM
            TABLE_DATE_RANGE([xxxxxxx:xxxxxxx.ga_sessions_],
                TIMESTAMP('2017-01-01'), TIMESTAMP('2017-01-02'))
        WHERE
            hits.type="PAGE"
    ) t
    WHERE t.hits.hitNumber=1
    GROUP BY t.page_path,
             t.second_page_path,
             t.third_page_path,
             t.fourth_page_path,
             full_page_journey
    ORDER BY total_sessions DESC

UPDATE (edited): Here is what I have managed to put together so far:

    SELECT t.page_path,
        t.second_page_path,
        t.third_page_path,
        t.fourth_page_path,
        CONCAT(t.page_path,IF(t.second_page_path IS NULL,"","-"),
        IFNULL(t.second_page_path,""),IF(t.third_page_path IS NULL,"","-"),
        IFNULL(t.third_page_path,""),IF(t.fourth_page_path IS NULL,"","-"),
        IFNULL(t.fourth_page_path,"")) AS full_page_journey,
        count(sessionId) AS total_sessions

    FROM (
        SELECT
            CONCAT(fullVisitorId,"-",cast(visitStartTime as string)) AS sessionId,
            hits.hitNumber,
            hits.page.pagePath AS page_path,
            LEAD(hits.page.pagePath) OVER (PARTITION BY fullVisitorId, visitStartTime ORDER BY hits.hitNumber) AS second_page_path,
            LEAD(hits.page.pagePath,2) OVER (PARTITION BY fullVisitorId, visitStartTime ORDER BY hits.hitNumber) AS third_page_path,
            LEAD(hits.page.pagePath,3) OVER (PARTITION BY fullVisitorId, visitStartTime ORDER BY hits.hitNumber) AS fourth_page_path
        FROM
            `xxxxxxxxxxx.xxxxxxx.ga_sessions_*`,
            UNNEST(hits) AS hits
        WHERE
            _TABLE_SUFFIX BETWEEN
            FORMAT_DATE('%Y%m%d', DATE_ADD(CURRENT_DATE(), INTERVAL -16 DAY)) AND
            FORMAT_DATE('%Y%m%d', DATE_ADD(CURRENT_DATE(), INTERVAL -1 DAY)) AND
            hits.type = 'PAGE'
    ) AS t
    WHERE t.hits.hitNumber = 1
    GROUP BY t.page_path,
             t.second_page_path,
             t.third_page_path,
             t.fourth_page_path,
             full_page_journey
    ORDER BY total_sessions DESC

It would be great if someone could help me work out what is wrong with the syntax.

Here are some of the errors I am getting:

Cannot access field hitNumber on a value with type ARRAY

I also had problems with "_TABLE_SUFFIX" which, from what I have read, are related to the wildcard table.

1 Answer

0 votes
/ September 27, 2018

As a starting point: DATE_ADD needs a date, but you are giving it a timestamp, and _TABLE_SUFFIX needs a string, but you are giving it a date (sort of).

Try using CURRENT_DATE() and FORMAT_DATE with your existing syntax:

    FORMAT_DATE('%Y%m%d', DATE_ADD(CURRENT_DATE(), INTERVAL -16 DAY))
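To see the types involved, the expression can be run on its own, without referencing any table. This is only a minimal sketch; the column aliases (`today`, `start_date`, `suffix_start`, `suffix_end`) are made up for illustration.

```sql
-- Standalone check in standard SQL; no table is needed.
-- All aliases are illustrative and not part of the original query.
SELECT
  CURRENT_DATE() AS today,                                    -- DATE
  DATE_ADD(CURRENT_DATE(), INTERVAL -16 DAY) AS start_date,   -- still a DATE
  -- FORMAT_DATE turns the DATE into a STRING in YYYYMMDD form,
  -- which is the type _TABLE_SUFFIX is compared against.
  FORMAT_DATE('%Y%m%d', DATE_ADD(CURRENT_DATE(), INTERVAL -16 DAY)) AS suffix_start,
  FORMAT_DATE('%Y%m%d', DATE_ADD(CURRENT_DATE(), INTERVAL -1 DAY)) AS suffix_end;
```

With a wildcard table such as `ga_sessions_*`, `_TABLE_SUFFIX` holds whatever the `*` matched as a string, so both bounds of the `BETWEEN` have to be strings in the same format.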

This question may help with the hitNumber error:

query-hit-and-custom-sizes-in-the-bigquery
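The ARRAY error itself can be reproduced on a tiny inline table. The sketch below is an assumption-laden illustration: the `sessions` CTE and its two hits are invented test data, not the real `ga_sessions_` schema, and the alias `h` is arbitrary.

```sql
-- Invented sample data: one session whose `hits` column is an ARRAY of STRUCTs.
WITH sessions AS (
  SELECT
    'visitor-1' AS fullVisitorId,
    [STRUCT(1 AS hitNumber, 'PAGE' AS type),
     STRUCT(2 AS hitNumber, 'EVENT' AS type)] AS hits
)
-- Writing `SELECT hits.hitNumber FROM sessions` would fail here, because
-- `hits` is an ARRAY and a field cannot be read from an ARRAY directly.
SELECT
  fullVisitorId,
  h.hitNumber,   -- works: `h` is a single STRUCT produced by UNNEST
  h.type
FROM sessions,
  UNNEST(hits) AS h
WHERE h.type = 'PAGE';
```

Note also that a subquery which selects `hits.hitNumber` exposes that column simply as `hitNumber`, so the outer query has to filter on `t.hitNumber` rather than `t.hits.hitNumber`.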

Try using a CTE instead of a subquery, since it makes the query simpler and easier to debug.

    WITH CTE AS (
      SELECT
        CONCAT(fullVisitorId,"-",cast(visitStartTime as string)) AS sessionId,
        hits.hitNumber AS hitNumber,
        hits.page.pagePath AS page_path,
        LEAD(hits.page.pagePath) OVER (PARTITION BY fullVisitorId, visitStartTime ORDER BY hits.hitNumber) AS second_page_path,
        LEAD(hits.page.pagePath,2) OVER (PARTITION BY fullVisitorId, visitStartTime ORDER BY hits.hitNumber) AS third_page_path,
        LEAD(hits.page.pagePath,3) OVER (PARTITION BY fullVisitorId, visitStartTime ORDER BY hits.hitNumber) AS fourth_page_path
      FROM
        `xxxxxxxxxxx.xxxxxxx.ga_sessions_*`,
        UNNEST(hits) AS hits
      WHERE
        _TABLE_SUFFIX BETWEEN
        FORMAT_DATE('%Y%m%d', DATE_ADD(CURRENT_DATE(), INTERVAL -16 DAY)) AND
        FORMAT_DATE('%Y%m%d', DATE_ADD(CURRENT_DATE(), INTERVAL -1 DAY)) AND
        hits.type = 'PAGE'
    )

    SELECT page_path,
      second_page_path,
      third_page_path,
      fourth_page_path,
      CONCAT(page_path,IF(second_page_path IS NULL,"","-"),
      IFNULL(second_page_path,""),IF(third_page_path IS NULL,"","-"),
      IFNULL(third_page_path,""),IF(fourth_page_path IS NULL,"","-"),
      IFNULL(fourth_page_path,"")) AS full_page_journey,
      count(sessionId) AS total_sessions
    FROM CTE
    WHERE hitNumber = 1
    GROUP BY page_path,
        second_page_path,
        third_page_path,
        fourth_page_path,
        full_page_journey
    ORDER BY total_sessions DESC
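One way to debug the LEAD and CONCAT logic separately from the Google Analytics export is to run it against a few hand-written rows. The following is a rough sketch under stated assumptions: the `page_hits` and `paths` CTEs, their column names, and the sample paths are all invented, and only two follow-up pages are tracked to keep it short.

```sql
-- Invented sample data: session A has three page hits, session B has one.
WITH page_hits AS (
  SELECT 'A' AS session_id, 1 AS hit_number, '/home' AS page_path UNION ALL
  SELECT 'A', 2, '/products' UNION ALL
  SELECT 'A', 3, '/cart' UNION ALL
  SELECT 'B', 1, '/home'
),
paths AS (
  SELECT
    session_id,
    hit_number,
    page_path,
    LEAD(page_path) OVER w AS second_page_path,     -- next page in the session, or NULL
    LEAD(page_path, 2) OVER w AS third_page_path    -- page two hits later, or NULL
  FROM page_hits
  WINDOW w AS (PARTITION BY session_id ORDER BY hit_number)
)
SELECT
  session_id,
  -- Append "-<path>" only when the following page exists.
  CONCAT(page_path,
         IF(second_page_path IS NULL, '', '-'), IFNULL(second_page_path, ''),
         IF(third_page_path IS NULL, '', '-'), IFNULL(third_page_path, '')) AS full_page_journey
FROM paths
WHERE hit_number = 1
ORDER BY session_id;
-- Session A yields '/home-/products-/cart'; session B yields '/home'.
```

Once the journey strings look right on the sample rows, the same SELECT can be pointed at the real CTE.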
...