BigQuery GA Open Воронка Устаревший SQL: исключить сеансы, просмотревшие определенные страницы - PullRequest
0 голосов
/ 20 марта 2019

Я пытаюсь воссоздать воронку GA в BigQuery, эта открытая воронка исключает сеансы, которые просматривали определенные страницы, я пробовал использовать следующее: И НЕ REGEXP_MATCH, НЕ В, но он все еще не работает, как я ожидаю, явсе еще получаю сессии, которые просматривали страницы, которые я хочу исключить.

Я хочу также сделать эту воронку закрытой, если это возможно, этот код возвращает открытую воронку.

Кроме того, есть ли лучший способ написания этого запроса в стандартном SQL?

Нужна помощь с этим.Благодарю.

SELECT COUNT(s0.firstHit) AS _test_your_details,
SUM(s0.exit) AS _test_your_details_exits,
COUNT(s1.firstHit) AS _test_additional_new_details,
SUM(s1.exit) AS _test_additional_new_details_exits,
COUNT(s2.firstHit) AS _test_new_dress,
SUM(s2.exit) AS _test_new_dress_exits,
COUNT(s3.firstHit) AS _test_test_details,
SUM(s3.exit) AS _test_test_details_exits,
COUNT(s4.firstHit) AS _test_cover_for_the_test,
SUM(s4.exit) AS _test_cover_for_the_test_exits,
COUNT(s5.firstHit) AS _test_your_order,
SUM(s5.exit) AS  _test_your_order_exits
FROM
  (SELECT s0.fullVisitorId,
          s0.visitId,
          s0.firstHit,
          s0.exit,
          s1.firstHit,
          s1.exit,
          s2.firstHit,
          s2.exit,
          s3.firstHit,
          s3.exit,
          s4.firstHit,
          s4.exit,
          s5.firstHit,
          s5.exit
   FROM
     (SELECT s0.fullVisitorId,
             s0.visitId,
             s0.firstHit,
             s0.exit,
             s1.firstHit,
             s1.exit,
             s2.firstHit,
             s2.exit,
             s3.firstHit,
             s3.exit,
             s4.firstHit,
             s4.exit
      FROM
        (SELECT s0.fullVisitorId,
                s0.visitId,
                s0.firstHit,
                s0.exit,
                s1.firstHit,
                s1.exit,
                s2.firstHit,
                s2.exit,
                s3.firstHit,
                s3.exit
         FROM
           (SELECT s0.fullVisitorId,
                   s0.visitId,
                   s0.firstHit,
                   s0.exit,
                   s1.firstHit,
                   s1.exit,
                   s2.firstHit,
                   s2.exit
            FROM
              (SELECT s0.fullVisitorId,
                      s0.visitId,
                      s0.firstHit,
                      s0.exit,
                      s1.firstHit,
                      s1.exit
               FROM
                 (SELECT fullVisitorId,
                         visitId,
                         MIN(hits.hitNumber) AS firstHit,
                         MAX(IF(hits.isExit, 1, 0)) AS exit
                  FROM TABLE_DATE_RANGE([xxxxxxxx.ga_sessions_], TIMESTAMP('2018-11-01'), TIMESTAMP('2018-11-30'))
                  WHERE REGEXP_MATCH(hits.page.pagePath, '/test - your details')
                    AND totals.visits = 1
                    AND channelGrouping NOT LIKE '%organic%'
                   AND hits.page.pagePath NOT in ('/test - additional test details', '/test - test dress', '/test - cover dress')
                   AND NOT  REGEXP_MATCH(hits.page.pagePath, r"^/(test - additional test details|test - test dress|test - cover dress)")
                  GROUP BY fullVisitorId,
                           visitId) s0
               FULL OUTER JOIN EACH
                 (SELECT fullVisitorId,
                         visitId,
                         MIN(hits.hitNumber) AS firstHit,
                         MAX(IF(hits.isExit, 1, 0)) AS exit
                  FROM TABLE_DATE_RANGE([xxxxxxxx.ga_sessions_], TIMESTAMP('2018-11-01'), TIMESTAMP('2018-11-30'))
                  WHERE REGEXP_MATCH(hits.page.pagePath, '/test - additional new details')
                    AND totals.visits = 1
                    AND channelGrouping NOT LIKE '%organic%'
                  GROUP BY fullVisitorId,
                           visitId) s1 ON s0.fullVisitorId = s1.fullVisitorId
               AND s0.visitId = s1.visitId) s01
            FULL OUTER JOIN EACH
              (SELECT fullVisitorId,
                      visitId,
                      MIN(hits.hitNumber) AS firstHit,
                      MAX(IF(hits.isExit, 1, 0)) AS exit
               FROM TABLE_DATE_RANGE([xxxxxxxx.ga_sessions_], TIMESTAMP('2018-11-01'), TIMESTAMP('2018-11-30'))
               WHERE REGEXP_MATCH(hits.page.pagePath, '/test - new dress')
                 AND totals.visits = 1
                 AND channelGrouping NOT LIKE '%organic%'
               GROUP BY fullVisitorId,
                        visitId) s2 ON s0.fullVisitorId = s2.fullVisitorId
            AND s0.visitId = s2.visitId) s012
         FULL OUTER JOIN EACH
           (SELECT fullVisitorId,
                   visitId,
                   MIN(hits.hitNumber) AS firstHit,
                   MAX(IF(hits.isExit, 1, 0)) AS exit
            FROM TABLE_DATE_RANGE([xxxxxxxx.ga_sessions_], TIMESTAMP('2018-11-01'), TIMESTAMP('2018-11-30'))
            WHERE REGEXP_MATCH(hits.page.pagePath, '/test - test details')
              AND totals.visits = 1
              AND channelGrouping NOT LIKE '%organic%'
            GROUP BY fullVisitorId,
                     visitId) s3 ON s0.fullVisitorId = s3.fullVisitorId
         AND s0.visitId = s3.visitId) s0123
      FULL OUTER JOIN EACH
        (SELECT fullVisitorId,
                visitId,
                MIN(hits.hitNumber) AS firstHit,
                MAX(IF(hits.isExit, 1, 0)) AS exit
         FROM TABLE_DATE_RANGE([xxxxxxxx.ga_sessions_], TIMESTAMP('2018-11-01'), TIMESTAMP('2018-11-30'))
         WHERE REGEXP_MATCH(hits.page.pagePath, '/test - cover for the test')
           AND totals.visits = 1
          AND channelGrouping NOT LIKE '%organic%'
          AND hits.page.pagePath NOT in ('/test - additional test details', '/test - test dress')
         GROUP BY fullVisitorId,
                  visitId) s4 ON s0.fullVisitorId = s4.fullVisitorId
      AND s0.visitId = s4.visitId) s01234
   FULL OUTER JOIN EACH
     (SELECT fullVisitorId,
             visitId,
             MIN(hits.hitNumber) AS firstHit,
             MAX(IF(hits.isExit, 1, 0)) AS exit
      FROM TABLE_DATE_RANGE([xxxxxxxx.ga_sessions_], TIMESTAMP('2018-11-01'), TIMESTAMP('2018-11-30'))
      WHERE REGEXP_MATCH(hits.page.pagePath, '/test - your order')
        AND totals.visits = 1
        AND channelGrouping NOT LIKE '%organic%'
        AND hits.page.pagePath NOT in ('/test - additional test details', '/test - test dress')
         AND NOT  REGEXP_MATCH(hits.page.pagePath, r"^/(test - additional test details|test - test dress|test - cover dress)")
      GROUP BY fullVisitorId,
               visitId) s5 ON s0.fullVisitorId = s5.fullVisitorId
   AND s0.visitId = s5.visitId) s012345

1 Ответ

1 голос
/ 20 марта 2019

В стандартном SQL вы можете написать простой подзапрос для hits для проверки.Например:

SELECT 
  fullvisitorid, visitstarttime,
  ARRAY(
    SELECT AS STRUCT hitNumber, type, page FROM t.hits ORDER BY hitNumber
  ) hits
FROM
    `bigquery-public-data.google_analytics_sample.ga_sessions_20161104` t
WHERE 
  -- exclude sessions with pages containing '/asearch.html'
  -- subquery checks for occurences in the whole query and returns boolean TRUE if found 
  -- NOT turns it into FALSE which filters it out
  NOT (SELECT COUNT(1)>0 FROM t.hits WHERE page.pagePath = '/asearch.html')
ORDER BY array_length(hits) DESC
LIMIT 1000

Я также написал подзапрос, чтобы отобразить попадания сеансов в массив.В устаревшем SQL вы бы использовали OMIT RECORD IF:

SELECT 
  fullvisitorid, visitstarttime, hits.page.pagePath
FROM
    [bigquery-public-data:google_analytics_sample.ga_sessions_20161104] t
-- OMIT RECORD IF excludes on record level 
-- if dimension is below record level, you need to aggregate (like with WITHIN)
-- in this case I used MAX() to surface any possible TRUE resulting from the comparison
OMIT RECORD IF MAX(hits.page.pagePath = '/asearch.html')
LIMIT 1000

Надеюсь, это поможет!

...