How to reduce BigQuery query execution time - PullRequest
0 votes
/ November 18, 2018

I have two tables:

1. PEOPLE (PK, Name, Address, Zip, <<some random other columns>>)
2. EMAIL  (PK, Name, Address, Zip, Email)

This is a one-to-many relationship: the tables are linked by name, address, and zip code.

What I need is:

PEOPLE (PK, Name, Address, Zip, <<some random other columns>>, FK_Email1, Email1, FK_Email2, Email2, FK_Email3, Email3)

Here is what I have so far:

#standardSQL
SELECT a.PK, a.FK, Source, FirstName, LastName, MiddleName, SuffixName, Gender, Age, DOB, Address, Address2, City, State, Zip, Zip4, Cleaned_HouseNumber, Cleaned_Street, Cleaned_City, Cleaned_County, Cleaned_State, Cleaned_Zip, TimeZone, Income, HomeValue, Networth, MaritalStatus, IsRenter, HasChildren, CreditRating, Investor, LinesOfCredit, InvestorRealEstate, Traveler, Pets, MailResponder, Charitable, PolicalDonations, PoliticalParty, ATTOM_ID, GEOID, SCORE, Latitude, Longitude, SpouseFirstName, SpouseLastName, HomeAvailableHomeEquity, HomeTotalLoans, HomeLoan1Amount, HomeLoan2Amount, HomeValueRangeCode, HomeValueRangeText, HomeMarketValue, HomeAssessedValue, HomeLoanToValue, HomeSQFT, HomeLotSQFT, HomeYearBuilt, HomePurchaseDate, HomeLoan1Date, HomeLoan2Date, HomeParcelNumber, HomePropertyType, DNC, HomeCompanyOwned, HomeTrustOwned, HomeOwnerOccupied, HomeType, HomePool, HomeGarage, HomeHeating, HomeCooling, HomeBedrooms, HomeBathrooms, HomeNumberOfUnits, MailingAddress, MailingCity, MailingState, MailingZip, MailingZip4, Married, Divorce, Education, Occupation, Ethnicity, LANGUAGE, RELIGION,
  FK_Email[SAFE_ORDINAL(1)] FK_Email1, Emails[SAFE_ORDINAL(1)] Email1, FK_Email[SAFE_ORDINAL(2)] as FK_Email2, Emails[SAFE_ORDINAL(2)] Email2, FK_Email[SAFE_ORDINAL(3)] as FK_Email3, Emails[SAFE_ORDINAL(3)] Email3
FROM (
  SELECT
    P.PK, P.FK, P.Source, P.FirstName, P.LastName, MiddleName, SuffixName, Gender, Age, DOB, P.Address, Address2, P.City, P.State, P.Zip, Zip4, Cleaned_HouseNumber, Cleaned_Street, Cleaned_City, Cleaned_County, Cleaned_State, Cleaned_Zip, TimeZone, Income, HomeValue, Networth, MaritalStatus, IsRenter, HasChildren, CreditRating, Investor, LinesOfCredit, InvestorRealEstate, Traveler, Pets, MailResponder, Charitable, PolicalDonations, PoliticalParty, ATTOM_ID, GEOID, SCORE, Latitude, Longitude, SpouseFirstName, SpouseLastName, HomeAvailableHomeEquity, HomeTotalLoans, HomeLoan1Amount, HomeLoan2Amount, HomeValueRangeCode, HomeValueRangeText, HomeMarketValue, HomeAssessedValue, HomeLoanToValue, HomeSQFT, HomeLotSQFT, HomeYearBuilt, HomePurchaseDate, HomeLoan1Date, HomeLoan2Date, HomeParcelNumber, HomePropertyType, DNC, HomeCompanyOwned, HomeTrustOwned, HomeOwnerOccupied, HomeType, HomePool, HomeGarage, HomeHeating, HomeCooling, HomeBedrooms, HomeBathrooms, HomeNumberOfUnits, MailingAddress, MailingCity, MailingState, MailingZip, MailingZip4, Married, Divorce, Education, Occupation, Ethnicity, LANGUAGE, RELIGION
    , ARRAY_AGG(E.Email) Emails, ARRAY_AGG(E.PK) FK_Email
  FROM `db.ds.table1` P
  LEFT JOIN `db.ds.table2` E
  ON P.FirstName = E.FirstName
  AND P.LastName = E.LastName
  AND P.Address = E.Address
  AND P.Zip = E.Zip
  GROUP BY 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87
) a

My problem is that this query has already run past the six-hour time limit. Is there a way to make it faster?

Thanks!

1 Answer

0 votes
/ November 18, 2018

I believe the query below does the same thing, but in a more optimized way:

#standardSQL
SELECT 
  PK, FK, Source, P.FirstName, P.LastName, MiddleName, SuffixName, Gender, Age, DOB, P.Address, Address2, City, State, P.Zip, Zip4, Cleaned_HouseNumber, Cleaned_Street, Cleaned_City, Cleaned_County, Cleaned_State, Cleaned_Zip, TimeZone, Income, HomeValue, Networth, MaritalStatus, IsRenter, HasChildren, CreditRating, Investor, LinesOfCredit, InvestorRealEstate, Traveler, Pets, MailResponder, Charitable, PolicalDonations, PoliticalParty, ATTOM_ID, GEOID, SCORE, Latitude, Longitude, SpouseFirstName, SpouseLastName, HomeAvailableHomeEquity, HomeTotalLoans, HomeLoan1Amount, HomeLoan2Amount, HomeValueRangeCode, HomeValueRangeText, HomeMarketValue, HomeAssessedValue, HomeLoanToValue, HomeSQFT, HomeLotSQFT, HomeYearBuilt, HomePurchaseDate, HomeLoan1Date, HomeLoan2Date, HomeParcelNumber, HomePropertyType, DNC, HomeCompanyOwned, HomeTrustOwned, HomeOwnerOccupied, HomeType, HomePool, HomeGarage, HomeHeating, HomeCooling, HomeBedrooms, HomeBathrooms, HomeNumberOfUnits, MailingAddress, MailingCity, MailingState, MailingZip, MailingZip4, Married, Divorce, Education, Occupation, Ethnicity, LANGUAGE, RELIGION,
  FK_Email[SAFE_ORDINAL(1)] FK_Email1, Emails[SAFE_ORDINAL(1)] Email1, FK_Email[SAFE_ORDINAL(2)] AS FK_Email2, Emails[SAFE_ORDINAL(2)] Email2, FK_Email[SAFE_ORDINAL(3)] AS FK_Email3, Emails[SAFE_ORDINAL(3)] Email3
FROM `db.ds.table1` P
LEFT JOIN (
  SELECT FirstName, LastName, Address, Zip, 
    ARRAY_AGG(Email LIMIT 3) Emails, ARRAY_AGG(PK LIMIT 3) FK_Email
  FROM `db.ds.table2`
  GROUP BY FirstName, LastName, Address, Zip
) E
ON P.FirstName = E.FirstName
AND P.LastName = E.LastName
AND P.Address = E.Address
AND P.Zip = E.Zip
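
The idea in the query above is to collapse the email table to one row per join key before joining, so the join no longer multiplies rows and there is no need to group by every PEOPLE column afterwards. One caveat: two separate ARRAY_AGG calls without ORDER BY return their elements in a non-deterministic order, so nothing guarantees that FK_Email[n] and Emails[n] come from the same row. Below is a minimal sketch of one way to keep them paired, by aggregating a STRUCT with an explicit ORDER BY. The inline `people` and `email` data, the `email_agg` CTE, and the choice to order by PK are assumptions made for illustration; they stand in for `db.ds.table1` and `db.ds.table2` and are not part of the original schema.

```sql
#standardSQL
-- Hypothetical sample rows standing in for `db.ds.table1` (people)
-- and `db.ds.table2` (email); only the join-key columns are shown.
WITH people AS (
  SELECT 1 AS PK, 'Ann' AS FirstName, 'Lee' AS LastName,
         '1 Main St' AS Address, '10001' AS Zip
  UNION ALL
  SELECT 2, 'Bob', 'Ray', '2 Oak Ave', '20002'
),
email AS (
  SELECT 10 AS PK, 'Ann' AS FirstName, 'Lee' AS LastName,
         '1 Main St' AS Address, '10001' AS Zip, 'ann@example.com' AS Email
  UNION ALL
  SELECT 11, 'Ann', 'Lee', '1 Main St', '10001', 'a.lee@example.com'
),
email_agg AS (
  -- One row per join key. Each array element carries PK and Email together,
  -- so position n always refers to a single email row.
  -- ORDER BY PK is an assumed tie-breaker that makes the top 3 deterministic.
  SELECT FirstName, LastName, Address, Zip,
    ARRAY_AGG(STRUCT(PK, Email) ORDER BY PK LIMIT 3) AS Emails
  FROM email
  GROUP BY FirstName, LastName, Address, Zip
)
SELECT
  P.*,
  -- SAFE_ORDINAL returns NULL instead of raising an error
  -- when a person has fewer than n emails (or none at all).
  A.Emails[SAFE_ORDINAL(1)].PK    AS FK_Email1,
  A.Emails[SAFE_ORDINAL(1)].Email AS Email1,
  A.Emails[SAFE_ORDINAL(2)].PK    AS FK_Email2,
  A.Emails[SAFE_ORDINAL(2)].Email AS Email2,
  A.Emails[SAFE_ORDINAL(3)].PK    AS FK_Email3,
  A.Emails[SAFE_ORDINAL(3)].Email AS Email3
FROM people P
LEFT JOIN email_agg A
ON P.FirstName = A.FirstName
AND P.LastName = A.LastName
AND P.Address = A.Address
AND P.Zip = A.Zip
```

With this sample data, Ann's row should carry two email pairs and a NULL third pair, while Bob's row should have NULL in all six email columns. Note also that, unlike the original GROUP BY over all columns, joining against a pre-aggregated table does not collapse exact duplicate rows in the PEOPLE table, so the results match only if that table has no such duplicates.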