I have a more complicated problem that I have tried to break down into its components.
I have a simple query that returns every individual customer email (that is, one row per distinct customer):
Select distinct
CustomerEmail
FROM [JMNYC-AMTDB].[AMTPLUS].[dbo].Invoices I (nolock) --I don't think the tables are relevant to the problem.
LEFT JOIN (SELECT
ID.Company_Code
,ID.Division_Code
,ID.Invoice_Number
,SUM (ID.Price* ID.Quantity) Total
FROM [JMNYC-AMTDB].[AMTPLUS].[dbo].Invoices_Detail ID (nolock)
GROUP BY ID.Company_Code, ID.Division_Code, ID.Invoice_Number) ID
ON I.Company_Code = ID.Company_Code
AND I.Division_Code = ID.Division_Code
AND I.Invoice_Number = ID.Invoice_Number
LEFT JOIN
[JMDNJ-ACCELSQL].[A1WAREHOUSE].[dbo].SHIPHIST SH (nolock) ON I.Pickticket_Number = SH.Packslip
LEFT JOIN
[JMDNJ-ACCELSQL].[A1WAREHOUSE].[dbo].[SpraygroundMagentoCustomerEmailData] S on SH.CUST_PO = S.InvoiceNumber
Where I.Company_Code ='09' AND I.Division_Code = '001'
AND I.Customer_Number = 'ECOM2X'
AND ISNUMERIC(SH.CUST_PO) <> 0
AND I.Date_Created BETWEEN DATEADD(month, -0, '6/1/2016') AND '1/1/2017' -- Orders Base default is 12 months, options are 6,12, 18, and 24
This returns 19,516 rows.
However, if I add the month as a second column in the select list,
Select distinct
Month(I.Date_Created) Month,
CustomerEmail
FROM [JMNYC-AMTDB].[AMTPLUS].[dbo].Invoices I (nolock)...
it now returns 20,452 rows.
While writing up this question, I think I figured out the issue: the query duplicates emails across different months. If a customer placed an order in June and another in July, their email shows up twice, once for month 6 and once for month 7.
So that number should be more correct than the 19,516 number, right?
Later in my more complex query, I calculate the TotalCustomers number with a simple DENSE_RANK expression:
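A minimal sketch of that effect on made-up data. The table variable @Orders and the example addresses are assumptions for illustration only, not the real tables:

```sql
-- Hypothetical toy data: three orders placed by two customers.
DECLARE @Orders TABLE (CustomerEmail varchar(50), Date_Created date);

INSERT INTO @Orders (CustomerEmail, Date_Created) VALUES
    ('a@example.com', '2016-06-15'),  -- customer a orders in June...
    ('a@example.com', '2016-07-03'),  -- ...and again in July
    ('b@example.com', '2016-07-20');  -- customer b orders only in July

-- One row per customer: returns 2 rows.
SELECT DISTINCT CustomerEmail
FROM @Orders;

-- One row per (month, customer) pair: returns 3 rows,
-- because a@example.com is listed under month 6 and under month 7.
SELECT DISTINCT MONTH(Date_Created) AS [Month], CustomerEmail
FROM @Orders;
```

The second result is larger only because DISTINCT now applies to the pair of columns, so the extra row is a repeat customer rather than a new one.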
,DENSE_RANK() over (order by CustomerEmail asc)
+DENSE_RANK() over (order by CustomerEmail desc)
- 1 as TotalCustomersOverRange
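A small sketch of how this expression behaves, again on made-up data (the table variable @RankDemo and its rows are hypothetical). Adding the ascending and descending dense ranks and subtracting 1 gives the number of distinct values in the window on every row. It is presumably used here because SQL Server does not allow COUNT(DISTINCT ...) together with an OVER clause:

```sql
-- Hypothetical toy data: three orders placed by two customers.
DECLARE @RankDemo TABLE (CustomerEmail varchar(50), Date_Created date);

INSERT INTO @RankDemo (CustomerEmail, Date_Created) VALUES
    ('a@example.com', '2016-06-15'),
    ('a@example.com', '2016-07-03'),
    ('b@example.com', '2016-07-20');

SELECT
    MONTH(Date_Created) AS [Month],
    CustomerEmail,
    -- Distinct emails over the whole result set: 2 on every row.
    DENSE_RANK() OVER (ORDER BY CustomerEmail ASC)
      + DENSE_RANK() OVER (ORDER BY CustomerEmail DESC)
      - 1 AS TotalCustomersOverRange,
    -- Distinct emails within each month: 1 for month 6, 2 for month 7.
    DENSE_RANK() OVER (PARTITION BY MONTH(Date_Created) ORDER BY CustomerEmail ASC)
      + DENSE_RANK() OVER (PARTITION BY MONTH(Date_Created) ORDER BY CustomerEmail DESC)
      - 1 AS TotalCustomersThisMonth
FROM @RankDemo;
```

In this toy data the monthly counts add up to 3 while the overall count is 2, which is the same kind of gap as 20,452 versus 19,516. One caveat worth checking: DENSE_RANK gives NULL its own rank, so if CustomerEmail can be NULL after the LEFT JOINs, this expression counts NULL as one extra customer, whereas COUNT(DISTINCT CustomerEmail) would ignore it.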
That returns 19,516, because it counts each customer once no matter how many purchases they made. That is also technically correct, because there really are only that many unique customers over the date range. Only when you break the data down by month do you get customers counted twice who are really the same person.
What is the best way to fix this? Here is my complete query:
--Calculate average amount of time between purchase
--Calculate percentage of quantity and total increase with each purchase.
--Return most valued customers.
--User defined base range
-- later on, more refined user defined customer base, so if the base range is 18 months and the customer range is 1 month, it will only check the data against customers that purchased orders within the last month.
-- over the customer range, we define who the customers are. We call this RANGE
-- over the orders base range, we define how many times they ordered. We call this BASE.
-- First we filter by month, returning total new orders and total recurring orders
-- (FOR OTHER REPORT, filter by state and not month)
-- Then within the month, we drill down to calculate how many customers are one orders, two orders, three orders, etc total
-- For each order amount, we calculate average days between orders, total value, lifetime value, and quantity changes
SELECT DISTINCT --*
Month
,(DENSE_RANK() over (partition by Month order by CustomerEmail asc)
+DENSE_RANK() over (partition by Month order by CustomerEmail desc))
-1 as TotalCustomersThisMonth
,Sum(Case When AmountOrdersOverRangeByCustomer = 1 and RangeOrderNumber = 1 then 1 else 0 end) over (partition by Month) NewCustomersOverRangeThisMonth --Some of those customers aren't really new, if we expand to the base.
,Sum(Case When AmountOrdersOverRangeByCustomer = 1 and AmountOrdersOverBaseByCustomer = 1 then 1 else 0 end) over (partition by Month) NewCustomersOverBaseThisMonth
,Sum(Case When AmountOrdersOverRangeByCustomer = 1 and AmountOrdersOverBaseByCustomer > 1 then 1 else 0 end) over (partition by Month) RecurringCustomersOverBaseButNewInRangeThisMonth -- Customers in Base who are not in range.
,Sum(Case When AmountOrdersOverRangeByCustomer > 1 and RangeOrderNumber =1 then 1 else 0 end) over (partition by Month) RecurringCustomerOverRangeThisMonth
,TTT.NewCustomersOverRange
,TTT.NewCustomersOverBase
,TTT.RecurringCustomersOverBaseButNewInRange
,TTT.RecurringCustomerOverRange
,TTT.TotalCustomersOverBase
,TTT.TotalCustomersOverRange
FROM --This table calculates new and recurring customers.
(
SELECT
*
,Sum(Case When AmountOrdersOverRangeByCustomer = 1 and RangeOrderNumber = 1 then 1 else 0 end) over () NewCustomersOverRange --Some of those customers aren't really new, if we expand to the base.
,Sum(Case When AmountOrdersOverRangeByCustomer = 1 and AmountOrdersOverBaseByCustomer = 1 then 1 else 0 end) over () NewCustomersOverBase
,Sum(Case When AmountOrdersOverRangeByCustomer = 1 and AmountOrdersOverBaseByCustomer > 1 then 1 else 0 end) over () RecurringCustomersOverBaseButNewInRange -- Customers in Base who are not in range
,Sum(Case When AmountOrdersOverRangeByCustomer > 1 and RangeOrderNumber =1 then 1 else 0 end) over () RecurringCustomerOverRange
FROM -- This table gives you Order Numbers Per Customer
(
SELECT
*
,ROW_NUMBER() over (partition by CustomerEmail order by Date_Created asc) RangeOrderNumber
,(DENSE_RANK() over (partition by CustomerEmail order by Date_Created asc)
+DENSE_RANK() over (partition by CustomerEmail order by Date_Created desc))
-1 as AmountOrdersOverRangeByCustomer
,Max(BaseOrderNumber) over (partition by CustomerEmail) AmountOrdersOverBaseByCustomer
,DENSE_RANK() over (order by CustomerEmail asc)
+DENSE_RANK() over (order by CustomerEmail desc)
- 1 as TotalCustomersOverRange
FROM --This table gives you a line by line basis of every order
(
Select
I.Date_Created
,I.Company_Code
,I.Division_Code
,I.Invoice_Number
,Sh.CUST_PO
,I.Total_Quantity
,ID.Total
,SH.Ship_City City
,CASE WHEN SH.Ship_Cntry <> 'US' THEN 'INT' ELSE SH.Ship_prov END State
,SH.Ship_Zip Zip
,SH.Ship_Cntry Country
,Month(I.Date_Created) Month
,S.CustomerEmail
,Count(*) over (partition by CustomerEmail order by Date_Created asc) BaseOrderNumber
,dense_rank() over (order by CustomerEmail)
+ dense_rank() over (order by CustomerEmail desc)
- 1 as TotalCustomersOverBase
--,Count(Distinct CustomerEmail) over () as TotalCustomersOverBase
--,ROW_NUMBER() over (partition by S.CustomerEmail order by Date_Created asc) PurchaseCount --this goes somewhere else
--,(DENSE_RANK() over (partition by Month(I.Date_Created) order by CustomerEmail asc)
--+DENSE_RANK() over (partition by Month(I.Date_Created) order by CustomerEmail desc))
---1 as TotalCustomersThisMonth
FROM [JMNYC-AMTDB].[AMTPLUS].[dbo].Invoices I (nolock)
LEFT JOIN (SELECT
ID.Company_Code
,ID.Division_Code
,ID.Invoice_Number
,SUM (ID.Price* ID.Quantity) Total
FROM [JMNYC-AMTDB].[AMTPLUS].[dbo].Invoices_Detail ID (nolock)
GROUP BY ID.Company_Code, ID.Division_Code, ID.Invoice_Number) ID
ON I.Company_Code = ID.Company_Code
AND I.Division_Code = ID.Division_Code
AND I.Invoice_Number = ID.Invoice_Number
LEFT JOIN
[JMDNJ-ACCELSQL].[A1WAREHOUSE].[dbo].SHIPHIST SH (nolock) ON I.Pickticket_Number = SH.Packslip
LEFT JOIN
[JMDNJ-ACCELSQL].[A1WAREHOUSE].[dbo].[SpraygroundMagentoCustomerEmailData] S on SH.CUST_PO = S.InvoiceNumber
Where I.Company_Code ='09' AND I.Division_Code = '001'
AND I.Customer_Number = 'ECOM2X'
AND ISNUMERIC(SH.CUST_PO) <> 0
AND I.Date_Created BETWEEN DATEADD(month, -12, '1/1/2017') AND '12/31/2016' -- Orders Base default is 12 months, options are 6,12, 18, and 24
--AND CustomerEmail is NULL
)T
Where T.Date_Created BETWEEN '6/1/2016' AND '1/1/2017'-- Customer Range
)TT
--
--Order By CustomerEmail, RangeOrderNumber asc
--
)TTT
--Order By Date_Created desc
--Order By CustomerEmail, RangeOrderNumber asc
Order By Month