I would like feedback on how I approached this exercise, in terms of both logic and code (and the expected results).
Consider this sample database: https://www.sqlitetutorial.net/sqlite-sample-database/
It is a database of songs, purchases, and so on. The schema is here: https://www.sqlitetutorial.net/wp-content/uploads/2018/03/sqlite-sample-database-diagram-color.pdf
The question is:
Is the number of times a track appears in any playlist a good indicator of sales?
I expect that the more playlists a song appears in, the higher its sales will be, so I decided to compute the Pearson correlation.
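For reference, the sample Pearson coefficient can be written in a computational form that needs only sums, which seems to be what the NOM, DEN_1 and DEN_2 columns in the query below are aiming at. Here x is the playlist count and y the sales count of a track, over n tracks; the notation is mine, not taken from the exercise:

```latex
r = \frac{n\sum x_i y_i - \sum x_i \sum y_i}
         {\sqrt{n\sum x_i^2 - \left(\sum x_i\right)^2}\,
          \sqrt{n\sum y_i^2 - \left(\sum y_i\right)^2}}
```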
I structured my code as follows:
with freqPopularity as (
    select playlist_track.TrackId, count(*) as TrackPopularity
    from playlist_track
    group by playlist_track.TrackId
),
freqSales as (
    select invoice_items.TrackId, count(*) as SalesPopularity
    from invoice_items
    group by invoice_items.TrackId
),
observations as (
    select
        freqPopularity.TrackId,
        tracks.Name,
        freqPopularity.TrackPopularity as Popularity,
        freqSales.SalesPopularity as SalesFrequency
    from freqPopularity
    join freqSales
        on freqSales.TrackId = freqPopularity.TrackId
    join tracks
        on tracks.TrackId = freqSales.TrackId
),
-- compute Pearson
-- compute covariance of X, Y and standard deviations of X, Y
dev as (
    select
        observations.TrackId,
        observations.Popularity as X,
        (select Avg(observations.Popularity) from observations) as Xm,
        observations.SalesFrequency as Y,
        (select Avg(observations.SalesFrequency) from observations) as Ym,
        (select Count(*) from observations) as n
    from observations
)
select
    /*
    sum( (dev.X - dev.Xm) * (dev.Y - dev.Ym) ) / (dev.n ) as COV,
    sum((dev.X - dev.Xm) * (dev.X - dev.Xm) ) / (dev.n) as STD_X,
    sum((dev.Y - dev.Ym) * (dev.Y - dev.Ym) ) / (dev.n) as STD_Y,
    sum( (dev.X - dev.Xm) * (dev.Y - dev.Ym) ) / (dev.n ) /
    sum((dev.X - dev.Xm) * (dev.X - dev.Xm) ) / (dev.n)
    * sum((dev.Y - dev.Ym) * (dev.Y - dev.Ym) ) / (dev.n) as PEARSON
    */
    1/(dev.n) * sum( dev.X * dev.Y) - sum(dev.X)*sum(dev.Y) as NOM,
    dev.n * sum(dev.X * dev.X) - sum(dev.X * dev.X) * sum(dev.X * dev.X) as DEN_1,
    dev.n * sum(dev.Y * dev.Y) - sum(dev.Y * dev.Y) * sum(dev.Y * dev.Y) as DEN_2
    -- The code is for SQLite, which does not support SQRT() or POW().
    -- I will just report the numerator and denominator of the formula,
    -- and then use a calculator.
    -- That would give -0.63 ??
from dev;
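On the comment about square roots above: as far as I know, SQLite 3.35 and later provide sqrt() and pow() only when the library is compiled with SQLITE_ENABLE_MATH_FUNCTIONS, so availability depends on the build. If the query is run from a host language, one workaround is to register the function yourself. A minimal sketch, assuming Python's standard sqlite3 module and an in-memory database (neither is part of my original setup):

```python
import math
import sqlite3

# Assumption: an in-memory database stands in for the real sample database.
conn = sqlite3.connect(":memory:")

# Register a scalar SQL function named "sqrt" that takes one argument
# and is backed by Python's math.sqrt, for this connection only.
conn.create_function("sqrt", 1, math.sqrt)

# 1596.0 is just an illustrative input value.
root = conn.execute("select sqrt(1596.0)").fetchone()[0]
print(root)
conn.close()
```

With something like this in place, the division by the square root could in principle be done inside the query instead of on a calculator.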
The result I get is a negative linear correlation, which makes me suspect I did something wrong... it does not make sense.
Could you review the code?
Feedback on clarity and logic would be welcome.
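One detail that may be worth checking first: in SQLite, dividing one integer by another yields an integer, so 1/(dev.n) evaluates to 0 whenever n is greater than 1, and the first term of NOM would vanish. A small sketch of that behaviour, assuming Python's standard sqlite3 module and a made-up n of 10:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
n = 10  # hypothetical row count, for illustration only

# Integer / integer truncates in SQLite; a REAL operand gives a REAL result.
int_div, real_div = conn.execute("select 1 / ?, 1.0 / ?", (n, n)).fetchone()
print(int_div, real_div)
conn.close()
```

If that is what happens here, NOM reduces to -sum(X)*sum(Y), which is negative for positive counts regardless of the data.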
If you would like to check the numbers, I have pasted below a table reporting how many times each song appears in any playlist (Popularity) and how many times the song was purchased (Sales).
"TrackId" "Name" "Popularity" "Sales"
"1" "For Those About To Rock (We Salute You)" "3" "1"
"2" "Balls to the Wall" "3" "2"
"3" "Fast As a Shark" "4" "1"
"4" "Restless and Wild" "4" "1"
"5" "Princess of the Dawn" "4" "1"
"6" "Put The Finger On You" "2" "1"
"8" "Inject The Venom" "2" "2"
"9" "Snowballed" "2" "2"
"10" "Evil Walks" "2" "1"
"12" "Breaking The Rules" "2" "1"
"13" "Night Of The Long Knives" "2" "1"
"14" "Spellbound" "2" "1"
"15" "Go Down" "2" "1"
"16" "Dog Eat Dog" "2" "1"
"19" "Problem Child" "2" "1"
"20" "Overdose" "2" "2"
"21" "Hell Ain't A Bad Place To Be" "2" "1"
"24" "Love In An Elevator" "3" "1"
"25" "Rag Doll" "3" "1"
"26" "What It Takes" "3" "1"
"28" "Janie's Got A Gun" "3" "1"
"30" "Amazing" "3" "1"
"31" "Blind Man" "3" "1"
"32" "Deuces Are Wild" "3" "2"
"36" "Angel" "3" "1"
"37" "Livin' On The Edge" "3" "1"
"38" "All I Really Want" "3" "1"
"39" "You Oughta Know" "3" "1"
"42" "Right Through You" "3" "1"
"43" "Forgiven" "3" "1"
"44" "You Learn" "3" "1"
"48" "Not The Doctor" "3" "2"
"49" "Wake Up" "3" "1"
"53" "Sea Of Sorrow" "3" "1"
"54" "Bleed The Freak" "3" "1"
"55" "I Can't Remember" "3" "1"
"57" "It Ain't Like That" "3" "1"
"60" "Confusion" "3" "1"
"61" "I Know Somethin (Bout You)" "3" "1"
"62" "Real Thing" "3" "1"
"66" "Por Causa De Você" "2" "2"
"67" "Ligia" "2" "1"
"71" "Falando De Amor" "2" "1"
"72" "Angela" "2" "1"
"75" "O Boto (Bôto)" "2" "1"
"76" "Canta, Canta Mais" "2" "1"
"78" "Master Of Puppets" "3" "1"
"80" "The Unforgiven" "3" "1"
"84" "Welcome Home (Sanitarium)" "3" "2"
"85" "Cochise" "2" "1"
"89" "Like a Stone" "2" "1"
"90" "Set It Off" "2" "1"
"93" "Exploder" "2" "1"
"94" "Hypnotize" "2" "1"
"98" "The Last Remaining Light" "2" "1"
"99" "Your Time Has Come" "2" "1"
"102" "Doesn't Remind Me" "2" "1"
"103" "Drown Me Slowly" "2" "1"
"107" "Yesterday To Tomorrow" "2" "1"
"108" "Dandelion" "2" "1"
"111" "Money" "3" "1"
"112" "Long Tall Sally" "3" "1"
"116" "C'Mon Everybody" "3" "1"
"117" "Rock 'N' Roll Music" "3" "1"
"120" "Carol" "3" "1"
"121" "Good Golly Miss Molly" "3" "1"
"125" "Spanish moss-""A sound portrait""-Spanish moss" "2" "1"