Pearson correlation in SQLite - PullRequest
/ February 17, 2020

I would like feedback on how I approached this exercise, in terms of both logic and code (and the expected results).

Consider this example: https://www.sqlitetutorial.net/sqlite-sample-database/

It is a database of songs, purchases, and so on. The schema is here: https://www.sqlitetutorial.net/wp-content/uploads/2018/03/sqlite-sample-database-diagram-color.pdf

And the question:

Is the number of times a track appears in any playlist a good indicator of sales?

I expect that the more playlists a song appears in, the higher its sales will be. So I decided to compute the Pearson correlation between the two counts.
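For reference, this is the usual computational form of the Pearson coefficient, written here under the assumption that x_i is the playlist count and y_i the sales count of track i, and n is the number of tracks in the comparison (these symbol names are mine, not part of the exercise):

```latex
r = \frac{n\sum_i x_i y_i - \sum_i x_i \sum_i y_i}
         {\sqrt{n\sum_i x_i^2 - \left(\sum_i x_i\right)^2}\;
          \sqrt{n\sum_i y_i^2 - \left(\sum_i y_i\right)^2}}
```

The numerator and the two terms under the square roots are the three quantities that any NOM, DEN_1 and DEN_2 columns should reproduce.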

I structured my code as follows:


with freqPopularity as (
        select playlist_track.TrackId, count(*) as TrackPopularity  
        from playlist_track
        group by playlist_track.TrackId
    ),
freqSales as (
    select invoice_items.TrackId, count(*) as SalesPopularity  
        from invoice_items
        group by invoice_items.TrackId
),
observations as (
    select 
        freqPopularity.TrackId, 
        tracks.Name,
        freqPopularity.TrackPopularity as Popularity,
        freqSales.SalesPopularity as SalesFrequency
    from freqPopularity
    join freqSales
    on freqSales.TrackId  = freqPopularity.TrackId
    join tracks
    on tracks.TrackId = freqSales.TrackId
),

--- compute Pearson
--- compute CoVariance X, Y and Standard Deviations X, Y

dev as (
select 
        observations.TrackId,
        observations.Popularity as X,
        (select Avg(observations.Popularity)  from observations) as Xm,
        observations.SalesFrequency as Y,
        (select Avg(observations.SalesFrequency) from observations) as Ym,
        (select Count(*) from observations) as n
        from observations 
)

select 
   /*
    sum( (dev.X - dev.Xm) * (dev.Y - dev.Ym) ) / (dev.n ) as COV,
    sum((dev.X - dev.Xm) * (dev.X - dev.Xm) ) / (dev.n) as STD_X,
    sum((dev.Y - dev.Ym) * (dev.Y - dev.Ym) ) / (dev.n) as STD_Y,

    sum( (dev.X - dev.Xm) * (dev.Y - dev.Ym) ) / (dev.n ) / 
        sum((dev.X - dev.Xm) * (dev.X - dev.Xm) ) / (dev.n)
        * sum((dev.Y - dev.Ym) * (dev.Y - dev.Ym) ) / (dev.n) as PEARSON
    */

    1/(dev.n) * sum(   dev.X  * dev.Y) - sum(dev.X)*sum(dev.Y) as NOM,
    dev.n * sum(dev.X * dev.X) - sum(dev.X * dev.X) * sum(dev.X * dev.X) as DEN_1,
    dev.n * sum(dev.Y * dev.Y) - sum(dev.Y * dev.Y) * sum(dev.Y * dev.Y) as DEN_2

    -- code in SQLite, which supports neither SQRT() nor POW()
    -- I will just report the numerator and denominator of the function,
    -- and then use a calculator.
    -- would give>  - 0.63  ??

from dev;

The result I get is a negative linear correlation, which makes me suspect I did something wrong ... it does not make sense.
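One possible explanation for the negative value, offered as a sketch rather than a definitive review: in SQLite `1/(dev.n)` is integer division and evaluates to 0 for any n greater than 1, so NOM as written collapses to `-sum(X)*sum(Y)`. DEN_1 and DEN_2 also subtract `sum(X*X) * sum(X*X)`, where the usual computational formula has `sum(X) * sum(X)`. The Python snippet below uses the standard `sqlite3` module and a small made-up `observations` table (hypothetical data, not taken from the sample database) to compare the posted numerator with the textbook one.

```python
import math
import sqlite3

# Hypothetical toy rows standing in for the `observations` CTE:
# (TrackId, Popularity, SalesFrequency). Not real data from the sample database.
rows = [(1, 1, 2), (2, 2, 4), (3, 3, 5), (4, 4, 4), (5, 5, 5)]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table observations "
    "(TrackId integer, Popularity integer, SalesFrequency integer)"
)
conn.executemany("insert into observations values (?, ?, ?)", rows)

# Numerator as posted: 1 / count(*) is integer division, which yields 0 here.
buggy_nom = conn.execute(
    """
    select 1 / count(*) * sum(Popularity * SalesFrequency)
           - sum(Popularity) * sum(SalesFrequency)
    from observations
    """
).fetchone()[0]

# Textbook computational form: n*Sxy - Sx*Sy, n*Sxx - Sx^2, n*Syy - Sy^2.
nom, den_1, den_2 = conn.execute(
    """
    select
        count(*) * sum(Popularity * SalesFrequency)
            - sum(Popularity) * sum(SalesFrequency)      as NOM,
        count(*) * sum(Popularity * Popularity)
            - sum(Popularity) * sum(Popularity)          as DEN_1,
        count(*) * sum(SalesFrequency * SalesFrequency)
            - sum(SalesFrequency) * sum(SalesFrequency)  as DEN_2
    from observations
    """
).fetchone()

# The square root is taken outside SQL, as in the original plan.
pearson = nom / math.sqrt(den_1 * den_2)
print(buggy_nom, nom, den_1, den_2, round(pearson, 4))
```

On this toy table the posted numerator comes out negative even though the data trend upward, while the textbook form gives a positive coefficient. Using `count(*)` directly also avoids mixing the bare column `dev.n` with aggregates, which SQLite tolerates but other engines reject.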

Could you review the code?

I am looking for feedback on clarity and logic.

If you would like to check the numbers, I am pasting below the table that reports how many times each song appears in any playlist (Popularity) and how many times it was purchased (Sales).

"TrackId" "Name" "Popularity" "Sales"

"1" "For Those About To Rock (We Salute You)"   "3" "1"
"2" "Balls to the Wall" "3" "2"
"3" "Fast As a Shark"   "4" "1"
"4" "Restless and Wild" "4" "1"
"5" "Princess of the Dawn"  "4" "1"
"6" "Put The Finger On You" "2" "1"
"8" "Inject The Venom"  "2" "2"
"9" "Snowballed"    "2" "2"
"10"    "Evil Walks"    "2" "1"
"12"    "Breaking The Rules"    "2" "1"
"13"    "Night Of The Long Knives"  "2" "1"
"14"    "Spellbound"    "2" "1"
"15"    "Go Down"   "2" "1"
"16"    "Dog Eat Dog"   "2" "1"
"19"    "Problem Child" "2" "1"
"20"    "Overdose"  "2" "2"
"21"    "Hell Ain't A Bad Place To Be"  "2" "1"
"24"    "Love In An Elevator"   "3" "1"
"25"    "Rag Doll"  "3" "1"
"26"    "What It Takes" "3" "1"
"28"    "Janie's Got A Gun" "3" "1"
"30"    "Amazing"   "3" "1"
"31"    "Blind Man" "3" "1"
"32"    "Deuces Are Wild"   "3" "2"
"36"    "Angel" "3" "1"
"37"    "Livin' On The Edge"    "3" "1"
"38"    "All I Really Want" "3" "1"
"39"    "You Oughta Know"   "3" "1"
"42"    "Right Through You" "3" "1"
"43"    "Forgiven"  "3" "1"
"44"    "You Learn" "3" "1"
"48"    "Not The Doctor"    "3" "2"
"49"    "Wake Up"   "3" "1"
"53"    "Sea Of Sorrow" "3" "1"
"54"    "Bleed The Freak"   "3" "1"
"55"    "I Can't Remember"  "3" "1"
"57"    "It Ain't Like That"    "3" "1"
"60"    "Confusion" "3" "1"
"61"    "I Know Somethin (Bout You)"    "3" "1"
"62"    "Real Thing"    "3" "1"
"66"    "Por Causa De Você" "2" "2"
"67"    "Ligia" "2" "1"
"71"    "Falando De Amor"   "2" "1"
"72"    "Angela"    "2" "1"
"75"    "O Boto (Bôto)" "2" "1"
"76"    "Canta, Canta Mais" "2" "1"
"78"    "Master Of Puppets" "3" "1"
"80"    "The Unforgiven"    "3" "1"
"84"    "Welcome Home (Sanitarium)" "3" "2"
"85"    "Cochise"   "2" "1"
"89"    "Like a Stone"  "2" "1"
"90"    "Set It Off"    "2" "1"
"93"    "Exploder"  "2" "1"
"94"    "Hypnotize" "2" "1"
"98"    "The Last Remaining Light"  "2" "1"
"99"    "Your Time Has Come"    "2" "1"
"102"   "Doesn't Remind Me" "2" "1"
"103"   "Drown Me Slowly"   "2" "1"
"107"   "Yesterday To Tomorrow" "2" "1"
"108"   "Dandelion" "2" "1"
"111"   "Money" "3" "1"
"112"   "Long Tall Sally"   "3" "1"
"116"   "C'Mon Everybody"   "3" "1"
"117"   "Rock 'N' Roll Music"   "3" "1"
"120"   "Carol" "3" "1"
"121"   "Good Golly Miss Molly" "3" "1"
"125"   "Spanish moss-""A sound portrait""-Spanish moss"    "2" "1"
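To cross-check the SQL result outside the database, the pasted rows can be parsed and fed to the same formula. This is a minimal sketch: it assumes the last two quoted integers on each line are Popularity and Sales, and it includes only the first five rows of the table for brevity. The helper names `parse_counts` and `pearson` are mine.

```python
import math
import re

# First five rows of the table above; the full table is longer.
sample = '''
"TrackId" "Name" "Popularity" "Sales"
"1" "For Those About To Rock (We Salute You)"   "3" "1"
"2" "Balls to the Wall" "3" "2"
"3" "Fast As a Shark"   "4" "1"
"4" "Restless and Wild" "4" "1"
"5" "Princess of the Dawn"  "4" "1"
'''

def parse_counts(text):
    """Return (popularity, sales) pairs from the trailing two quoted integers."""
    pairs = []
    for line in text.splitlines():
        match = re.search(r'"(\d+)"\s+"(\d+)"\s*$', line)
        if match:  # the header line and blank lines do not match
            pairs.append((int(match.group(1)), int(match.group(2))))
    return pairs

def pearson(pairs):
    """Pearson r via the computational formula; needs non-constant x and y."""
    n = len(pairs)
    sx = sum(x for x, _ in pairs)
    sy = sum(y for _, y in pairs)
    sxy = sum(x * y for x, y in pairs)
    sxx = sum(x * x for x, _ in pairs)
    syy = sum(y * y for _, y in pairs)
    den = (n * sxx - sx * sx) * (n * syy - sy * sy)
    if den <= 0:
        raise ValueError("one of the variables has zero variance")
    return (n * sxy - sx * sy) / math.sqrt(den)

pairs = parse_counts(sample)
r = pearson(pairs)
print(pairs, r)
```

Pasting the whole table into `sample` gives a figure to compare against the SQL output. Five rows are far too few to say anything about the full data set.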