I have a dataset on which I am trying to compute each column's percentage within a group while pivoting, using Spark SQL.
I tried the following query:
spark.sql("""SELECT
| product,
| customer,
| Central / first(Central) over (
| partition by product
| order by
| product,
| customer
| ) * 100 as Central,
| East / first(East) over (
| partition by product
| order by
| product,
| customer
| ) * 100 as East,
| West / first(West) over (
| partition by product
| order by
| product,
| customer
| ) * 100 as West,
| Total / first(Total) over (
| partition by product
| order by
| product,
| customer
| ) * 100 as Total
| from
| (
| SELECT
| *
| FROM
| (
| SELECT
| region,
| product,
| customer,
| sum(autonumber) as autonumber
| from
| test
| group by
| region,
| product,
| customer grouping sets (
| (region, product, customer),
| (product, region),
| (product, customer),
| (product),
| (region),
| ()
| )
| order by
| product nulls last,
| customer nulls last,
| region nulls last
| ) test pivot (
| sum(autonumber) for (region) in (
| 'Central' Central, 'East' East, 'West' West,
| null Total
| )
| )
| )
| order by
| product nulls last,
| customer nulls last""".stripMargin).show
This gives the following result:
+-------------+--------------+-----------------+------------------+------------------+-------------------+
| product| customer| Central| East| West| Total|
+-------------+--------------+-----------------+------------------+------------------+-------------------+
|Air Purifiers| Andy Roddick| null|17.504862461794943| null| 12.12237829517029|
|Air Purifiers|Augustine Paul| null|11.947763267574327| null| 8.274004233211468|
|Air Purifiers|Barbara Fisher| null| 28.17449291469853| null| 19.511256494131228|
|Air Purifiers| John Britto|26.41509433962264| null| null| 4.579565133730998|
|Air Purifiers| Lazar Mathew| null| 14.14281744929147| null| 9.794111987685202|
|Air Purifiers| Neil Seth|73.58490566037736| null| null| 12.757360015393497|
|Air Purifiers| Rozario Diego| null|28.230063906640734| null| 19.549740234750818|
|Air Purifiers| Steffi Graf| null| null| 91.96556671449068| 12.334038868578027|
|Air Purifiers| William Brown| null| null| 8.034433285509325| 1.0775447373484701|
|Air Purifiers| null| 100.0| 100.0| 100.0| 100.0|
| Art Supplies|Augustine Paul| null|1.8166089965397925| 9.88203111643016| 3.2970438960133275|
| Art Supplies|Barbara Fisher| null| null|0.7351684048555309|0.16659563751888729|
| Art Supplies| Benjamin Ross| null| 2.828054298642534| null| 1.646584789430863|
| Art Supplies| Carl Lewis| null| null|10.497520943751068| 2.3788307310836467|
| Art Supplies|Catherine Rose| null|0.7985094490284802| null| 0.4649180581922436|
| Art Supplies|David Flashing| null| null| 2.75260728329629| 0.6237650614079269|
| Art Supplies| David Love| null|2.1160500399254727| null| 1.2320328542094456|
| Art Supplies| Dean Jones| null|4.1655576257652385| 10.56590870234228| 4.819650536592925|
| Art Supplies| Florence Joy| null|3.2871972318339098| null| 1.9139126728914029|
| Art Supplies|Hallie Redmond|2.168625861370085|2.3156774021825925| null| 1.7628143039789237|
+-------------+--------------+-----------------+------------------+------------------+-------------------+
As you can see, I get the column percentage within each group (i.e. within each product). What I actually want in each product's subtotal row, though, is that product's percentage of the column's grand total. In other words, the last row for each product (the one where customer is null) should show the product's share of the overall column total instead of 100.
Like this:
+-------------+--------------+-----------------+------------------+------------------+-------------------+
| product| customer| Central| East| West| Total|
+-------------+--------------+-----------------+------------------+------------------+-------------------+
|Air Purifiers| Andy Roddick| null|17.504862461794943| null| 12.12237829517029|
|Air Purifiers|Augustine Paul| null|11.947763267574327| null| 8.274004233211468|
|Air Purifiers|Barbara Fisher| null| 28.17449291469853| null| 19.511256494131228|
|Air Purifiers| John Britto|26.41509433962264| null| null| 4.579565133730998|
|Air Purifiers| Lazar Mathew| null| 14.14281744929147| null| 9.794111987685202|
|Air Purifiers| Neil Seth|73.58490566037736| null| null| 12.757360015393497|
|Air Purifiers| Rozario Diego| null|28.230063906640734| null| 19.549740234750818|
|Air Purifiers| Steffi Graf| null| null| 91.96556671449068| 12.334038868578027|
|Air Purifiers| William Brown| null| null| 8.034433285509325| 1.0775447373484701|
|Air Purifiers| null| 1.28| 2.55| 0.94| 1.82|
| Art Supplies|Augustine Paul| null|1.8166089965397925| 9.88203111643016| 3.2970438960133275|
| Art Supplies|Barbara Fisher| null| null|0.7351684048555309|0.16659563751888729|
| Art Supplies| Benjamin Ross| null| 2.828054298642534| null| 1.646584789430863|
| Art Supplies| Carl Lewis| null| null|10.497520943751068| 2.3788307310836467|
| Art Supplies|Catherine Rose| null|0.7985094490284802| null| 0.4649180581922436|
| Art Supplies|David Flashing| null| null| 2.75260728329629| 0.6237650614079269|
| Art Supplies| David Love| null|2.1160500399254727| null| 1.2320328542094456|
| Art Supplies| Dean Jones| null|4.1655576257652385| 10.56590870234228| 4.819650536592925|
| Art Supplies| Florence Joy| null|3.2871972318339098| null| 1.9139126728914029|
| Art Supplies|Hallie Redmond|2.168625861370085|2.3156774021825925| null| 1.7628143039789237|
+-------------+--------------+-----------------+------------------+------------------+-------------------+
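To make the intended arithmetic concrete, here is a minimal sketch in plain Scala (no Spark) for a single pivoted column. The `PivotRow` case class, the `percentOfParent` helper, and all the numbers are hypothetical and only illustrate the rule: a customer row is divided by its product subtotal, while a subtotal row is divided by the column's grand total.

```scala
object ColumnPercentSketch {
  // One pivoted row for a single region column (hypothetical shape).
  // None stands for the null that grouping sets produce on subtotal rows.
  final case class PivotRow(product: Option[String], customer: Option[String], east: Double)

  def percentOfParent(rows: Seq[PivotRow]): Seq[(PivotRow, Double)] = {
    // Grand total of the column: the row where both product and customer are null.
    val grandTotal: Option[Double] =
      rows.find(r => r.product.isEmpty && r.customer.isEmpty).map(_.east)

    // Subtotal per product: rows where only customer is null.
    val productTotals: Map[String, Double] = rows.collect {
      case PivotRow(Some(p), None, v) => p -> v
    }.toMap

    rows.flatMap { r =>
      val denominator: Option[Double] = r match {
        // Customer row: divide by the product subtotal (what the query already does).
        case PivotRow(Some(p), Some(_), _) => productTotals.get(p)
        // Subtotal and grand-total rows: divide by the column grand total.
        case _ => grandTotal
      }
      denominator.map(d => r -> r.east / d * 100.0)
    }
  }

  def main(args: Array[String]): Unit = {
    // Made-up values, not taken from the real data.
    val rows = Seq(
      PivotRow(Some("A"), Some("c1"), 50.0),
      PivotRow(Some("A"), Some("c2"), 150.0),
      PivotRow(Some("A"), None, 200.0),
      PivotRow(None, None, 1600.0)
    )
    percentOfParent(rows).foreach { case (r, pct) =>
      println(s"${r.product.getOrElse("null")} ${r.customer.getOrElse("null")} $pct")
    }
  }
}
```

With these made-up values, the two customer rows come out as 25.0 and 75.0, the product subtotal row as 12.5 instead of 100.0, and only the grand-total row stays at 100.0.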
Is there a way to achieve this?