PySpark DataFrame groupBy as a PySpark SQL query - PullRequest
0 votes
/ March 30, 2020

What is the equivalent PySpark SQL query for the following?

df.groupBy("Country","Product Category","Sub Category").sum("Profit").show()

+--------------+----------------+-----------------+------------------+
|       Country|Product Category|     Sub Category|       sum(Profit)|
+--------------+----------------+-----------------+------------------+
| United States|        Clothing|            Vests|           2198.97|
| United States|     Accessories|         Cleaners|           1666.05|
| United States|        Clothing|          Jerseys|13798.000000000002|
|        France|           Bikes|    Touring Bikes| 6093.929999999999|
| United States|           Bikes|   Mountain Bikes|-4766.709999999997|
|United Kingdom|     Accessories|          Fenders|            845.01|
|United Kingdom|        Clothing|           Gloves|159.01999999999998|
| United States|     Accessories|          Fenders| 4376.010000000001|
|United Kingdom|        Clothing|            Socks|              15.0|
| United States|        Clothing|             Caps|            1320.0|
|       Germany|     Accessories|         Cleaners|            137.99|
| United States|     Accessories|Bottles and Cages|            1007.0|
|        France|     Accessories|  Hydration Packs|            761.99|
|       Germany|     Accessories|  Tires and Tubes|3811.9700000000003|
|        France|     Accessories|          Fenders| 66.99000000000001|
|        France|        Clothing|          Jerseys|            3436.0|
|United Kingdom|           Bikes|       Road Bikes|           -785.96|
| United States|        Clothing|            Socks|              96.0|
|        France|        Clothing|             Caps|             586.0|
|        France|           Bikes|   Mountain Bikes|-3697.829999999997|
+--------------+----------------+-----------------+------------------+

I want the same result using PySpark SQL.

1 Answer

0 votes
/ March 30, 2020

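The SQL equivalent is an aggregation query. In Spark SQL, column names that contain spaces are quoted with backticks (double quotes denote string literals by default), and the clause is GROUP BY, not GROUP. The table name dataframe stands for whatever view name the DataFrame has been registered under:

select `Country`, `Product Category`, `Sub Category`, sum(`Profit`)
from dataframe
group by `Country`, `Product Category`, `Sub Category`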
...
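
To run the query above from PySpark, the DataFrame first has to be registered as a temporary view. What follows is a minimal sketch, not the answerer's own code: the view name `sales`, the helper names `quote_ident`, `group_sum_sql` and `demo`, and the three sample rows are all made up for illustration. It only assumes that `pyspark` is installed and a local Spark session can be started.

```python
from typing import Sequence


def quote_ident(name: str) -> str:
    """Backtick-quote a Spark SQL identifier, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def group_sum_sql(view: str, group_cols: Sequence[str], sum_col: str) -> str:
    """Build SQL text equivalent to df.groupBy(*group_cols).sum(sum_col)."""
    keys = ", ".join(quote_ident(c) for c in group_cols)
    # Alias the aggregate so the column is named like the DataFrame API result.
    alias = quote_ident("sum(" + sum_col + ")")
    return (
        "SELECT " + keys + ", SUM(" + quote_ident(sum_col) + ") AS " + alias
        + " FROM " + quote_ident(view)
        + " GROUP BY " + keys
    )


def demo() -> None:
    """Run the generated query on a tiny, made-up DataFrame (needs pyspark)."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").appName("groupby-sql").getOrCreate()
    # Hypothetical sample rows; the real data comes from the asker's df.
    df = spark.createDataFrame(
        [
            ("France", "Clothing", "Caps", 10.0),
            ("France", "Clothing", "Caps", 5.0),
            ("Germany", "Accessories", "Cleaners", 2.5),
        ],
        ["Country", "Product Category", "Sub Category", "Profit"],
    )
    # Register the DataFrame under a name that SQL can refer to.
    df.createOrReplaceTempView("sales")
    query = group_sum_sql(
        "sales", ["Country", "Product Category", "Sub Category"], "Profit"
    )
    spark.sql(query).show()
    spark.stop()
```

Calling `demo()` prints one row per (Country, Product Category, Sub Category) combination with the summed profit. With the asker's own `df`, only the `createOrReplaceTempView` and `spark.sql` calls are needed; the query string can just as well be written by hand.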