I'm new to Scala and am having trouble working with a simple dataset in Spark. I want to view the dataset below ordered by eventType and crow, but I can't get it to sort in descending order. I also want to read only one eventType at a time.
When I try:
dataset.orderBy("eventType")
This works, but if I add '.desc', it fails:
scala> setB.orderBy("eventType").desc
<console>:32: error: value desc is not a member of
org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]
setB.orderBy("eventType").desc
or
scala> dataset.orderBy("eventType".desc)
<console>:32: error: value desc is not a member of String
dataset.orderBy("eventType".desc)
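Both errors point to the same thing: '.desc' is defined on org.apache.spark.sql.Column, not on String or on the Dataset that orderBy returns. A minimal sketch of the usual spellings follows, written as statements to paste into spark-shell. The `sample` DataFrame is a hypothetical stand-in for setB/dataset built from a few of the sample rows, and crow is assumed to be an integer column.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, desc}

// Local session for the sketch; in spark-shell, getOrCreate() reuses the existing one.
val spark = SparkSession.builder().master("local[*]").appName("desc-sort-sketch").getOrCreate()
import spark.implicits._

// Hypothetical stand-in for setB / dataset, using a few rows from the sample.
val sample = Seq(
  ("aggregate_event.app_launches", "4.3.0.108", 3),
  ("aggregate_event.app_launches", "4.6.4.6", 31),
  ("aggregate_event.app_network_traffic", "7.12.0.113", 1),
  ("aggregate_event.app_network_traffic", "5.1.2.10", 3)
).toDF("eventType", "clientVersion", "crow")

// col(...) turns the name into a Column, which is where .desc is defined.
val byTypeAndCrow = sample.orderBy(col("eventType").desc, col("crow").desc)
byTypeAndCrow.show(false)

// Equivalent spellings for a single sort key.
val byCrowFn = sample.orderBy(desc("crow"))      // functions.desc takes the column name
val byCrowDollar = sample.orderBy($"crow".desc)  // $"..." comes from spark.implicits._
```

The `$"crow".desc` form only compiles after `import spark.implicits._`, which spark-shell normally does at startup.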
I'm also trying to use a filter, but it doesn't accept anything I try either. Something like: dataset.filter("eventType" = "agg%")
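The filter attempt fails to compile because "eventType" = "agg%" is a Scala assignment to a string literal rather than a column condition. A hedged sketch of some ways to express the match is below, again as spark-shell statements. The `sample` DataFrame and its two columns are illustrative assumptions, not the real dataset.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Local session for the sketch; in spark-shell, getOrCreate() reuses the existing one.
val spark = SparkSession.builder().master("local[*]").appName("filter-sketch").getOrCreate()
import spark.implicits._

// Hypothetical stand-in for the real dataset.
val sample = Seq(
  ("aggregate_event.app_launches", 3),
  ("aggregate_event.app_launches", 11),
  ("aggregate_event.app_network_traffic", 1)
).toDF("eventType", "crow")

// SQL-style wildcard match on a Column: % matches any run of characters.
val startsWithAgg = sample.filter(col("eventType").like("agg%"))

// The same condition written as a SQL expression string.
val startsWithAggSql = sample.filter("eventType like 'agg%'")

// Suffix match without wildcards, to keep a single event type.
val launchesOnly = sample.filter(col("eventType").endsWith("app_launches"))

// Exact equality on a Column uses ===, not = or ==.
val exact = sample.filter(col("eventType") === "aggregate_event.app_network_traffic")

launchesOnly.show(false)
```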
Example dataset:
+----------------+------------------------------------------------------------------------------------+-----------------------------------+-------------+----------------+----+
|deadletterbucket|split |eventType |clientVersion|dDeviceSurrogate|crow|
+----------------+------------------------------------------------------------------------------------+-----------------------------------+-------------+----------------+----+
|event_failure |instance type (null) does not match any allowed primitive type (allowed: ["object"])|aggregate_event.app_launches |4.3.0.108 |1 |3 |
|event_failure |instance type (null) does not match any allowed primitive type (allowed: ["object"])|aggregate_event.app_launches |5.3.0.10 |1 |11 |
|event_failure |instance type (null) does not match any allowed primitive type (allowed: ["object"])|aggregate_event.app_launches |5.9.1.10 |3 |11 |
|event_failure |instance type (null) does not match any allowed primitive type (allowed: ["object"])|aggregate_event.app_launches |5.7.0.1 |3 |15 |
|event_failure |instance type (null) does not match any allowed primitive type (allowed: ["object"])|aggregate_event.app_launches |5.5.0.5 |6 |16 |
|event_failure |instance type (null) does not match any allowed primitive type (allowed: ["object"])|aggregate_event.app_launches |4.0.0.62 |7 |26 |
|event_failure |instance type (null) does not match any allowed primitive type (allowed: ["object"])|aggregate_event.app_launches |4.6.4.6 |9 |31 |
|event_failure |instance type (null) does not match any allowed primitive type (allowed: ["object"])|aggregate_event.app_network_traffic|7.12.0.113 |1 |1 |
|event_failure |instance type (null) does not match any allowed primitive type (allowed: ["object"])|aggregate_event.app_network_traffic|6.3.2.15 |1 |2 |
|event_failure |instance type (null) does not match any allowed primitive type (allowed: ["object"])|aggregate_event.app_network_traffic|5.1.2.10 |1 |3 |
Ideally, I'm trying to get something like the following to work:
dataset.orderBy("crow").desc.filter("eventType"="%app_launches").show(3,false)
|event_failure |instance type (null) does not match any allowed primitive type (allowed: ["object"])|aggregate_event.app_launches |5.5.0.5 |6 |31 |
|event_failure |instance type (null) does not match any allowed primitive type (allowed: ["object"])|aggregate_event.app_launches |4.0.0.62 |7 |26 |
|event_failure |instance type (null) does not match any allowed primitive type (allowed: ["object"])|aggregate_event.app_launches |4.6.4.6 |9 |16 |
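One way the intended pipeline might be written is sketched below: filter first, then sort on a Column with '.desc', then show the top rows. This assumes crow and dDeviceSurrogate are integer columns; the `sample` DataFrame is a hypothetical reconstruction of the ten sample rows with the two constant columns left out for brevity.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Local session for the sketch; in spark-shell, getOrCreate() reuses the existing one.
val spark = SparkSession.builder().master("local[*]").appName("filter-sort-sketch").getOrCreate()
import spark.implicits._

// The ten sample rows, minus deadletterbucket and split (identical in every row).
val sample = Seq(
  ("aggregate_event.app_launches", "4.3.0.108", 1, 3),
  ("aggregate_event.app_launches", "5.3.0.10", 1, 11),
  ("aggregate_event.app_launches", "5.9.1.10", 3, 11),
  ("aggregate_event.app_launches", "5.7.0.1", 3, 15),
  ("aggregate_event.app_launches", "5.5.0.5", 6, 16),
  ("aggregate_event.app_launches", "4.0.0.62", 7, 26),
  ("aggregate_event.app_launches", "4.6.4.6", 9, 31),
  ("aggregate_event.app_network_traffic", "7.12.0.113", 1, 1),
  ("aggregate_event.app_network_traffic", "6.3.2.15", 1, 2),
  ("aggregate_event.app_network_traffic", "5.1.2.10", 1, 3)
).toDF("eventType", "clientVersion", "dDeviceSurrogate", "crow")

// Keep one event type, then sort by crow from largest to smallest.
val launchesByCrow = sample
  .filter(col("eventType").like("%app_launches"))
  .orderBy(col("crow").desc)

// Print the first three rows without truncating long cell values.
launchesByCrow.show(3, false)

// The same three rows as a DataFrame, for further processing.
val top3 = launchesByCrow.limit(3)
```

With these sample rows, the three largest crow values for app_launches are 31, 26, and 16, and each row keeps its own clientVersion and dDeviceSurrogate values.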