This can be done with a simple GROUP BY query that uses the date_format function to truncate each timestamp to the start of its hour:
spark.sql(
"""
SELECT ID
, date_format(timestamp, 'yyyy-MM-dd HH:00:00') as time
, mean(val) as avgval
FROM table
GROUP BY ID
, date_format(timestamp, 'yyyy-MM-dd HH:00:00')
ORDER BY ID
, date_format(timestamp, 'yyyy-MM-dd HH:00:00')
""") \
.show(20, False)
Result:
+---+-------------------+------+
|ID |time               |avgval|
+---+-------------------+------+
|A  |2020-01-19 03:00:00|5.0   |
|A  |2020-01-20 05:00:00|5.0   |
|B  |2020-01-19 02:00:00|0.5   |
|B  |2020-01-19 06:00:00|5.5   |
+---+-------------------+------+
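
To see why formatting the timestamp with a literal ':00:00' suffix groups rows by hour, here is a minimal plain-Python sketch of the same bucketing logic. It does not use Spark, and the rows are made up for illustration (they are not the data from the question); the helper name hourly_average is also hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical sample rows (ID, timestamp, val); NOT the original data.
rows = [
    ("A", "2020-01-19 03:10:00", 4.0),
    ("A", "2020-01-19 03:50:00", 6.0),
    ("A", "2020-01-19 04:05:00", 1.0),
    ("B", "2020-01-19 02:30:00", 0.5),
]


def hourly_average(rows):
    """Average val per (ID, hour), mimicking the GROUP BY in the query."""
    buckets = defaultdict(list)
    for id_, ts, val in rows:
        parsed = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        # Same idea as date_format(timestamp, 'yyyy-MM-dd HH:00:00'):
        # keep date and hour, replace minutes and seconds with zeros.
        hour = parsed.strftime("%Y-%m-%d %H:00:00")
        buckets[(id_, hour)].append(val)
    # Sorting the keys plays the role of ORDER BY ID, time.
    return [(id_, hour, mean(vals)) for (id_, hour), vals in sorted(buckets.items())]


result = hourly_average(rows)
for row in result:
    print(row)
```

Every timestamp that falls within the same hour maps to the same string key, so the two 03:xx rows for A end up in one group while the 04:05 row gets its own.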