You can use window functions: generate a row_number partitioned by the date part of the timestamp, then keep only the rows where row_number = 1.
Try this:
val df = Seq(("2014-12-01 02:54:00","2"),("2014-12-01 03:54:00","3"),("2014-12-01 04:54:00","4"),("2014-12-01 05:54:00","5"),("2014-12-02 02:54:00","6"),("2014-12-02 02:54:00","7"),("2014-12-03 02:54:00","8"))
  .toDF("time","value")
  // withColumn returns a new DataFrame, so the casts must stay in this chain to take effect
  .withColumn("time", $"time".cast("timestamp"))
  .withColumn("value", $"value".cast("int"))
df.createOrReplaceTempView("timetab")
spark.sql(
  """with order_ts as (
    |  select time, value,
    |         row_number() over (partition by date_format(time, 'yyyyMMdd') order by value) as rn
    |  from timetab
    |)
    |select time, value from order_ts where rn = 1
    |""".stripMargin).show(false)
Output:
+-------------------+-----+
|time               |value|
+-------------------+-----+
|2014-12-02 02:54:00|6    |
|2014-12-01 02:54:00|2    |
|2014-12-03 02:54:00|8    |
+-------------------+-----+
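
If you would rather not register a temp view, the same idea can probably be expressed with the DataFrame API. This is only a sketch: the names byDay and firstPerDay are made up for illustration, and the SparkSession setup is an assumption for running outside spark-shell (inside spark-shell, `spark` already exists and that line can be dropped).

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, date_format, row_number}

// Assumption: a local session for a standalone run; spark-shell already provides `spark`.
val spark = SparkSession.builder().master("local[*]").appName("first-row-per-day").getOrCreate()
import spark.implicits._

val df = Seq(("2014-12-01 02:54:00","2"),("2014-12-01 03:54:00","3"),("2014-12-01 04:54:00","4"),
             ("2014-12-01 05:54:00","5"),("2014-12-02 02:54:00","6"),("2014-12-02 02:54:00","7"),
             ("2014-12-03 02:54:00","8"))
  .toDF("time", "value")
  .withColumn("time", col("time").cast("timestamp"))
  .withColumn("value", col("value").cast("int"))

// One window partition per calendar day, rows ordered by value within the day
// (hypothetical name: byDay).
val byDay = Window
  .partitionBy(date_format(col("time"), "yyyyMMdd"))
  .orderBy(col("value"))

// Number the rows inside each day and keep only the first one
// (hypothetical name: firstPerDay).
val firstPerDay = df
  .withColumn("rn", row_number().over(byDay))
  .where(col("rn") === 1)
  .select("time", "value")

firstPerDay.show(false)
```

As in the SQL version, the window is ordered by value, so each day yields its row with the smallest value; order by time instead if you want the earliest row of each day. The order of the rows in the result is not guaranteed.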