I have a DataFrame:
+----------+------------+------------+--------------------+
| acc |id_Vehicule |id_Device |dateTracking |
+----------+------------+------------+--------------------+
| 1 | 1 | 2 |2020-02-12 14:50:00 |
| 0 | 1 | 2 |2020-02-12 14:59:00 |
| 0 | 2 | 3 |2020-02-12 15:10:00 |
| 1 | 2 | 3 |2020-02-12 15:20:00 |
+----------+------------+------------+--------------------+
I want to get this output:
+----------+------------+------------+--------------------+----------------+
| acc |id_Vehicule |id_Device |dateTracking | acc_previous |
+----------+------------+------------+--------------------+----------------+
| 1 | 1 | 2 |2020-02-12 14:50:00 | null |
| 0 | 1 | 2 |2020-02-12 14:59:00 | 1 |
| 0 | 2 | 3 |2020-02-12 15:10:00 | null |
| 1 | 2 | 3 |2020-02-12 15:20:00 | 0 |
+----------+------------+------------+--------------------+----------------+
I tried the following code:
WindowSpec w = org.apache.spark.sql.expressions.Window
        .partitionBy("idVehicule", "idDevice", "dateTracking")
        .orderBy("dateTracking");
Dataset<Row> df = df1.withColumn("acc_previous", lag("acc", 1).over(w));
df.show();
But instead I get this result:
+----------+------------+------------+--------------------+----------------+
| acc |id_Vehicule |id_Device |dateTracking | acc_previous |
+----------+------------+------------+--------------------+----------------+
| 1 | 1 | 2 |2020-02-12 14:50:00 | null |
| 0 | 1 | 2 |2020-02-12 14:59:00 | null |
| 0 | 2 | 3 |2020-02-12 15:10:00 | null |
| 1 | 2 | 3 |2020-02-12 15:20:00 | null |
+----------+------------+------------+--------------------+----------------+
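In case it helps to reproduce, here is a self-contained sketch of what I am running (a local SparkSession, dateTracking kept as a string, and the column names taken from the frame above are simplifications on my side; the window spec is the same as in the code I tried):

import static org.apache.spark.sql.functions.lag;

import java.util.Arrays;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.expressions.Window;
import org.apache.spark.sql.expressions.WindowSpec;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class LagExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("lag-example")
                .master("local[*]")
                .getOrCreate();

        // Schema matching the frame shown above (dateTracking as a plain string for brevity)
        StructType schema = new StructType()
                .add("acc", DataTypes.IntegerType)
                .add("id_Vehicule", DataTypes.IntegerType)
                .add("id_Device", DataTypes.IntegerType)
                .add("dateTracking", DataTypes.StringType);

        Dataset<Row> df1 = spark.createDataFrame(Arrays.asList(
                RowFactory.create(1, 1, 2, "2020-02-12 14:50:00"),
                RowFactory.create(0, 1, 2, "2020-02-12 14:59:00"),
                RowFactory.create(0, 2, 3, "2020-02-12 15:10:00"),
                RowFactory.create(1, 2, 3, "2020-02-12 15:20:00")), schema);

        // Same window as in the question: partition by vehicle, device and dateTracking,
        // ordered by dateTracking, then take the previous acc value with lag()
        WindowSpec w = Window
                .partitionBy("id_Vehicule", "id_Device", "dateTracking")
                .orderBy("dateTracking");

        Dataset<Row> df = df1.withColumn("acc_previous", lag("acc", 1).over(w));
        df.show();
    }
}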
If anyone has any ideas, I would be very grateful.