Try to_timestamp with cast(LongType), then subtract the time column from time_dest to get the difference in seconds!
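For a reproducible example, here is a minimal sample DataFrame built from the values shown in the output below (this assumes a SparkSession named spark is in scope, as in spark-shell):

import spark.implicits._
// sample data matching the output shown below
val df = Seq(
  ("17/02/2020 00:06", "17/02/2020 00:16"),
  ("17/02/2020 00:16", "17/02/2020 00:26")
).toDF("time", "time_dest")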
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._
df.withColumn("Duration",to_timestamp(col("time_dest"),"dd/MM/yyyy HH:mm").cast(LongType)-
to_timestamp(col("time"),"dd/MM/yyyy HH:mm").cast(LongType)).show()
//or by using the unix_timestamp function (it already returns epoch seconds as a long, so no cast is needed)
df.withColumn("Duration",
    unix_timestamp(col("time_dest"), "dd/MM/yyyy HH:mm") -
    unix_timestamp(col("time"), "dd/MM/yyyy HH:mm")).show()
//+----------------+----------------+--------+
//| time| time_dest|Duration|
//+----------------+----------------+--------+
//|17/02/2020 00:06|17/02/2020 00:16| 600|
//|17/02/2020 00:16|17/02/2020 00:26| 600|
//+----------------+----------------+--------+
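Note that to_timestamp and unix_timestamp return null when a value does not match the pattern, so Duration will be null for malformed rows. A small sketch to flag such rows (the is_valid column name is just for illustration):

df.withColumn("Duration",
    unix_timestamp(col("time_dest"), "dd/MM/yyyy HH:mm") -
    unix_timestamp(col("time"), "dd/MM/yyyy HH:mm"))
  .withColumn("is_valid", col("Duration").isNotNull) // false if either timestamp failed to parse
  .show()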
If you need the duration in minutes or hours, then:
df.withColumn("Duration",to_timestamp(col("time_dest"),"dd/MM/yyyy HH:mm").cast(LongType)-to_timestamp(col("time"),"dd/MM/yyyy HH:mm").cast(LongType)).
withColumn("Duration_mins",round(col("Duration")/60)).
withColumn("Duration_hours",round(col("Duration")/3600)).
show()
//+----------------+----------------+--------+-------------+--------------+
//| time| time_dest|Duration|Duration_mins|Duration_hours|
//+----------------+----------------+--------+-------------+--------------+
//|17/02/2020 00:06|17/02/2020 00:16| 600| 10.0| 0.0|
//|17/02/2020 00:16|17/02/2020 00:26| 600| 10.0| 0.0|
//+----------------+----------------+--------+-------------+--------------+
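If you want whole-number minutes instead of the rounded doubles shown above, cast the division result to a long (note the cast truncates rather than rounds; this is a variant, not part of the original snippet):

df.withColumn("Duration",
    unix_timestamp(col("time_dest"), "dd/MM/yyyy HH:mm") -
    unix_timestamp(col("time"), "dd/MM/yyyy HH:mm"))
  .withColumn("Duration_mins", (col("Duration") / 60).cast(LongType)) // 10 instead of 10.0
  .show()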