Use the date_format() and to_timestamp() functions. Check this out:
scala> val df = Seq((20180103),(20180105)).toDF("dt")
df: org.apache.spark.sql.DataFrame = [dt: int]
scala> df.withColumn("dt",'dt.cast("string")).withColumn("dt",date_format(to_timestamp('dt,"yyyyMMdd"),"yyyy-MM-dd")).show(false)
+----------+
|dt        |
+----------+
|2018-01-03|
|2018-01-05|
+----------+
scala>
Note that date_format returns a string. If you want the column as a date type, cast the result to date:
scala> val df2 = df.withColumn("dt",'dt.cast("string")).withColumn("dt",date_format(to_timestamp('dt,"yyyyMMdd"),"yyyy-MM-dd"))
df2: org.apache.spark.sql.DataFrame = [dt: string]
scala> df2.printSchema
root
|-- dt: string (nullable = true)
scala> val df3 = df2.withColumn("dt",'dt.cast("date"))
df3: org.apache.spark.sql.DataFrame = [dt: date]
scala> df3.printSchema
root
|-- dt: date (nullable = true)
scala> df3.show(false)
+----------+
|dt        |
+----------+
|2018-01-03|
|2018-01-05|
+----------+
scala>
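If the goal is only a date column, the intermediate string formatting can probably be skipped. The following is a minimal sketch, assuming Spark 2.2 or later, where to_date accepts a format argument, as to_timestamp does above. The names `dates` and the app name "int-to-date" are illustrative and not from the original answer. In spark-shell the session already exists as `spark`, so the builder line is only needed in a standalone script.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, to_date}

// Assumption: a local session for illustration; spark-shell already provides `spark`.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("int-to-date") // hypothetical app name
  .getOrCreate()
import spark.implicits._

// Same sample data as above: integers shaped like yyyyMMdd.
val df = Seq(20180103, 20180105).toDF("dt")

// Cast the int to a string, then parse it straight into a date column.
val dates = df.withColumn("dt", to_date(col("dt").cast("string"), "yyyyMMdd"))

dates.printSchema() // the dt column should now be reported as a date
dates.show(false)
```

This should give the same date column as df3 above with one parsing step instead of a format followed by a cast. date_format is still the right tool when a string in a specific layout is what you need.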