Here is one way to do it:
df.printSchema
df.show()
root
|-- Int_Col1: integer (nullable = false)
|-- Dt_Col1: date (nullable = true)
|-- Str_Col1: string (nullable = true)
|-- Dt_Col2: date (nullable = true)
+--------+----------+--------+----------+
|Int_Col1|   Dt_Col1|Str_Col1|   Dt_Col2|
+--------+----------+--------+----------+
|       1|1990-09-30|     AAA|1990-09-30|
|       2|2001-12-14|      BB|1990-09-30|
+--------+----------+--------+----------+
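To reproduce the input shown above, a DataFrame with this schema could be built roughly as follows. This is only a minimal sketch: the local SparkSession and its app name are assumptions for experimenting and are not part of the original answer (in spark-shell the `spark` value already exists and the builder line can be skipped).

```scala
import java.sql.Date
import org.apache.spark.sql.SparkSession

// Assumption: a throwaway local session; spark-shell already provides `spark`.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("date-to-timestamp")
  .getOrCreate()
import spark.implicits._

// A Scala Int becomes a non-nullable integer column; java.sql.Date becomes a date column.
val df = Seq(
  (1, Date.valueOf("1990-09-30"), "AAA", Date.valueOf("1990-09-30")),
  (2, Date.valueOf("2001-12-14"), "BB", Date.valueOf("1990-09-30"))
).toDF("Int_Col1", "Dt_Col1", "Str_Col1", "Dt_Col2")
```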
Then pick out only the DateType columns that need converting and cast them to TimestampType using foldLeft.
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.TimestampType

// Pick the DateType columns from df.dtypes, then cast each one via foldLeft
val result = df.dtypes
  .collect { case (dn, dt) if dt.startsWith("DateType") => (dn, TimestampType) }
  .foldLeft(df)((accDF, c) => accDF.withColumn(c._1, col(c._1).cast(c._2)))
result.printSchema
result.show(false)
Output:
root
|-- Int_Col1: integer (nullable = false)
|-- Dt_Col1: timestamp (nullable = true)
|-- Str_Col1: string (nullable = true)
|-- Dt_Col2: timestamp (nullable = true)
+--------+-------------------+--------+-------------------+
|Int_Col1|Dt_Col1            |Str_Col1|Dt_Col2            |
+--------+-------------------+--------+-------------------+
|1       |1990-09-30 00:00:00|AAA     |1990-09-30 00:00:00|
|2       |2001-12-14 00:00:00|BB      |1990-09-30 00:00:00|
+--------+-------------------+--------+-------------------+
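As a variation on the approach above, the same fold can match on the typed schema (`df.schema`) rather than on the type-name strings returned by `df.dtypes`. The sketch below is an alternative, not the code from the answer, and the helper name `datesToTimestamps` is hypothetical.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{DateType, TimestampType}

// Hypothetical helper: casts every DateType column to TimestampType
// and leaves all other columns unchanged.
def datesToTimestamps(df: DataFrame): DataFrame =
  df.schema.fields
    .collect { case f if f.dataType == DateType => f.name } // names of the date columns
    .foldLeft(df)((acc, name) => acc.withColumn(name, col(name).cast(TimestampType)))
```

Calling `datesToTimestamps(df).printSchema` on a DataFrame shaped like the one above should report the two date columns as timestamp. Comparing `DataType` values directly avoids relying on how the type names are rendered as strings.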