Here is one way to do it, using the DataFrameFlattener implicit class implemented by Databricks:
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{DataType, StructType}

implicit class DataFrameFlattener(df: DataFrame) {
  def flattenSchema: DataFrame = {
    df.select(flatten(Nil, df.schema): _*)
  }

  // Recursively walk the schema: for every StructType descend into its fields,
  // and for every leaf build a Column whose alias is the dot-joined path.
  protected def flatten(path: Seq[String], schema: DataType): Seq[Column] = schema match {
    case s: StructType => s.fields.flatMap(f => flatten(path :+ f.name, f.dataType))
    case other => col(path.map(n => s"`$n`").mkString(".")).as(path.mkString(".")) :: Nil
  }
}
df.flattenSchema.show
And the output:
+---------------+--------------+-------------+--------------+----------------+------+-------------------+
|priceData.close|priceData.high|priceData.low|priceData.open|priceData.volume|symbol| timestamp|
+---------------+--------------+-------------+--------------+----------------+------+-------------------+
| 1179.5500| 1179.5500| 1176.6700| 1177.2600| 49478| TEST3|2019-05-07 16:00:00|
| 189.9100| 189.9100| 189.5100| 189.5660| 267986| TEST4|2019-05-07 16:00:00|
+---------------+--------------+-------------+--------------+----------------+------+-------------------+
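For context, a nested DataFrame with a schema like the one above can be built as follows. This is a sketch: the case class names and sample values are assumptions reconstructed from the output shown.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("flatten-example")
  .master("local[*]")
  .getOrCreate()

import spark.implicits._

// priceData becomes a StructType column when the case class is nested.
case class PriceData(close: Double, high: Double, low: Double,
                     open: Double, volume: Long)
case class Quote(priceData: PriceData, symbol: String, timestamp: String)

val df = Seq(
  Quote(PriceData(1179.55, 1179.55, 1176.67, 1177.26, 49478L), "TEST3", "2019-05-07 16:00:00"),
  Quote(PriceData(189.91, 189.91, 189.51, 189.566, 267986L), "TEST4", "2019-05-07 16:00:00")
).toDF()

df.printSchema()  // shows priceData as a struct; flattenSchema expands its leaves
```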
Or you can simply do a regular select:
df.select(
  "priceData.close",
  "priceData.high",
  "priceData.low",
  "priceData.open",
  "priceData.volume",
  "symbol",
  "timestamp").show
Output:
+---------+---------+---------+---------+------+------+-------------------+
| close| high| low| open|volume|symbol| timestamp|
+---------+---------+---------+---------+------+------+-------------------+
|1179.5500|1179.5500|1176.6700|1177.2600| 49478| TEST3|2019-05-07 16:00:00|
| 189.9100| 189.9100| 189.5100| 189.5660|267986| TEST4|2019-05-07 16:00:00|
+---------+---------+---------+---------+------+------+-------------------+
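Note that the plain select drops the `priceData` prefix from the column names, unlike flattenSchema. If you want the fully qualified names while still selecting by hand, you can alias each column explicitly; a minimal sketch:

```scala
import org.apache.spark.sql.functions.col

df.select(
  col("priceData.close").as("priceData.close"),
  col("priceData.high").as("priceData.high"),
  col("priceData.low").as("priceData.low"),
  col("priceData.open").as("priceData.open"),
  col("priceData.volume").as("priceData.volume"),
  col("symbol"),
  col("timestamp")
).show()
```

This produces the same headers as the flattenSchema output, without defining the implicit class.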