Instead of .textFile, use .csv.

Using .csv:
spark.read.option("delimiter","|").option("header","false").csv("books").show()
//+---+-------------+--------------------+----------+--------------+------+
//|_c0| _c1| _c2| _c3| _c4| _c5|
//+---+-------------+--------------------+----------+--------------+------+
//| 0|3-88623-803-7| GARDENING|2003-11-07| Editora FTD|174.99|
//| 1|5-72448-672-4|TECHNOLOGY-ENGINE...|2012-08-08|Wolters Kluwer|140.99|
//| 2|7-64433-458-3| SOCIAL-SCIENCE|2015-11-14| Bungeishunju| 7.99|
//| 3|1-18243-251-3| MATHEMATICS|1997-02-22|Hachette Livre| 34.99|
//+---+-------------+--------------------+----------+--------------+------+
Using .textFile:
spark.read.textFile returns a Dataset[String], so each line has to be .split and .map-ped into a tuple before it can be converted to a DataFrame:
spark.read.textFile("books").
  map(x => x.split("\\|")).                       // split takes a regex, so "|" must be escaped
  map(x => (x(0), x(1), x(2), x(3), x(4), x(5))). // six fields -> Tuple6
  toDF().                                         // needs import spark.implicits._ outside spark-shell
  show()
//+---+-------------+--------------------+----------+--------------+------+
//| _1| _2| _3| _4| _5| _6|
//+---+-------------+--------------------+----------+--------------+------+
//| 0|3-88623-803-7| GARDENING|2003-11-07| Editora FTD|174.99|
//| 1|5-72448-672-4|TECHNOLOGY-ENGINE...|2012-08-08|Wolters Kluwer|140.99|
//| 2|7-64433-458-3| SOCIAL-SCIENCE|2015-11-14| Bungeishunju| 7.99|
//| 3|1-18243-251-3| MATHEMATICS|1997-02-22|Hachette Livre| 34.99|
//+---+-------------+--------------------+----------+--------------+------+
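The `"\\|"` in the split above is easy to get wrong: String.split takes a regular expression, and an unescaped `|` is the regex alternation operator, which matches the empty string and splits the line between every character. A minimal sketch of the difference, runnable without Spark:

```scala
object SplitDemo extends App {
  val line = "0|3-88623-803-7|GARDENING"

  // Correct: escape the pipe so it is treated as a literal delimiter.
  val fields = line.split("\\|")
  println(fields.mkString(", "))   // 0, 3-88623-803-7, GARDENING

  // Wrong: an unescaped "|" is regex alternation, so the string is
  // split between every character instead of at the pipes.
  val broken = line.split("|")
  println(broken.length)           // one element per character
}
```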