There may be a problem with the data in your file. I tried the same thing with your own data and it works fine. You can use either the DataFrame functions or Spark SQL.
Your data file, from https://www.kaggle.com/carrie1/ecommerce-data/home#data.csv:
InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
536365,85123A,WHITE HANGING HEART T-LIGHT HOLDER,6,12/1/2010 8:26,2.55,17850,United Kingdom
536365,71053,WHITE METAL LANTERN,6,12/1/2010 8:26,3.39,17850,United Kingdom
536365,84406B,CREAM CUPID HEARTS COAT HANGER,8,12/1/2010 8:26,2.75,17850,United Kingdom
536365,84029G,KNITTED UNION FLAG HOT WATER BOTTLE,6,12/1/2010 8:26,3.39,17850,United Kingdom
536365,84029E,RED WOOLLY HOTTIE WHITE HEART.,6,12/1/2010 8:26,3.39,17850,United Kingdom
536365,22752,SET 7 BABUSHKA NESTING BOXES,2,12/1/2010 8:26,7.65,17850,United Kingdom
536365,21730,GLASS STAR FROSTED T-LIGHT HOLDER,6,12/1/2010 8:26,4.25,17850,United Kingdom
536366,22633,HAND WARMER UNION JACK,6,12/1/2010 8:28,1.85,17850,United Kingdom
536366,22632,HAND WARMER RED POLKA DOT,6,12/1/2010 8:28,1.85,17850,United Kingdom
536367,84879,ASSORTED COLOUR BIRD ORNAMENT,32,12/1/2010 8:34,1.69,13047,United Kingdom
536367,22745,POPPY'S PLAYHOUSE BEDROOM ,6,12/1/2010 8:34,2.1,13047,United Kingdom
536367,22748,POPPY'S PLAYHOUSE KITCHEN,6,12/1/2010 8:34,2.1,13047,United Kingdom
536367,22749,FELTCRAFT PRINCESS CHARLOTTE DOLL,8,12/1/2010 8:34,3.75,13047,United Kingdom
536367,22310,IVORY KNITTED MUG COSY ,6,12/1/2010 8:34,1.65,13047,United Kingdom
536367,84969,BOX OF 6 ASSORTED COLOUR TEASPOONS,6,12/1/2010 8:34,4.25,13047,United Kingdom
Code in IntelliJ:
import org.apache.spark.sql.functions._

// Read the CSV with a header row and let Spark infer the column types
val df = sqlContext
  .read
  .option("header", true)
  .option("inferSchema", true)
  .csv("/home/cloudera/files/tests/timestamp.csv")
  .cache()
df.show(2, truncate = false)
df.printSchema()

// Option 1: DataFrame functions
val retails = df
  .withColumn("InvoiceDateTS", to_timestamp(col("InvoiceDate"), "MM/dd/yyyy HH:mm"))
retails.show(2, truncate = false)
retails.printSchema()

// Option 2: Spark SQL
df.createOrReplaceTempView("df")
val retailsSQL = sqlContext.sql(
  """
    |SELECT InvoiceNo, StockCode, InvoiceDate, customerID, TO_TIMESTAMP(InvoiceDate, 'MM/dd/yyyy HH:mm') AS InvoiceDateTS
    |FROM df
    |""".stripMargin)
retailsSQL.show(2, truncate = false)
retailsSQL.printSchema()
Output:
+---------+---------+----------------------------------+--------+--------------+---------+----------+--------------+
|InvoiceNo|StockCode|Description                       |Quantity|InvoiceDate   |UnitPrice|CustomerID|Country       |
+---------+---------+----------------------------------+--------+--------------+---------+----------+--------------+
|536365   |85123A   |WHITE HANGING HEART T-LIGHT HOLDER|6       |12/1/2010 8:26|2.55     |17850     |United Kingdom|
|536365   |71053    |WHITE METAL LANTERN               |6       |12/1/2010 8:26|3.39     |17850     |United Kingdom|
+---------+---------+----------------------------------+--------+--------------+---------+----------+--------------+
only showing top 2 rows
root
|-- InvoiceNo: string (nullable = true)
|-- StockCode: string (nullable = true)
|-- Description: string (nullable = true)
|-- Quantity: integer (nullable = true)
|-- InvoiceDate: string (nullable = true)
|-- UnitPrice: double (nullable = true)
|-- CustomerID: integer (nullable = true)
|-- Country: string (nullable = true)
+---------+---------+----------------------------------+--------+--------------+---------+----------+--------------+-------------------+
|InvoiceNo|StockCode|Description                       |Quantity|InvoiceDate   |UnitPrice|CustomerID|Country       |InvoiceDateTS      |
+---------+---------+----------------------------------+--------+--------------+---------+----------+--------------+-------------------+
|536365   |85123A   |WHITE HANGING HEART T-LIGHT HOLDER|6       |12/1/2010 8:26|2.55     |17850     |United Kingdom|2010-12-01 08:26:00|
|536365   |71053    |WHITE METAL LANTERN               |6       |12/1/2010 8:26|3.39     |17850     |United Kingdom|2010-12-01 08:26:00|
+---------+---------+----------------------------------+--------+--------------+---------+----------+--------------+-------------------+
only showing top 2 rows
root
|-- InvoiceNo: string (nullable = true)
|-- StockCode: string (nullable = true)
|-- Description: string (nullable = true)
|-- Quantity: integer (nullable = true)
|-- InvoiceDate: string (nullable = true)
|-- UnitPrice: double (nullable = true)
|-- CustomerID: integer (nullable = true)
|-- Country: string (nullable = true)
|-- InvoiceDateTS: timestamp (nullable = true)
+---------+---------+--------------+----------+-------------------+
|InvoiceNo|StockCode|InvoiceDate   |customerID|InvoiceDateTS      |
+---------+---------+--------------+----------+-------------------+
|536365   |85123A   |12/1/2010 8:26|17850     |2010-12-01 08:26:00|
|536365   |71053    |12/1/2010 8:26|17850     |2010-12-01 08:26:00|
+---------+---------+--------------+----------+-------------------+
only showing top 2 rows
root
|-- InvoiceNo: string (nullable = true)
|-- StockCode: string (nullable = true)
|-- InvoiceDate: string (nullable = true)
|-- customerID: integer (nullable = true)
|-- InvoiceDateTS: timestamp (nullable = true)
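
One thing to watch in the pattern used above: the sample values ("12/1/2010 8:26") have a single-digit day and hour, while "MM/dd/yyyy HH:mm" asks for two digits each. The run above accepted that. As far as I know, though, newer Spark versions (3.x) parse with stricter java.time-style formatters by default, so the same pattern may give nulls or an error there. The sketch below is not Spark code. It only uses java.time.format.DateTimeFormatter from the JDK to show the difference between the two patterns, and the PatternCheck object and its parses helper are names made up for this illustration.

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter
import scala.util.Try

// Hypothetical helper object, not part of Spark or of the original answer.
object PatternCheck {
  // Pattern from the answer: two-digit month, day and hour are required.
  val strict: DateTimeFormatter = DateTimeFormatter.ofPattern("MM/dd/yyyy HH:mm")
  // Single-letter fields accept one or two digits.
  val flexible: DateTimeFormatter = DateTimeFormatter.ofPattern("M/d/yyyy H:mm")

  // True when the value can be parsed with the given formatter.
  def parses(value: String, fmt: DateTimeFormatter): Boolean =
    Try(LocalDateTime.parse(value, fmt)).isSuccess

  def main(args: Array[String]): Unit = {
    val sample = "12/1/2010 8:26" // value taken from the CSV above
    println(parses(sample, strict))   // false: "1" and "8" are not two digits
    println(parses(sample, flexible)) // true
  }
}
```

If you see nulls in InvoiceDateTS on your setup, trying "M/d/yyyy H:mm" as the to_timestamp pattern is probably the first thing to check. It should also parse zero-padded values such as "12/01/2010 08:26".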