Вместо функции Split
, используйте функцию regexp_extract
в Spark.
Regex Explanation:
(.*)\\s(.*) //capture everything into 1 capture group until last space(\s) then capture everything after into 2 capture group.
Example:
val df= Seq(("My name is Rahul")).toDF("text") //sample string
df.withColumn("col1",regexp_extract($"text","(.*)\\s(.*)",1)).
withColumn("col2",regexp_extract($"text","(.*)\\s(.*)",2)).
show()
Result:
+----------------+----------+-----+
| text| col1| col2|
+----------------+----------+-----+
|My name is Rahul|My name is|Rahul|
+----------------+----------+-----+