You are looking for the split function. Here is an example:
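from pyspark.sql import SparkSession
import pyspark.sql.functions as F

# In the pyspark shell `spark` and `sc` already exist; in a script, create them
spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

rows = sc.parallelize([['14-banana'], ['12-cheese'], ['13-olives'], ['11-almonds']])
rows_df = rows.toDF(["ID"])
# Split each ID on '-' into an array column, then pick the parts by index
split_col = F.split(rows_df.ID, '-')
rows_df = rows_df.withColumn('number', split_col.getItem(0))
rows_df = rows_df.withColumn('fruit', split_col.getItem(1))
rows_df.show()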
Output:
+----------+------+-------+
|        ID|number|  fruit|
+----------+------+-------+
| 14-banana|    14| banana|
| 12-cheese|    12| cheese|
| 13-olives|    13| olives|
|11-almonds|    11|almonds|
+----------+------+-------+
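One thing to keep in mind: the second argument of F.split is treated as a regular expression, so a delimiter such as '.' or '|' needs escaping. To show what split plus getItem does to each value, here is a rough plain-Python sketch that mirrors the logic with re.split. The helper split_and_get is a made-up name for illustration and is not part of PySpark, and returning None for a missing part is just an assumption of this sketch.

```python
import re


def split_and_get(value, pattern, index):
    """Hypothetical helper: roughly what F.split(col, pattern).getItem(index)
    does for a single string value."""
    parts = re.split(pattern, value)
    # Assumption for this sketch: a missing part is reported as None
    return parts[index] if index < len(parts) else None


ids = ['14-banana', '12-cheese', '13-olives', '11-almonds']
rows = [(v, split_and_get(v, '-', 0), split_and_get(v, '-', 1)) for v in ids]
for row in rows:
    print(row)

# The pattern is a regex, so a literal dot has to be escaped
print(split_and_get('14.banana', r'\.', 1))
```

The same escaping applies in the Spark version, for example F.split(rows_df.ID, r'\.') if the values were separated by dots.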