You're almost there. Here is a detailed example showing both the withColumn and the selectExpr options:
Sample DataFrame:
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([('This is', 'This'),
                            ('That is', 'That'),
                            ('That is', 'There')],
                           ['text', 'name'])
#+-------+-----+
#| text| name|
#+-------+-----+
#|This is| This|
#|That is| That|
#|That is|There|
#+-------+-----+
Option 1: withColumn using the expr function
from pyspark.sql.functions import expr

# expr() parses a SQL expression, so the regex pattern can be read from the `name` column of each row
df.withColumn("new_col1", expr("regexp_replace(text, name, 'NAME')")).show()
#+-------+-----+--------+
#| text| name|new_col1|
#+-------+-----+--------+
#|This is| This| NAME is|
#|That is| That| NAME is|
#|That is|There| That is|
#+-------+-----+--------+
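To see why the third row is left unchanged, here is a plain-Python sketch (not Spark code) of what regexp_replace(text, name, 'NAME') computes row by row: each row uses its own name value as the pattern, and "There" never occurs in "That is". Python's re.sub stands in for Spark's Java regex engine here; for simple literal names like these the two behave the same, but that is an assumption made for illustration. The names rows and replace_per_row are hypothetical.

```python
import re

# Hypothetical in-memory copy of the sample rows: (text, name)
rows = [("This is", "This"), ("That is", "That"), ("That is", "There")]


def replace_per_row(rows, replacement="NAME"):
    """Emulate regexp_replace(text, name, 'NAME') one row at a time.

    The pattern comes from each row's own `name` value, and every
    match inside `text` is replaced, as re.sub does.
    """
    return [(text, name, re.sub(name, replacement, text)) for text, name in rows]


for text, name, new_col in replace_per_row(rows):
    print(text, name, new_col, sep=" | ")
```

The result mirrors the new_col1 column above: the first two rows become "NAME is", while the third row is returned as is because its pattern finds no match.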
Option 2: selectExpr using regexp_replace
# No Python import is needed here: regexp_replace is called as a Spark SQL
# function inside the expression string
df.selectExpr("*",
              "regexp_replace(text, name, 'NAME') AS new_text").show()
#+-------+-----+--------+
#| text| name|new_text|
#+-------+-----+--------+
#|This is| This| NAME is|
#|That is| That| NAME is|
#|That is|There| That is|
#+-------+-----+--------+
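One thing to keep in mind with both options: regexp_replace treats its pattern argument as a (Java) regular expression, so a name value containing regex metacharacters such as "." matches more than its literal text. The plain-Python sketch below (again using re.sub as a stand-in, with a hypothetical row "a.b and axb") shows the effect and how escaping the value restricts the match to the literal string. In Spark the value would need to be escaped according to Java regex rules instead; that part is not shown here.

```python
import re

# Hypothetical row whose name contains a regex metacharacter
text, name = "a.b and axb", "a.b"

# Used as a raw pattern, "." matches any character, so "axb" is replaced too
raw = re.sub(name, "NAME", text)

# Escaping the value first makes it match only the literal text "a.b"
literal = re.sub(re.escape(name), "NAME", text)

print(raw)      # NAME and NAME
print(literal)  # NAME and axb
```

For plain words like the ones in the sample data the escaping makes no difference, which is why both options above produce the expected output.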