You can check with rlike and cast the resulting boolean to Integer:
import pyspark.sql.functions as F
df.withColumn("check",F.col("text").rlike("yes").cast("Integer")).show()
+---+-----+--------------------+-----+
| id|group|                text|check|
+---+-----+--------------------+-----+
|  1|    a|           hey there|    0|
|  2|    c|          no you can|    0|
|  3|    a|         yes yes yes|    1|
|  4|    b|           yes or no|    1|
|  5|    b|you need to say yes.|    1|
|  6|    a|         yes you can|    1|
|  7|    d|                yes!|    1|
|  8|    c|                 no&|    0|
|  9|    b|                  ok|    0|
+---+-----+--------------------+-----+
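Note that rlike does an unanchored regex search, so the pattern "yes" matches anywhere in the string, including inside a longer word. Here is a minimal plain-Python sketch of that behaviour, using re.search as a stand-in for rlike; the helper name contains_pattern and the sample strings are only illustrative, not part of the original answer:

```python
import re

def contains_pattern(text: str, pattern: str = "yes") -> int:
    """Return 1 if the pattern occurs anywhere in text, else 0."""
    # re.search is unanchored: it succeeds on any substring match
    return int(re.search(pattern, text) is not None)

# Illustrative inputs (assumed for this sketch)
rows = ["hey there", "yes or no", "yes!", "ok", "okyes"]
flags = {t: contains_pattern(t) for t in rows}
print(flags)
```

With this kind of matching a value such as "okyes" is flagged as well, which matters if only the standalone word should count.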
For the edited question, you can try higher-order functions:
import string
import re
# Regex alternation that matches any single punctuation character
pat = '|'.join([re.escape(i) for i in list(string.punctuation)])
# Strip punctuation, split on spaces, then check whether any token equals "yes"
(df.withColumn("text1", F.regexp_replace(F.col("text"), pat, ""))
   .withColumn("Split", F.split("text1", " "))
   .withColumn("check",
               F.expr('''exists(Split,x-> replace(x,"","") = "yes")''').cast("Integer"))
   .drop("Split", "text1")).show()
+---+-----+--------------------+-----+
| id|group|                text|check|
+---+-----+--------------------+-----+
|  1|    a|           hey there|    0|
|  2|    c|          no you can|    0|
|  3|    a|         yes yes yes|    1|
|  4|    b|           yes or no|    1|
|  5|    b|you need to say yes.|    1|
|  6|    a|         yes you can|    1|
|  7|    d|                yes!|    1|
|  8|    c|                 no&|    0|
|  9|    b|               okyes|    0|
+---+-----+--------------------+-----+
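To see why "okyes" gets 0 here while "yes!" gets 1, the same three steps (strip punctuation, split on spaces, test each token for equality) can be sketched in plain Python without Spark. The function name has_word is hypothetical and only mirrors the logic of the pipeline above; it is not a Spark API:

```python
import re
import string

# Same pattern as in the answer: an alternation of escaped punctuation characters
PUNCT_PATTERN = "|".join(re.escape(ch) for ch in string.punctuation)

def has_word(text: str, word: str = "yes") -> int:
    """Return 1 if `word` appears as a whole space-separated token, else 0."""
    cleaned = re.sub(PUNCT_PATTERN, "", text)        # mirrors regexp_replace
    tokens = cleaned.split(" ")                      # mirrors split(text1, " ")
    return int(any(tok == word for tok in tokens))   # mirrors exists(...)

for sample in ["you need to say yes.", "yes!", "no&", "okyes"]:
    print(sample, has_word(sample))
```

Because the comparison is on whole tokens, a word that merely contains "yes" is not counted.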