Try this:
DataFrame approach:
from pyspark.sql.functions import col
df = spark.createDataFrame([(1, "hi@gmail.com"), (2, "goodbye@gmail.com"), (3, "abc@gmail.com"), (4, "xyz@gmail.com")], ['id', 'email_id'])
email_filter_list = ["hi@gmail.com", "goodbye@gmail.com"]
# Keep only the rows whose email_id appears in the list
df.where(col('email_id').isin(email_filter_list)).show()
Spark SQL approach:
df = spark.createDataFrame([(1, "hi@gmail.com"), (2, "goodbye@gmail.com"), (3, "abc@gmail.com"), (4, "xyz@gmail.com")], ['id', 'email_id'])
email_filter_list = ["hi@gmail.com", "goodbye@gmail.com"]
df.createOrReplaceTempView('t1')
# Build the quoted, comma-separated list: 'hi@gmail.com','goodbye@gmail.com'
sql_filter = ','.join("'" + i + "'" for i in email_filter_list)
spark.sql("SELECT * FROM t1 WHERE email_id IN ({})".format(sql_filter)).show()
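Since the Spark SQL approach pastes the values straight into the query text, it may be worth checking them first. Below is a minimal sketch in plain Python (no Spark needed) that builds the same IN list. The helper name build_in_list is hypothetical, not part of PySpark, and it takes the conservative route of rejecting suspicious values rather than trying to escape them. It also rejects an empty list, because that would produce IN (), which is generally not valid SQL.

```python
def build_in_list(values):
    """Return a comma-separated list of single-quoted SQL string literals.

    Hypothetical helper for illustration only; not a PySpark API.
    """
    values = list(values)
    if not values:
        # An empty list would yield "IN ()".
        raise ValueError("the filter list must not be empty")
    for v in values:
        # Refuse values that could break out of the quoted literal.
        if "'" in v or "\\" in v:
            raise ValueError("unsafe character in value: {!r}".format(v))
    return ", ".join("'{}'".format(v) for v in values)


email_filter_list = ["hi@gmail.com", "goodbye@gmail.com"]
query = "SELECT * FROM t1 WHERE email_id IN ({})".format(
    build_in_list(email_filter_list)
)
print(query)
# SELECT * FROM t1 WHERE email_id IN ('hi@gmail.com', 'goodbye@gmail.com')
```

The resulting string can then be passed to spark.sql(query).show() exactly as in the snippet above. If the values come from user input, the DataFrame approach with isin avoids the string building altogether.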