With Spark 2.2+:
You don't need a udf here; use the built-in function typedLit to create the lookup map instead. Looking up a key that is not in the map yields null, so coalesce supplies the default value.
Example:
import org.apache.spark.sql.functions._
val df = Seq(
  ("1", "Rodrigo", "30", "2019-01-01", "male"),
  ("1", "Rodrigo", "30", "2019-01-01", "female"),
  ("1", "Rodrigo", "30", "2019-01-01", "mal")
).toDF("id", "name", "age", "date", "genderKey")
val genderMap = typedLit(Map("male" -> "Masculine", "female" -> "Feminine"))
//genderMap: org.apache.spark.sql.Column = keys: [male,female], values: [Masculine,Feminine]
df.withColumn("genderName", coalesce(genderMap(col("genderKey")), lit("not found"))).show()
//+---+-------+---+----------+---------+----------+
//| id|   name|age|      date|genderKey|genderName|
//+---+-------+---+----------+---------+----------+
//|  1|Rodrigo| 30|2019-01-01|     male| Masculine|
//|  1|Rodrigo| 30|2019-01-01|   female|  Feminine|
//|  1|Rodrigo| 30|2019-01-01|      mal| not found|
//+---+-------+---+----------+---------+----------+
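The snippet above assumes a spark-shell session, where `spark` and its implicits are already in scope. Below is a minimal sketch of the same typedLit lookup as a standalone program. The object name `GenderLookupSketch`, the helper `withGenderName`, the app name, and the `local[*]` master are hypothetical choices made for illustration, not part of the original answer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{coalesce, col, lit, typedLit}

object GenderLookupSketch {
  // Map-typed literal column; building it does not require an active session.
  private val genderMap =
    typedLit(Map("male" -> "Masculine", "female" -> "Feminine"))

  // Hypothetical helper: adds genderName, falling back to "not found"
  // when genderKey is missing from the map (the lookup yields null).
  def withGenderName(df: DataFrame): DataFrame =
    df.withColumn(
      "genderName",
      coalesce(genderMap(col("genderKey")), lit("not found"))
    )

  def main(args: Array[String]): Unit = {
    // Assumption: a local session for experimentation only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("gender-lookup-sketch")
      .getOrCreate()
    import spark.implicits._ // provides toDF on a Seq

    val df = Seq("male", "female", "mal").toDF("genderKey")
    withGenderName(df).show()

    spark.stop()
  }
}
```

Keeping the lookup in a function that takes and returns a DataFrame makes it easy to reuse on any frame that has a genderKey column.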
Using a UDF:
// Plain Scala Map; it is captured by the UDF closure below.
val genderMap = Map[String, String](
  "male" -> "Masculine",
  "female" -> "Feminine"
)
def getGenderName(genderMap: Map[String, String]) =
  udf((gender: String) => genderMap.getOrElse(gender, "not found"))
df.withColumn("genderName", getGenderName(genderMap)(col("genderKey"))).show()
//+---+-------+---+----------+---------+----------+
//| id|   name|age|      date|genderKey|genderName|
//+---+-------+---+----------+---------+----------+
//|  1|Rodrigo| 30|2019-01-01|     male| Masculine|
//|  1|Rodrigo| 30|2019-01-01|   female|  Feminine|
//|  1|Rodrigo| 30|2019-01-01|      mal| not found|
//+---+-------+---+----------+---------+----------+