У меня есть столбец:
val originalSqlLikePatternMap = Map("item (%) is blacklisted%" -> "BLACK_LIST",
"%Testing%" -> "TESTING",
"%purchase count % is too low %" -> "TOO_LOW_PURCHASE_COUNT")
val javaPatternMap = originalSqlLikePatternMap.map(v => v._1.replaceAll("%", ".*") -> v._2)
val df = Seq(
"Testing(2,4, (4,6,7) foo, Foo purchase count 1 is too low",
"Foo purchase count (12, 4) is too low ", "#!@", "item (mejwnw) is blacklisted",
"item (1) is blacklisted, #!@"
).toDF("raw_type")
val converter = (value: String) => javaPatternMap.find(v => value.matches(v._1)).map(_._2).getOrElse("Unknown")
val converterUDF = udf(converter)
val result = df.withColumn("updatedType", converterUDF($"raw_type"))
, но он дает:
+---------------------------------------------------------+----------------------+
|raw_type |updatedType |
+---------------------------------------------------------+----------------------+
|Testing(2,4, (4,6,7) foo, Foo purchase count 1 is too low|TESTING |
|Foo purchase count (12, 4) is too low |TOO_LOW_PURCHASE_COUNT|
|#!@ |Unknown |
|item (mejwnw) is blacklisted |BLACK_LIST |
|item (1) is blacklisted, #!@ |BLACK_LIST |
+---------------------------------------------------------+----------------------+
Но я хочу "Тестирование (2,4, (4,6,7) foo, покупка Fooчисло 1 слишком мало ", чтобы дать 2 значения" TESTING, TOO_LOW_PURCHASE_COUNT ", например так:
+---------------------------------------------------------+--------------------------------+
|raw_type |updatedType |
+---------------------------------------------------------+--------------------------------+
|Testing(2,4, (4,6,7) foo, Foo purchase count 1 is too low|TESTING, TOO_LOW_PURCHASE_COUNT |
|Foo purchase count (12, 4) is too low |TOO_LOW_PURCHASE_COUNT |
|#!@ |Unknown |
|item (mejwnw) is blacklisted |BLACK_LIST |
|item (1) is blacklisted, #!@ |BLACK_LIST, Unkown |
+---------------------------------------------------------+--------------------------------+
Может кто-нибудь сказать, что я здесь не так делаю?