The following Spark SQL query works fine:
((country IN (FROM medium_countries) ) AND (country IN (FROM big_countries))) AND EMAIL IS NOT NULL
and so does this one:
FALSE = ((country IN (FROM medium_countries)) AND (country IN (FROM big_countries))) AND EMAIL IS NOT NULL
but when I add the NOT operator, for example:
NOT ((country IN (FROM medium_countries)) AND (country IN (FROM big_countries))) AND EMAIL IS NOT NULL
it fails with the following error:
Exception in thread "main" org.apache.spark.sql.AnalysisException: Null-aware predicate sub-queries cannot be used in nested conditions: (NOT (country#22 IN (list#99 []) && country#22 IN (list#100 [])) && isnotnull(EMAIL#20));;
Filter (NOT (country#22 IN (list#99 []) && country#22 IN (list#100 [])) && isnotnull(EMAIL#20))
:  :- SubqueryAlias `medium_countries`
:  :  +- Project [value#6 AS country#8]
:  :     +- LocalRelation [value#6]
:  +- SubqueryAlias `big_countries`
:     +- Project [value#1 AS country#3]
:        +- LocalRelation [value#1]
+- SubqueryAlias `users`
   +- Project [name#19, email#20, phone#21, country#22, monotonically_increasing_id() AS UniqueID#27L]
      +- Project [_1#14 AS name#19, _2#15 AS email#20, _3#16 AS phone#21, _4#17 AS country#22]
         +- LocalRelation [_1#14, _2#15, _3#16, _4#17]
Could you explain why NOT does not work there?