spark-sql> explain
> SELECT 'ADDRESS', 'IDTYPE', a.pid
> FROM dmgr.ex_p10ids_address a
> LEFT JOIN p10ids_riskcon b
> ON (a.pid = b.apid OR a.pid = b.pid)
> WHERE a.pt IN ('20200308')
> AND b.classcode NOT IN ('26371100', '26371200', '26371300', '13770100', '26376000')
> AND a.src_sys = 'APP0001'
> AND a.endtime = '99991231999'
> AND a.idtype NOT IN ('00')
> AND zhengjianleixing(a.idtype, 'P10IDS') <> '0';
== Physical Plan ==
TungstenProject [ADDRESS AS ...,IDTYPE AS ...]
 Union
  SortMergeJoin [pid#459], [apid#475]
   TungstenSort [pid#459 ASC], false, 0
    TungstenExchange hashpartitioning(pid#459)
     ConvertToUnsafe
      Project [pid#459]
       Filter ((((src_sys#471 = APP0001) && (endtime#468 = 99991231999)) && NOT idtype#460 INSET (00)) && NOT (HiveSimpleUDF#com.cpic.udf.dmgr_udf.ZhengjianLeixingSensitive(idtype#460,P10IDS) = 0))
        HiveTableScan [pid#459,src_sys#471,endtime#468,idtype#460], (MetastoreRelation dmgr, ex_p10ids_address, Some(a)), [pt#446 INSET (20200308)], Statistics(10485761, 1522668470)
   TungstenSort [apid#475 ASC], false, 0
    TungstenExchange hashpartitioning(apid#475)
     ConvertToUnsafe
      Project [apid#475,pid#506]
       Filter NOT classcode#501 INSET (26371200,26371100,26376000,26371300,13770100)
        Scan ParquetRelation[hdfs://hacluster/user/hive/warehouse/dmgr.db/p10ids_riskcon](dmgr.p10ids_riskcon)[apid#475,pid#506,classcode#501] Statistics(94039006999, 1584411030)
  Filter NOT (pid#459 = apid#475)
   SortMergeJoin [pid#459], [pid#506]
    TungstenSort [pid#459 ASC], false, 0
     TungstenExchange hashpartitioning(pid#459)
      ConvertToUnsafe
       Project [pid#459]
        Filter ((((src_sys#471 = APP0001) && (endtime#468 = 99991231999)) && NOT idtype#460 INSET (00)) && NOT (HiveSimpleUDF#com.cpic.udf.dmgr_udf.ZhengjianLeixingSensitive(idtype#460,P10IDS) = 0))
         HiveTableScan [pid#459,src_sys#471,endtime#468,idtype#460], (MetastoreRelation dmgr, ex_p10ids_address, Some(a)), [pt#446 INSET (20200308)], Statistics(10485761, 1522668470)
    TungstenSort [pid#506 ASC], false, 0
     TungstenExchange hashpartitioning(pid#506)
      ConvertToUnsafe
       Project [apid#475,pid#506]
        Filter NOT classcode#501 INSET (26371200,26371100,26376000,26371300,13770100)
         Scan ParquetRelation[hdfs://hacluster/user/hive/warehouse/dmgr.db/p10ids_riskcon](dmgr.p10ids_riskcon)[apid#475,pid#506,classcode#501] Statistics(94039006999, 1584411030)
The second query is identical except that "AND b.classcode NOT IN (...)" is moved from the WHERE clause into the ON clause.
The first query ran in 7 minutes, but the second one ran for hours, and I don't know why. I appreciate your answers!
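One reason the two queries can behave so differently is that they are not equivalent. The sketch below is a minimal illustration under stated assumptions. It uses Python's built-in sqlite3 module with tiny, made-up tables `a` and `b`. These are hypothetical stand-ins for dmgr.ex_p10ids_address and p10ids_riskcon that keep only the columns the join needs. SQLite's planner is not Spark's, so this shows only the semantics, not the performance. With the classcode filter in WHERE, any row of `a` whose `b` side is NULL or excluded is dropped, so the LEFT JOIN effectively becomes an inner join. That is consistent with the inner SortMergeJoins in the plan above. With the filter in ON, every row of `a` is kept.

```python
import sqlite3

# Hypothetical miniature tables: a ~ ex_p10ids_address, b ~ p10ids_riskcon.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (pid TEXT);
CREATE TABLE b (apid TEXT, pid TEXT, classcode TEXT);
INSERT INTO a VALUES ('p1'), ('p2'), ('p3');
INSERT INTO b VALUES ('p1', 'x', '26371100');  -- matches p1 via apid, excluded classcode
INSERT INTO b VALUES ('y', 'p2', '99999999');  -- matches p2 via pid, allowed classcode
""")

# Variant 1 (the query in the question): classcode filter in WHERE.
# NULL NOT IN (...) evaluates to NULL, so unmatched rows of a are discarded too.
where_rows = conn.execute("""
SELECT a.pid FROM a
LEFT JOIN b ON (a.pid = b.apid OR a.pid = b.pid)
WHERE b.classcode NOT IN ('26371100', '26371200')
ORDER BY a.pid
""").fetchall()

# Variant 2: the same filter moved into the ON clause.
# It only restricts which b rows match; every row of a survives the LEFT JOIN.
on_rows = conn.execute("""
SELECT a.pid FROM a
LEFT JOIN b ON (a.pid = b.apid OR a.pid = b.pid)
  AND b.classcode NOT IN ('26371100', '26371200')
ORDER BY a.pid
""").fetchall()

print(where_rows)  # → [('p2',)]
print(on_rows)     # → [('p1',), ('p2',), ('p3',)]
```

Because the second form must stay a LEFT OUTER JOIN on an OR condition, the optimizer presumably cannot rewrite it into the union of two equi-joins seen in the plan above. A plausible but unverified explanation for the hours-long runtime is that Spark then falls back to a nested-loop style join that compares every pair of rows. Running EXPLAIN on the second query would show which join operator is actually chosen.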