I have a Hive table that serves as my source table.
I also have another Hive table that acts as the target.
The source and target DDLs are identical, except that a few audit (logging) columns have been added to the target table.
Here are the DDLs:
Source:
CREATE EXTERNAL TABLE source.customer_detail(
id string,
name string,
city string,
properties_owned array<struct<property_addr:string, location:string>>
)
ROW FORMAT SERDE
'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE
LOCATION
'/user/aiman/customer_detail';
Target:
CREATE EXTERNAL TABLE target.customer_detail(
id string,
name string,
city string,
properties_owned array<struct<property_addr:string, location:string>>,
audit_insterted_ts timestamp,
audit_dml_action char(1)
)
PARTITIONED BY (audit_active_flag char(1))
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\u0001'
STORED AS ORC
LOCATION
'/user/aiman/target/customer_detail';
Data in source:
+---------------------+--------------------------+-------------------------+--------------------------------------------------------------------------------------------------------------------------------------+
| customer_detail.id | customer_detail.name | customer_detail.city | customer_detail.properties_owned |
+---------------------+--------------------------+-------------------------+--------------------------------------------------------------------------------------------------------------------------------------+
| 1 | Aiman Sarosh | kolkata | [{"property_addr":"H1 Block Saltlake","location":"kolkata"},{"property_addr":"New Property Added Saltlake","location":"kolkata"}] |
| 2 | Justin | delhi | [{"property_addr":"some address in delhi","location":"delhi"}] |
+---------------------+--------------------------+-------------------------+--------------------------------------------------------------------------------------------------------------------------------------+
Data in target:
+---------------------+--------------------------+-------------------------+------------------------------------------------------------------+--------------------------------------+-----------------------------------+------------------------------------+
| customer_detail.id | customer_detail.name | customer_detail.city | customer_detail.properties_owned | customer_detail.audit_insterted_ts | customer_detail.audit_dml_action | customer_detail.audit_active_flag |
+---------------------+--------------------------+-------------------------+------------------------------------------------------------------+--------------------------------------+-----------------------------------+------------------------------------+
| 1 | Aiman Sarosh | kolkata | [{"property_addr":"H1 Block Saltlake","location":"kolkata"}] | 2018-09-04 06:55:12.361 | I | A |
| 2 | Justin | delhi | [{"property_addr":"some address in delhi","location":"delhi"}] | 2018-09-05 08:36:39.023 | I | A |
+---------------------+--------------------------+-------------------------+------------------------------------------------------------------+--------------------------------------+-----------------------------------+------------------------------------+
When I run the query below, it should return the one record that was modified, i.e.:
+---------------------+--------------------------+-------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+-----------------------------------+------------------------------------+
| customer_detail.id | customer_detail.name | customer_detail.city | customer_detail.properties_owned | customer_detail.audit_insterted_ts | customer_detail.audit_dml_action | customer_detail.audit_active_flag |
+---------------------+--------------------------+-------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+-----------------------------------+------------------------------------+
| 1 | Aiman Sarosh | kolkata | [{"property_addr":"H1 Block Saltlake","location":"kolkata"},{"property_addr":"New Property Added Saltlake","location":"kolkata"}] | 2018-09-05 07:15:10.321 | U | A |
+---------------------+--------------------------+-------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+-----------------------------------+------------------------------------+
Basically, the element {"property_addr":"New Property Added Saltlake","location":"kolkata"} was added to the array column properties_owned for the record with id 1 in source.
Query:
SELECT --fetch modified/updated records in source
source.id AS id,
source.name AS name,
source.city AS city,
source.properties_owned AS properties_owned,
current_timestamp() AS audit_insterted_ts,
'U' AS audit_dml_action,
'A' AS audit_active_flag
FROM source.customer_detail source
INNER JOIN target.customer_detail jrnl
ON source.id=jrnl.id
WHERE source.name!=jrnl.name
OR source.city!=jrnl.city
OR source.properties_owned!=jrnl.properties_owned
But it throws an error:
Error: Error while compiling statement: FAILED: SemanticException [Error 10016]: Line 14:3 Argument type mismatch 'properties_owned': The 1st argument of NOT EQUAL is expected to a primitive type, but list is found (state=42000,code=10016)
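The error says NOT EQUAL only accepts primitive arguments. For the specific change described above (an element added to the array), a partial check is possible because Hive's built-in size() returns an int, which != does accept. This is a minimal sketch, not a full solution: it detects elements being added or removed, but not an element whose property_addr or location changed in place while the array length stayed the same.

```sql
-- Partial change detection (sketch): size() returns an int, so != is allowed.
-- Catches added/removed array elements (as for id 1 above), but NOT an
-- element whose property_addr or location was edited in place.
SELECT source.id
FROM source.customer_detail source
INNER JOIN target.customer_detail jrnl
  ON source.id = jrnl.id
WHERE source.name != jrnl.name
   OR source.city != jrnl.city
   OR size(source.properties_owned) != size(jrnl.properties_owned);
```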
How can I compare two columns of complex data types in the WHERE clause when using JOINs?
I could use .POS and .ITEM, but that won't help, because my column is an array of structs and the length of the array can vary.
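One common workaround, sketched here under stated assumptions and not tested against this data, is to reduce each array<struct> to a single canonical string per id and compare those strings, since strings are primitive. Each element is exploded with LATERAL VIEW OUTER, its fields are joined with concat_ws, the pieces are gathered with collect_list, sorted with sort_array, and joined again. Assumptions: Hive 0.13+ (CTEs, collect_list); the separator characters '~' and '|' never occur in property_addr or location; element order inside the array is not significant; and only active rows (audit_active_flag = 'A') in the target should be compared. The CTE names src_key/tgt_key and the column props_key are hypothetical.

```sql
-- Sketch (untested): compare array<struct<property_addr,location>> columns by
-- turning each array into one canonical string per id (props_key, hypothetical).
-- Assumes Hive 0.13+, and that '~' and '|' never appear in the field values.
WITH src_key AS (
  SELECT s.id,
         -- element -> 'addr~location'; collect, sort, join with '|'
         concat_ws('|', sort_array(collect_list(
           concat_ws('~', p.property_addr, p.location)))) AS props_key
  FROM source.customer_detail s
  LATERAL VIEW OUTER explode(s.properties_owned) e AS p
  GROUP BY s.id
),
tgt_key AS (
  SELECT t.id,
         concat_ws('|', sort_array(collect_list(
           concat_ws('~', p.property_addr, p.location)))) AS props_key
  FROM target.customer_detail t
  LATERAL VIEW OUTER explode(t.properties_owned) e AS p
  WHERE t.audit_active_flag = 'A'   -- assumption: compare active rows only
  GROUP BY t.id
)
SELECT --fetch modified/updated records in source
  source.id AS id,
  source.name AS name,
  source.city AS city,
  source.properties_owned AS properties_owned,
  current_timestamp() AS audit_insterted_ts,
  'U' AS audit_dml_action,
  'A' AS audit_active_flag
FROM source.customer_detail source
INNER JOIN target.customer_detail jrnl
  ON source.id = jrnl.id
INNER JOIN src_key sk
  ON sk.id = source.id
INNER JOIN tgt_key tk
  ON tk.id = source.id
WHERE jrnl.audit_active_flag = 'A'   -- assumption: compare active rows only
  AND (source.name != jrnl.name
       OR source.city != jrnl.city
       OR sk.props_key != tk.props_key);
```

sort_array is what makes the key deterministic: the order in which collect_list gathers exploded rows is not guaranteed, so without sorting two identical arrays could yield different strings. collect_list (rather than collect_set) keeps duplicate elements, so an array that gains a second copy of an existing element is still reported as changed. If element order does matter, posexplode could be used instead and the position included in each element's string before sorting.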