Grouping records by log type in Impala
January 31, 2020

I have the sample data below:

+------------------------+------------------------+-------------+---------------------+----------------+---------------+-----------+-----------+---------------+----------+----------------+
| file_name_ingestion    | file_name              | id_1        | date_mov            | id_priv        | id_pub        | port_ini  | port_end  | inst_vpn      | log_type | reference_date |
+------------------------+------------------------+-------------+---------------------+----------------+---------------+-----------+-----------+---------------+----------+----------------+
| NAME_ING1              | name1                  | 29          | 2020-01-09 04:02:52 | 10.10.10.10    | 52.84.172.223 | 17920     | 18431     | 0             | 3        | 20200110       |
| NAME_ING1              | name1                  | 29          | 2020-01-10 04:02:52 | 10.10.10.10    | 52.84.172.223 | 17920     | 18431     | 0             | 1        | 20200110       |
| NAME_ING1              | name1                  | 29          | 2020-01-10 04:02:58 | 10.10.10.10    | 52.84.172.223 | 17920     | 18431     | 0             | 3        | 20200110       |
| NAME_ING1              | name1                  | 29          | 2020-01-10 04:32:41 | 10.10.10.10    | 52.84.172.223 | 17920     | 18431     | 0             | 1        | 20200110       |
| NAME_ING1              | name1                  | 29          | 2020-01-10 04:36:55 | 10.10.10.10    | 52.84.172.223 | 17920     | 18431     | 0             | 3        | 20200110       |
| NAME_ING1              | name1                  | 29          | 2020-01-10 05:22:57 | 10.10.10.10    | 52.84.172.223 | 17920     | 18431     | 0             | 1        | 20200110       |
| NAME_ING1              | name1                  | 29          | 2020-01-10 05:23:03 | 10.10.10.10    | 52.84.172.223 | 17920     | 18431     | 0             | 3        | 20200110       |
| NAME_ING1              | name1                  | 29          | 2020-01-10 05:23:01 | 10.10.10.10    | 52.84.172.223 | 17920     | 18431     | 0             | 1        | 20200110       |
| NAME_ING1              | name1                  | 29          | 2020-01-10 05:24:11 | 10.10.10.10    | 52.84.172.223 | 17920     | 18431     | 0             | 3        | 20200110       |
| NAME_ING1              | name1                  | 29          | 2020-01-10 19:33:43 | 10.10.10.10    | 52.84.172.223 | 17920     | 18431     | 0             | 1        | 20200110       |
| NAME_ING1              | name1                  | 29          | 2020-01-10 19:37:45 | 10.10.10.10    | 52.84.172.223 | 17920     | 18431     | 0             | 3        | 20200110       |
| NAME_ING1              | name1                  | 29          | 2020-01-10 19:43:22 | 10.10.10.10    | 52.84.172.223 | 17920     | 18431     | 0             | 1        | 20200110       |
| NAME_ING1              | name1                  | 29          | 2020-01-10 19:43:28 | 10.10.10.10    | 52.84.172.223 | 17920     | 18431     | 0             | 3        | 20200110       |
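| NAME_ING1              | name1                  | 29          | 2020-01-11 05:23:03 | 10.10.10.10    | 52.84.172.223 | 17920     | 18431     | 0             | 1        | 20200110       |
+------------------------+------------------------+-------------+---------------------+----------------+---------------+-----------+-----------+---------------+----------+----------------+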

Expected result:

+------------------------+------------------------+-------------+---------------------+---------------------+----------------+---------------+-----------+-----------+---------------+
| file_name_ingestion    | file_name              | id_1        | date_start          | date_end            | id_priv        | id_pub        | port_ini  | port_end  | inst_vpn      |
+------------------------+------------------------+-------------+---------------------+---------------------+----------------+---------------+-----------+-----------+---------------+
| NAME_ING1              | name1                  | 29          | 2000-01-01 00:00:00 | 2020-01-09 04:02:52 | 10.10.10.10    | 52.84.172.223 | 17920     | 18431     | 0             |
| NAME_ING1              | name1                  | 29          | 2020-01-10 04:02:52 | 2020-01-10 04:02:58 | 10.10.10.10    | 52.84.172.223 | 17920     | 18431     | 0             |
| NAME_ING1              | name1                  | 29          | 2020-01-10 04:32:41 | 2020-01-10 04:36:55 | 10.10.10.10    | 52.84.172.223 | 17920     | 18431     | 0             |
| NAME_ING1              | name1                  | 29          | 2020-01-10 05:22:57 | 2020-01-10 05:24:11 | 10.10.10.10    | 52.84.172.223 | 17920     | 18431     | 0             |
| NAME_ING1              | name1                  | 29          | 2020-01-10 19:33:43 | 2020-01-10 19:37:45 | 10.10.10.10    | 52.84.172.223 | 17920     | 18431     | 0             |
| NAME_ING1              | name1                  | 29          | 2020-01-10 19:43:22 | 2020-01-10 19:43:28 | 10.10.10.10    | 52.84.172.223 | 17920     | 18431     | 0             |
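| NAME_ING1              | name1                  | 29          | 2020-01-11 05:23:03 | now()               | 10.10.10.10    | 52.84.172.223 | 17920     | 18431     | 0             |
+------------------------+------------------------+-------------+---------------------+---------------------+----------------+---------------+-----------+-----------+---------------+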

I want to group the rows by log_type. The key is id_priv, id_pub, port_ini, and port_end. The data must be ordered by date_mov.
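
To make the intended pairing concrete, here is a rough Python sketch of the logic as I read it from the two tables. It is not an Impala query. It rests on assumptions inferred from the sample, not stated anywhere: log_type 1 opens an interval and log_type 3 closes it; several consecutive type 1 rows keep the earliest start; several consecutive type 3 rows keep the latest end. An interval with no start gets the 2000-01-01 placeholder, and one with no end gets the current time. The names `pair_sessions`, `KEY`, and `OPEN_START` are hypothetical, and only the key columns are carried into the output.

```python
from datetime import datetime
from itertools import groupby

KEY = ("id_priv", "id_pub", "port_ini", "port_end")
OPEN_START = "2000-01-01 00:00:00"  # placeholder start taken from the expected result


def pair_sessions(rows, now=None):
    """Collapse log rows into one (date_start, date_end) interval per session."""
    if now is None:
        now = datetime.now().strftime("%Y-%m-%d %H:%M:%S")

    def key_of(row):
        return tuple(row[k] for k in KEY)

    # "YYYY-MM-DD HH:MM:SS" strings sort chronologically as plain text.
    ordered = sorted(rows, key=lambda r: (key_of(r), r["date_mov"]))
    sessions = []
    for key, group in groupby(ordered, key=key_of):
        current, prev_type = None, None
        for row in group:
            # Assumption: a new interval begins at a log_type 1 row that
            # does not directly follow another log_type 1 row.
            if current is None or (row["log_type"] == 1 and prev_type != 1):
                current = dict(zip(KEY, key), date_start=None, date_end=None)
                sessions.append(current)
            if row["log_type"] == 1:
                if current["date_start"] is None:  # earliest start wins
                    current["date_start"] = row["date_mov"]
            elif row["log_type"] == 3:
                current["date_end"] = row["date_mov"]  # latest end wins
            prev_type = row["log_type"]
    for s in sessions:
        s["date_start"] = s["date_start"] or OPEN_START
        s["date_end"] = s["date_end"] or now
    return sessions
```

In SQL terms this is a "gaps and islands" problem: flag each row that starts a new interval, take a running sum of the flags per key ordered by date_mov, then aggregate per running-sum value.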

...