ClickHouse SQL: converting data from long format to wide format
0 votes
/ September 26, 2019

I am using the ClickHouse SQL dialect. After unnesting an array with ARRAY JOIN, my data is in the following format.

|----- |---------------------|----------------|------------------|
|  id  |      timestamp      |  property_key  |  property_value  |
|----- |---------------------|----------------|------------------|
|  01  | 2019-09-25 16:24:38 |     query      |     Palmera      |
|------|---------------------|----------------|------------------|
|  01  | 2019-09-25 16:24:38 |   found_items  |       10         |
|------|---------------------|----------------|------------------|
|  02  | 2019-09-25 13:11:09 |     query      |     pigeo        |
|------|---------------------|----------------|------------------|
|  02  | 2019-09-25 13:11:09 |   found_items  |        0         |
|------|---------------------|----------------|------------------|
|  03  | 2019-09-25 16:08:13 |     query      |     harmon       |
|------|---------------------|----------------|------------------|
|  03  | 2019-09-25 16:08:13 |   found_items  |       17         |
|------|---------------------|----------------|------------------|

I got this result with the following query:

SELECT id, timestamp, 
properties.key AS property_key, 
properties.value as property_value
FROM (
SELECT 
  rowNumberInAllBlocks() as id,
  timestamp,
  properties.key,
  properties.value
FROM database.table
WHERE timestamp BETWEEN toDateTime('2019-09-16 11:26:56') 
AND toDateTime('2019-09-26 11:26:56')
ORDER BY timestamp)
ARRAY JOIN properties
WHERE
properties.key IN ('query', 'found_items')

I need to extract the queries whose found_items equals 0. I can't figure out how to reshape the data from long format to wide format. The expected result is one of the following.

|----- |---------------------|-----------------|---------------|
|  id  |      timestamp      |     query       |  found_items  |
|----- |---------------------|-----------------|---------------|
|  02  | 2019-09-25 13:11:09 |     pigeo       |       0       |
|------|---------------------|-----------------|---------------|
|  15  | 2019-09-25 16:08:13 |     coche       |       0       |
|------|---------------------|-----------------|---------------|
|  27  | 2019-09-16 13:19:46 | panitos pampers |       0       |
|------|---------------------|-----------------|---------------|

OR

|----- |---------------------|----------------|------------------|
|  id  |      timestamp      |  property_key  |  property_value  |
|----- |---------------------|----------------|------------------|
|  02  | 2019-09-25 13:11:09 |     query      |     pigeo        |
|------|---------------------|----------------|------------------|
|  15  | 2019-09-25 16:08:13 |     query      |     coche        |
|------|---------------------|----------------|------------------|
|  27  | 2019-09-16 13:19:46 |     query      |  panitos pampers |
|------|---------------------|----------------|------------------|

1 Answer

0 votes
/ September 27, 2019

Try this query:

SELECT 
  id, 
  groupArray(timestamp)[1] timestamp,
  groupArray(properties.key)[1] property_key,
  groupArray(properties.value) property_value  
FROM (
  SELECT 
    rowNumberInAllBlocks() as id,
    timestamp,
    properties.key,
    properties.value
  FROM test.test_011
  WHERE timestamp BETWEEN toDateTime('2019-09-16 11:26:56') AND toDateTime('2019-09-26 11:26:56') 
    AND properties.value[indexOf(properties.key, 'found_items')] = '0'
  ORDER BY timestamp)
ARRAY JOIN properties
WHERE properties.key IN ('query' /*, ..*/)
GROUP BY id, properties.key
ORDER BY id

/* Result
┌─id─┬───────────timestamp─┬─property_key─┬─property_value────────┐
│  0 │ 2019-09-25 13:11:09 │ query        │ ['pigeo']             │
│  1 │ 2019-09-16 13:19:46 │ query        │ ['panitos','pampers'] │
└────┴─────────────────────┴──────────────┴───────────────────────┘
*/
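
If the wide layout from the question is preferred, the arrays can probably be read directly, without ARRAY JOIN. The following is a minimal sketch under stated assumptions, not a verified drop-in: it uses the same test.test_011 table prepared below, and it joins repeated 'query' values with a space, which is only my guess at how 'panitos pampers' in the expected result is meant to be built.

```sql
SELECT
  id,
  timestamp,
  -- keep every value whose key is 'query' and join them with a space (assumed separator)
  arrayStringConcat(
    arrayFilter((v, k) -> k = 'query', properties.value, properties.key),
    ' ') AS query,
  -- indexOf returns the position of the first matching key (0 if the key is absent)
  properties.value[indexOf(properties.key, 'found_items')] AS found_items
FROM (
  SELECT
    rowNumberInAllBlocks() AS id,
    timestamp,
    properties.key,
    properties.value
  FROM test.test_011
  WHERE timestamp BETWEEN toDateTime('2019-09-16 11:26:56') AND toDateTime('2019-09-26 11:26:56')
    AND properties.value[indexOf(properties.key, 'found_items')] = '0'
  ORDER BY timestamp)
ORDER BY id
```

Note that found_items stays a String here because the Nested value column is a String; wrap it in toUInt32OrZero(...) if a numeric column is needed.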

/* prepare test data */

CREATE TABLE test.test_011 (
  timestamp DateTime,
  properties Nested(key String, value String)
) ENGINE = Memory;

INSERT INTO test.test_011
VALUES 
  (toDateTime('2019-09-25 16:24:38'),  ['query', 'found_items'], ['Palmera', '10']),
  (toDateTime('2019-09-25 13:11:09'),  ['query', 'found_items'], ['pigeo', '0']),
  (toDateTime('2019-09-25 16:08:13'),  ['query', 'found_items'], ['harmon', '17']),
  (toDateTime('2019-09-16 13:19:46'), ['found_items', 'query', 'query'], ['0', 'panitos', 'pampers']),
  (toDateTime('2019-09-25 16:22:38'),  ['query', 'query'], ['test', 'test']);
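
If the data is already in the long format (one row per key after ARRAY JOIN), the pivot itself can be sketched with conditional aggregates using the -If combinator. This is an illustrative sketch against the test table above, and the space separator for repeated 'query' values is again an assumption.

```sql
SELECT
  id,
  timestamp,
  -- collect all 'query' values of the same source row into one column
  arrayStringConcat(groupArrayIf(properties.value, properties.key = 'query'), ' ') AS query,
  -- pick the 'found_items' value of the same source row
  anyIf(properties.value, properties.key = 'found_items') AS found_items
FROM (
  SELECT
    rowNumberInAllBlocks() AS id,
    timestamp,
    properties.key,
    properties.value
  FROM test.test_011
  WHERE timestamp BETWEEN toDateTime('2019-09-16 11:26:56') AND toDateTime('2019-09-26 11:26:56')
  ORDER BY timestamp)
ARRAY JOIN properties
WHERE properties.key IN ('query', 'found_items')
-- timestamp is constant per id, so grouping by it does not split the groups
GROUP BY id, timestamp
HAVING anyIf(properties.value, properties.key = 'found_items') = '0'
ORDER BY id
```

Rows without a 'found_items' key yield an empty string from anyIf and are dropped by the HAVING clause. The id values differ from the query in the answer because here the rows are numbered before the found_items filter is applied.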
...