Create a hierarchy sequence using the Spark GraphX lib for the Hive join below

Below is a sample transaction dataset in which "t_id" and "parent_id" form a dependency relationship (a sketch of loading it into GraphX follows the table).

t_id , first_name , parent_id , amount , dept_id , sal , datetime_updated

1       Jared       None        1000    5       4088908   13/10/2017
2       Jared       1           -5000   1       8033313   17/10/2018
3       Jared       2           1000    5       17373148  23/07/2018
4       Tucker      None        10000   3       16320817  08/09/2018
5       Tucker      4           -10000  2       5094970   24/08/2017
6       Tucker      5           5000    1       7435169   09/11/2018
7       Tucker      5           -2500   5       7859621   21/12/2018
8       Tucker      4           3000    2       5639934   14/07/2018
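
The direction I am exploring is to model these rows as a graph in Spark GraphX. A rough sketch of loading the Hive table and building the graph (this assumes a spark session with Hive support, that the table is named Transactions, and that root rows carry a NULL parent_id; the cast below also drops a literal "None" string):

import org.apache.spark.sql.SparkSession
import org.apache.spark.graphx.{Edge, Graph}

val spark = SparkSession.builder()
  .appName("TransactionHierarchy")
  .enableHiveSupport()
  .getOrCreate()
import spark.implicits._

// Read only the columns needed for the hierarchy.
val txDF = spark.sql(
  "select t_id, first_name, parent_id, amount from Transactions")

// Vertices: one per transaction, keyed by t_id, carrying (first_name, amount).
val vertices = txDF
  .select($"t_id".cast("long"), $"first_name", $"amount".cast("long"))
  .rdd
  .map(r => (r.getLong(0), (r.getString(1), r.getLong(2))))

// Edges: child -> parent; root rows ("None"/NULL parent_id) produce no edge.
val edges = txDF
  .select($"t_id".cast("long").as("src"), $"parent_id".cast("long").as("dst"))
  .filter($"dst".isNotNull)
  .rdd
  .map(r => Edge(r.getLong(0), r.getLong(1), "child_of"))

val graph = Graph(vertices, edges)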

The query I am using is below:

select 
t1.t_id ,
t1.first_name,
t1.amount,
t1.parent_id,
t2.t_id ,
t2.first_name,
t2.amount,
t2.parent_id,
t3.t_id ,
t3.first_name,
t3.amount,
t3.parent_id,
t4.t_id ,
t4.first_name,
t4.amount,
t4.parent_id
from Transactions t1
left join Transactions t2
on t1.parent_id = t2.t_id
left join Transactions t3
on t2.parent_id = t3.t_id
left join Transactions t4
on t3.parent_id = t4.t_id;

Output of the above query:

+------+------------+--------+-----------+------+------------+--------+-----------+------+------------+--------+-----------+------+------------+--------+-----------+
| t_id | first_name | amount | parent_id | t_id | first_name | amount | parent_id | t_id | first_name | amount | parent_id | t_id | first_name | amount | parent_id |
+------+------------+--------+-----------+------+------------+--------+-----------+------+------------+--------+-----------+------+------------+--------+-----------+
|    1 | Jared      |   1000 |         0 | NULL | NULL       |   NULL |      NULL | NULL | NULL       |   NULL |      NULL | NULL | NULL       |   NULL |      NULL |
|    2 | Jared      |  -5000 |         1 |    1 | Jared      |   1000 |         0 | NULL | NULL       |   NULL |      NULL | NULL | NULL       |   NULL |      NULL |
|    3 | Jared      |   1000 |         2 |    2 | Jared      |  -5000 |         1 |    1 | Jared      |   1000 |         0 | NULL | NULL       |   NULL |      NULL |
|    4 | Tucker     |  10000 |         0 | NULL | NULL       |   NULL |      NULL | NULL | NULL       |   NULL |      NULL | NULL | NULL       |   NULL |      NULL |
|    5 | Tucker     | -10000 |         4 |    4 | Tucker     |  10000 |         0 | NULL | NULL       |   NULL |      NULL | NULL | NULL       |   NULL |      NULL |
|    6 | Tucker     |   5000 |         5 |    5 | Tucker     | -10000 |         4 |    4 | Tucker     |  10000 |         0 | NULL | NULL       |   NULL |      NULL |
|    7 | Tucker     |  -2500 |         5 |    5 | Tucker     | -10000 |         4 |    4 | Tucker     |  10000 |         0 | NULL | NULL       |   NULL |      NULL |
|    8 | Thane      |   3000 |         4 |    4 | Tucker     |  10000 |         0 | NULL | NULL       |   NULL |      NULL | NULL | NULL       |   NULL |      NULL |
|    9 | Nicholas   |   1000 |         0 | NULL | NULL       |   NULL |      NULL | NULL | NULL       |   NULL |      NULL | NULL | NULL       |   NULL |      NULL |
|   10 | Mason      |   2000 |         0 | NULL | NULL       |   NULL |      NULL | NULL | NULL       |   NULL |      NULL | NULL | NULL       |   NULL |      NULL |
|   11 | Noah       |   5000 |         0 | NULL | NULL       |   NULL |      NULL | NULL | NULL       |   NULL |      NULL | NULL | NULL       |   NULL |      NULL |
+------+------------+--------+-----------+------+------------+--------+-----------+------+------------+--------+-----------+------+------------+--------+-----------+
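
For reference, the same fixed-depth join chain can be written with the DataFrame API (a sketch, assuming txDF is the Transactions table loaded as in the first snippet; the numeric column suffixes are just illustrative names). It produces the same kind of output and shares the same limitation: the depth is hard-coded at four levels.

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Re-alias one "level" of the table so the self-joined copies don't clash.
def level(df: DataFrame, n: Int): DataFrame =
  df.select(
    col("t_id").as(s"t_id_$n"),
    col("first_name").as(s"first_name_$n"),
    col("amount").as(s"amount_$n"),
    col("parent_id").as(s"parent_id_$n"))

val t1 = level(txDF, 1)
val t2 = level(txDF, 2)
val t3 = level(txDF, 3)
val t4 = level(txDF, 4)

// Same left-join chain as the SQL: each level's parent_id joins the next level's t_id.
val joined = t1
  .join(t2, t1("parent_id_1") === t2("t_id_2"), "left")
  .join(t3, t2("parent_id_2") === t3("t_id_3"), "left")
  .join(t4, t3("parent_id_3") === t4("t_id_4"), "left")

joined.show(false)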

Question / Issue

I want to generate the same output as the results shown above,
but I cannot use the above join approach
because it fails on larger data sets when running through Spark SQL.

Is there any other way I can optimise the above query to generate the same
kind of data?
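
What I am considering instead is propagating each transaction's chain of ancestors with GraphX Pregel, so the depth is not limited by a fixed number of self-joins. A rough sketch, assuming the graph built in the first snippet (Record and Chain are just illustrative type aliases):

import org.apache.spark.graphx._

type Record = (VertexId, String, Long)   // (t_id, first_name, amount)
type Chain  = Seq[Record]

// Seed every vertex with a one-element chain containing only itself.
val seeded: Graph[Chain, String] =
  graph.mapVertices((id, attr) => Seq((id, attr._1, attr._2)))

// Pregel: a parent (edge dst) pushes its current chain down to its child
// (edge src) until no child's chain changes any more.
val withAncestors: Graph[Chain, String] = seeded.pregel[Chain](Seq.empty)(
  // Vertex program: keep self as the head, adopt the parent's chain as the tail.
  (id, chain, parentChain) =>
    if (parentChain.isEmpty) chain else chain.head +: parentChain,
  // Send the parent's chain to the child only when it would change something.
  triplet =>
    if (triplet.srcAttr.tail != triplet.dstAttr)
      Iterator((triplet.srcId, triplet.dstAttr))
    else
      Iterator.empty,
  // Each transaction has at most one parent, so merging messages is trivial.
  (a, b) => a
)

// One row per transaction, ancestors laid out left to right like the SQL output.
withAncestors.vertices
  .map { case (_, chain) =>
    chain.map(r => s"${r._1},${r._2},${r._3}").mkString(" | ") }
  .collect()
  .foreach(println)

Would something along these lines hold up better on a large table, or is there a simpler way to produce the same output?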
...