Spark 2.3 Tree Error

0 votes / May 26, 2018

This question goes one step further than the query in this link. In this scenario, as soon as I add 1 or 2 more columns to process, Spark throws an ERROR and prints the query plan.

It says: Resolved attribute(s) fnlwgt_bucketed#152530 missing. That looks wrong to me: if I run the same code on fewer than 3 columns, even on a single column, it works like a charm, so I assume this is not a bug in my query or code.

Is this an out-of-memory error, then? My guess is that, internally, there are many registered tables held in memory and some of them are being evicted because the data overflows, but that is purely an assumption on my part. Any insight into this? Has anyone run into a similar problem?

py4j.protocol.Py4JJavaError: An error occurred while calling o21.sql.
: org.apache.spark.sql.AnalysisException: Resolved attribute(s) fnlwgt_bucketed#152530 missing from occupation#17,high_income#25,fnlwgt#13,education#14,marital-status#16,relationship#18,workclass#12,sex#20,id_num#10,native_country#24,race#19,education-num#15,hours-per-week#23,age_bucketed#152432,capital-loss#22,age#11,capital-gain#21,fnlwgt_bucketed#99009 in operator !Project [id_num#10, age#11, workclass#12, fnlwgt#13, education#14, education-num#15, marital-status#16, occupation#17, relationship#18, race#19, sex#20, capital-gain#21, capital-loss#22, hours-per-week#23, native_country#24, high_income#25, age_bucketed#152432, fnlwgt_bucketed#152530, if (isnull(cast(hours-per-week#23 as double))) null else if (isnull(cast(hours-per-week#23 as double))) null else if (isnull(cast(hours-per-week#23 as double))) null else if (isnull(cast(hours-per-week#23 as double))) null else if (isnull(cast(hours-per-week#23 as double))) null else if (isnull(cast(hours-per-week#23 as double))) null else if (isnull(cast(hours-per-week#23 as double))) null else if (isnull(cast(hours-per-week#23 as double))) null else if (isnull(cast(hours-per-week#23 as double))) null else if (isnull(cast(hours-per-week#23 as double))) null else if (isnull(cast(hours-per-week#23 as double))) null else if (isnull(cast(hours-per-week#23 as double))) null else if (isnull(cast(hours-per-week#23 as double))) null else UDF:bucketizer_0(cast(hours-per-week#23 as double)) AS hours-per-week_bucketed#152299]. Attribute(s) with the same name appear in the operation: fnlwgt_bucketed. Please check if the right attribute(s) are used.;;
Project [id_num#10, age#11, workclass#12, fnlwgt#13, education#14, education-num#15, marital-status#16, occupation#17, relationship#18, race#19, sex#20, capital-gain#21, capital-loss#22, hours-per-week#23, native_country#24, high_income#25, age_bucketed#48257, fnlwgt_bucketed#99009, hours-per-week_bucketed#152299, age_bucketed_WoE#152431, WoE#152524 AS fnlwgt_bucketed_WoE#152529]
+- Join Inner, (fnlwgt_bucketed#99009 = fnlwgt_bucketed#152530)
   :- SubqueryAlias bucketed
   :  +- SubqueryAlias a
   :     +- Project [id_num#10, age#11, workclass#12, fnlwgt#13, education#14, education-num#15, marital-status#16, occupation#17, relationship#18, race#19, sex#20, capital-gain#21, capital-loss#22, hours-per-week#23, native_country#24, high_income#25, age_bucketed#48257, fnlwgt_bucketed#99009, hours-per-week_bucketed#152299, WoE#152426 AS age_bucketed_WoE#152431]
   :        +- Join Inner, (age_bucketed#48257 = age_bucketed#152432)
   :           :- SubqueryAlias bucketed
   :           :  +- SubqueryAlias a
   :           :     +- Project [id_num#10, age#11, workclass#12, fnlwgt#13, education#14, education-num#15, marital-status#16, occupation#17, relationship#18, race#19, sex#20, capital-gain#21, capital-loss#22, hours-per-week#23, native_country#24, high_income#25, age_bucketed#48257, fnlwgt_bucketed#99009, if (isnull(cast(hours-per-week#23 as double))) null else if (isnull(cast(hours-per-week#23 as double))) null else if (isnull(cast(hours-per-week#23 as double))) null else UDF:bucketizer_0(cast(hours-per-week#23 as double)) AS hours-per-week_bucketed#152299]
   :           :        +- Project [id_num#10, age#11, workclass#12, fnlwgt#13, education#14, education-num#15, marital-status#16, occupation#17, relationship#18, race#19, sex#20, capital-gain#21, capital-loss#22, hours-per-week#23, native_country#24, high_income#25, age_bucketed#48257, if (isnull(cast(fnlwgt#13 as double))) null else if (isnull(cast(fnlwgt#13 as double))) null else if (isnull(cast(fnlwgt#13 as double))) null else UDF:bucketizer_0(cast(fnlwgt#13 as double)) AS fnlwgt_bucketed#99009]
   :           :           +- Project [id_num#10, age#11, workclass#12, fnlwgt#13, education#14, education-num#15, marital-status#16, occupation#17, relationship#18, race#19, sex#20, capital-gain#21, capital-loss#22, hours-per-week#23, native_country#24, high_income#25, if (isnull(cast(age#11 as double))) null else if (isnull(cast(age#11 as double))) null else if (isnull(cast(age#11 as double))) null else UDF:bucketizer_0(cast(age#11 as double)) AS age_bucketed#48257]
   :           :              +- Relation[id_num#10,age#11,workclass#12,fnlwgt#13,education#14,education-num#15,marital-status#16,occupation#17,relationship#18,race#19,sex#20,capital-gain#21,capital-loss#22,hours-per-week#23,native_country#24,high_income#25] csv
   :           +- SubqueryAlias woe_table

And it goes on like this.
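To make the trace easier to read, here is a minimal sketch in plain Python (no Spark needed) that lists every column name appearing with more than one expression ID in the error text. The helper name duplicated_attributes and the hand-shortened excerpt are my own illustration, not part of any Spark API. Note that an AnalysisException is raised while Spark analyzes the query, before any data is processed, so name/ID clashes like these are probably a better lead than memory pressure.

```python
import re
from collections import defaultdict

# Matches Spark attribute references such as fnlwgt_bucketed#152530
# or hours-per-week#23 (a column name, '#', then the expression ID).
ATTR_RE = re.compile(r"([A-Za-z_][\w-]*)#(\d+)")


def duplicated_attributes(plan_text):
    """Return {name: sorted expression IDs} for names seen with more than one ID.

    Hypothetical helper for reading traces; it only inspects text.
    """
    ids_by_name = defaultdict(set)
    for name, expr_id in ATTR_RE.findall(plan_text):
        ids_by_name[name].add(int(expr_id))
    return {name: sorted(ids) for name, ids in ids_by_name.items() if len(ids) > 1}


# Hand-shortened excerpt of the error message above (most columns omitted).
excerpt = (
    "Resolved attribute(s) fnlwgt_bucketed#152530 missing from "
    "fnlwgt#13,age_bucketed#152432,age#11,fnlwgt_bucketed#99009 "
    "in operator !Project [id_num#10, age_bucketed#152432, fnlwgt_bucketed#152530]"
)

print(duplicated_attributes(excerpt))
# prints {'fnlwgt_bucketed': [99009, 152530]}
```

Run against the full trace, the same helper should also report age_bucketed, which appears there as both #48257 and #152432. In each case the same column name is carried by two different attributes on the two sides of the join, which is what the "Attribute(s) with the same name appear in the operation" hint in the message refers to.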

...