Question

I am using Azure HDInsight. My Oozie workflows were written to use MapReduce, and they ran fine for a long time. Recently, however, the jobs started failing with the log below:

WARN HiveActionExecutor:523 - SERVER[hn1-xxxxx.xxxxxxxxxxxxxxxxxxxxxxxxxxx.dx.internal.cloudapp.net] USER[admin] GROUP[-] TOKEN[] APP[cmd-etl-wf] JOB[0000000-191226081559399-oozie-oozi-W] ACTION[0000000-191226081559399-oozie-oozi-W@cmd-first-run-preparation] Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.HiveMain], main() threw exception, org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. Application application_1577348158861_0003 failed 2 times (global limit =5; local limit is =2) due to AM Container for appattempt_1577348158861_0003_000002 exited with  exitCode: 1
For more detailed output, check the application tracking page: http://hn0-xxxxx.xxxxxxxxxxxxxxxxxxxxxxxxxxx.dx.internal.cloudapp.net:8088/cluster/app/application_1577348158861_0003 Then click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e01_1577348158861_0003_02_000001
Exit code: 1

Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :

Failing this attempt. Failing the application.
2019-12-26 08:47:41,519  WARN HiveActionExecutor:523 - SERVER[hn1-xxxxx.xxxxxxxxxxxxxxxxxxxxxxxxxxx.dx.internal.cloudapp.net] USER[admin] GROUP[-] TOKEN[] APP[cmd-etl-wf] JOB[0000000-191226081559399-oozie-oozi-W] ACTION[0000000-191226081559399-oozie-oozi-W@cmd-first-run-preparation] Launcher exception: org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. Application application_1577348158861_0003 failed 2 times (global limit =5; local limit is =2) due to AM Container for appattempt_1577348158861_0003_000002 exited with  exitCode: 1
For more detailed output, check the application tracking page: http://hn0-xxxxx.xxxxxxxxxxxxxxxxxxxxxxxxxxx.dx.internal.cloudapp.net:8088/cluster/app/application_1577348158861_0003 Then click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e01_1577348158861_0003_02_000001
Exit code: 1

Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :

Failing this attempt. Failing the application.
java.lang.RuntimeException: org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. Application application_1577348158861_0003 failed 2 times (global limit =5; local limit is =2) due to AM Container for appattempt_1577348158861_0003_000002 exited with  exitCode: 1
For more detailed output, check the application tracking page: http://hn0-xxxxx.xxxxxxxxxxxxxxxxxxxxxxxxxxx.dx.internal.cloudapp.net:8088/cluster/app/application_1577348158861_0003 Then click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e01_1577348158861_0003_02_000001
Exit code: 1

Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :

Failing this attempt. Failing the application.
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:582)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
    at org.apache.oozie.action.hadoop.HiveMain.runHive(HiveMain.java:310)
    at org.apache.oozie.action.hadoop.HiveMain.run(HiveMain.java:287)
    at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:75)
    at org.apache.oozie.action.hadoop.HiveMain.main(HiveMain.java:65)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:231)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
Caused by: org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. Application application_1577348158861_0003 failed 2 times (global limit =5; local limit is =2) due to AM Container for appattempt_1577348158861_0003_000002 exited with  exitCode: 1
For more detailed output, check the application tracking page: http://hn0-xxxxx.xxxxxxxxxxxxxxxxxxxxxxxxxxx.dx.internal.cloudapp.net:8088/cluster/app/application_1577348158861_0003 Then click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e01_1577348158861_0003_02_000001
Exit code: 1

Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :

Failing this attempt. Failing the application.
    at org.apache.tez.client.TezClient.waitTillReady(TezClient.java:699)
    at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:218)
    at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:116)
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:579)
    ... 19 more

2019-12-26 08:47:41,600  INFO HiveActionExecutor:520 - SERVER[hn1-xxxxx.xxxxxxxxxxxxxxxxxxxxxxxxxxx.dx.internal.cloudapp.net] USER[admin] GROUP[-] TOKEN[] APP[cmd-etl-wf] JOB[0000000-191226081559399-oozie-oozi-W] ACTION[0000000-191226081559399-oozie-oozi-W@cmd-first-run-preparation] Action ended with external status [FAILED/KILLED]
2019-12-26 08:47:41,677  INFO ActionEndXCommand:520 - SERVER[hn1-xxxxx.xxxxxxxxxxxxxxxxxxxxxxxxxxx.dx.internal.cloudapp.net] USER[admin] GROUP[-] TOKEN[] APP[cmd-etl-wf] JOB[0000000-191226081559399-oozie-oozi-W] ACTION[0000000-191226081559399-oozie-oozi-W@cmd-first-run-preparation] ERROR is considered as FAILED for SLA
2019-12-26 08:47:41,906  INFO ActionStartXCommand:520 - SERVER[hn1-xxxxx.xxxxxxxxxxxxxxxxxxxxxxxxxxx.dx.internal.cloudapp.net] USER[admin] GROUP[-] TOKEN[] APP[cmd-etl-wf] JOB[0000000-191226081559399-oozie-oozi-W] ACTION[0000000-191226081559399-oozie-oozi-W@general-fail] Start action [0000000-191226081559399-oozie-oozi-W@general-fail] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2019-12-26 08:47:41,906  INFO KillActionExecutor:520 - SERVER[hn1-xxxxx.xxxxxxxxxxxxxxxxxxxxxxxxxxx.dx.internal.cloudapp.net] USER[admin] GROUP[-] TOKEN[] APP[cmd-etl-wf] JOB[0000000-191226081559399-oozie-oozi-W] ACTION[0000000-191226081559399-oozie-oozi-W@general-fail] Starting action
2019-12-26 08:47:41,906  INFO ActionStartXCommand:520 - SERVER[hn1-xxxxx.xxxxxxxxxxxxxxxxxxxxxxxxxxx.dx.internal.cloudapp.net] USER[admin] GROUP[-] TOKEN[] APP[cmd-etl-wf] JOB[0000000-191226081559399-oozie-oozi-W] ACTION[0000000-191226081559399-oozie-oozi-W@general-fail] [***0000000-191226081559399-oozie-oozi-W@general-fail***]Action status=DONE
2019-12-26 08:47:41,907  INFO ActionStartXCommand:520 - SERVER[hn1-xxxxx.xxxxxxxxxxxxxxxxxxxxxxxxxxx.dx.internal.cloudapp.net] USER[admin] GROUP[-] TOKEN[] APP[cmd-etl-wf] JOB[0000000-191226081559399-oozie-oozi-W] ACTION[0000000-191226081559399-oozie-oozi-W@general-fail] [***0000000-191226081559399-oozie-oozi-W@general-fail***]Action updated in DB!
2019-12-26 08:47:41,970  INFO KillActionExecutor:520 - SERVER[hn1-xxxxx.xxxxxxxxxxxxxxxxxxxxxxxxxxx.dx.internal.cloudapp.net] USER[admin] GROUP[-] TOKEN[] APP[cmd-etl-wf] JOB[0000000-191226081559399-oozie-oozi-W] ACTION[0000000-191226081559399-oozie-oozi-W@general-fail] Action ended with external status [OK]
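
Both the prelaunch.err and stderr tails in the log above are empty, so the actual cause is probably only visible in the YARN logs of the failed Tez application master attempts. Below is a minimal sketch of how the application IDs could be pulled out of such a launcher log and turned into `yarn logs -applicationId` commands to run on a head node. The helper name `yarn_log_commands` is hypothetical and only for illustration; it builds command strings and does not execute anything.

```python
import re

# YARN application IDs look like application_<clusterTimestamp>_<sequence>.
# Attempt IDs (appattempt_...) and container IDs are deliberately not matched.
APP_ID = re.compile(r"\bapplication_\d+_\d+\b")


def yarn_log_commands(log_text):
    """Return one `yarn logs` command per distinct application ID in the log.

    Hypothetical helper: it only formats strings for the standard
    `yarn logs -applicationId <id>` CLI call.
    """
    app_ids = sorted(set(APP_ID.findall(log_text)))
    return [f"yarn logs -applicationId {app_id}" for app_id in app_ids]


if __name__ == "__main__":
    # Shortened excerpt of the launcher log quoted in this question.
    excerpt = (
        "Application application_1577348158861_0003 failed 2 times "
        "due to AM Container for appattempt_1577348158861_0003_000002 "
        "exited with  exitCode: 1"
    )
    for command in yarn_log_commands(excerpt):
        print(command)
```

Running the printed command on the cluster should return the aggregated container logs for that application, provided log aggregation is enabled there (an assumption about the cluster configuration).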

A snippet from my Oozie workflow is shown below:

<?xml version="1.0" encoding="utf-8"?>
<!--This is a dynamically generated file, do not edit directly.-->
<workflow-app name="cmd-etl-wf" xmlns="uri:oozie:workflow:0.2">
 <start to="cmd-first-run-preparation" />
 <kill name="general-fail">
  <message>Workflow general failure. Error message: ${wf:errorMessage(wf:lastErrorNode())}</message>
 </kill>
 <action name="rollback-fail">
  <hive xmlns="uri:oozie:hive-action:0.2">
   <job-tracker>${jobTracker}</job-tracker>
   <name-node>${nameNode}</name-node>
   <configuration>
    <property>
     <name>mapred.compress.map.output</name>
     <value>true</value>
    </property>
    <property>
     <name>oozie.launcher.mapred.job.queue.name</name>
     <value>joblauncher</value>
    </property>
    <property>
     <name>mapred.job.queue.name</name>
     <value>default</value>
    </property>
   </configuration>
   <script>hive/cmd-rollback-after-failure.hql</script>
   <param>environmentKey=prod</param>
   <param>azureStorageAccount=psclasprodlinux</param>
  </hive>
  <ok to="general-fail" />
  <error to="general-fail" />
 </action>
 <action name="cmd-first-run-preparation">
  <hive xmlns="uri:oozie:hive-action:0.2">
   <job-tracker>${jobTracker}</job-tracker>
   <name-node>${nameNode}</name-node>
   <configuration>
    <property>
     <name>mapred.compress.map.output</name>
     <value>true</value>
    </property>
    <property>
     <name>oozie.launcher.mapred.job.queue.name</name>
     <value>joblauncher</value>
    </property>
    <property>
     <name>mapred.job.queue.name</name>
     <value>default</value>
    </property>
   </configuration>
   <script>hive/cmd-first-run-preparation.hql</script>
   <param>environmentKey=prod</param>
   <param>azureStorageAccount=prodlinux</param>
  </hive>
  <ok to="cmd-roll-shared-tables" />
  <error to="general-fail" />
 </action>

The Hive script begins as shown below:

SET hivevar:tablePrefix=current;
SET hivevar:previousTablePrefix=previous;

CREATE DATABASE IF NOT EXISTS ${environmentKey}_env
    COMMENT 'Database for CMD ETL environment ${environmentKey}'
    LOCATION 'wasbs://hadoop@${azureStorageAccount}.blob.core.windows.net/hive/warehouse/${environmentKey}_env';

USE ${environmentKey}_env;

--<<Regular Hive script follows>>

Since the error points to the Tez session, I changed Hive's default execution engine from Tez to MapReduce in the Ambari portal and reran the jobs. The jobs took longer to run and still ended with an error.

Since the jobs were working until a few weeks ago, I assume the problem is caused by a version change in some of the components on the HDInsight cluster. Please suggest how to resolve the issue.

Tez session shutdown error when running a Hive query from an Oozie workflow.
