Problem running Apache Beam jobs on a remote Flink cluster in Kubernetes
1 vote
/ July 13, 2020

I have a Flink SessionCluster deployed on a remote Kubernetes cluster (following the docs), reachable at <FLINK_MASTER_URL>:8081, and I am trying to run an Apache Beam wordcount job on it.

However, every attempt ends in an error: it seems I cannot submit the job for execution successfully. The error logs and Beam pipeline options are below. I would appreciate some pointers on how to solve this (I am not an experienced Flink/Beam user, so please forgive any obvious mistakes).

Pipeline options:

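from apache_beam.options.pipeline_options import PipelineOptions

# PipelineOptions takes the flags as a single list of strings.
PipelineOptions([
    "--runner=FlinkRunner",
    "--flink_master=<FLINK_MASTER_URL>:8081",
])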

Error logs (truncated):

WARNING:root:Make sure that locally built Python SDK docker image has Python 3.7 interpreter.
INFO:root:Using Python SDK docker image: apache/beam_python3.7_sdk:2.21.0. If the image is not available at local, we will try to pull from hub.docker.com
INFO:apache_beam.runners.portability.fn_api_runner.translations:==================== <function lift_combiners at 0x7f954007e710> ====================
INFO:apache_beam.runners.portability.flink_runner:Adding HTTP protocol scheme to flink_master parameter: http://<FLINK_MASTER_URL>:8081
INFO:apache_beam.utils.subprocess_server:Using cached job server jar from https://repo.maven.apache.org/maven2/org/apache/beam/beam-runners-flink-1.10-job-server/2.21.0/beam-runners-flink-1.10-job-server-2.21.0.jar
INFO:apache_beam.utils.subprocess_server:Starting service with ['java' '-jar' '/home/rjurczak/.apache_beam/cache/jars/beam-runners-flink-1.10-job-server-2.21.0.jar' '--flink-master' 'http://<FLINK_MASTER_URL>:8081' '--artifacts-dir' '/tmp/beam-tempht7lpipz/artifactsotk2otzl' '--job-port' '48375' '--artifact-port' '0' '--expansion-port' '0']
INFO:apache_beam.utils.subprocess_server:b'[main] INFO org.apache.beam.runners.fnexecution.jobsubmission.JobServerDriver - LegacyArtifactStagingService started on localhost:37645'
INFO:apache_beam.utils.subprocess_server:b'[main] INFO org.apache.beam.runners.fnexecution.jobsubmission.JobServerDriver - Java ExpansionService started on localhost:35547'
INFO:apache_beam.utils.subprocess_server:b'[main] INFO org.apache.beam.runners.fnexecution.jobsubmission.JobServerDriver - JobService started on localhost:48375'
INFO:apache_beam.utils.subprocess_server:b'[grpc-default-executor-0] INFO org.apache.beam.runners.flink.FlinkJobInvoker - Invoking job BeamApp-rjurczak-0713164027-2a729669_d00db59c-cda9-46be-9bd8-1b8406d155a5 with pipeline runner org.apache.beam.runners.flink.FlinkPipelineRunner@25fa0e2d'
INFO:apache_beam.utils.subprocess_server:b'[grpc-default-executor-0] INFO org.apache.beam.runners.fnexecution.jobsubmission.JobInvocation - Starting job invocation BeamApp-rjurczak-0713164027-2a729669_d00db59c-cda9-46be-9bd8-1b8406d155a5'
INFO:apache_beam.runners.portability.portable_runner:Job state changed to STOPPED
INFO:apache_beam.runners.portability.portable_runner:Job state changed to STARTING
INFO:apache_beam.runners.portability.portable_runner:Job state changed to RUNNING
INFO:apache_beam.utils.subprocess_server:b'[flink-runner-job-invoker] INFO org.apache.beam.runners.flink.FlinkPipelineRunner - Translating pipeline to Flink program.'
INFO:apache_beam.utils.subprocess_server:b'[flink-runner-job-invoker] INFO org.apache.beam.runners.flink.FlinkExecutionEnvironments - Creating a Batch Execution Environment.'
INFO:apache_beam.utils.subprocess_server:b'[flink-runner-job-invoker] INFO org.apache.beam.runners.flink.FlinkExecutionEnvironments - Using Flink Master URL 10.70.227.141:8081.'
INFO:apache_beam.utils.subprocess_server:b'[flink-runner-job-invoker] INFO org.apache.flink.api.java.ExecutionEnvironment - The job has 0 registered types and 0 default Kryo serializers'
INFO:apache_beam.utils.subprocess_server:b'[Flink-RestClusterClient-IO-thread-4] WARN org.apache.flink.util.ExecutorUtils - ExecutorService did not terminate in time. Shutting it down now.'
INFO:apache_beam.utils.subprocess_server:b'[flink-runner-job-invoker] ERROR org.apache.beam.runners.fnexecution.jobsubmission.JobInvocation - Error during job invocation BeamApp-rjurczak-0713164027-2a729669_d00db59c-cda9-46be-9bd8-1b8406d155a5.'
INFO:apache_beam.utils.subprocess_server:b'java.lang.RuntimeException: java.util.concurrent.ExecutionException: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph.'
INFO:apache_beam.utils.subprocess_server:b'\tat org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:199)'
INFO:apache_beam.utils.subprocess_server:b'\tat org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:952)'
INFO:apache_beam.utils.subprocess_server:b'\tat org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:860)'
INFO:apache_beam.utils.subprocess_server:b'\tat org.apache.beam.runners.flink.FlinkBatchPortablePipelineTranslator$BatchTranslationContext.execute(FlinkBatchPortablePipelineTranslator.java:194)'
INFO:apache_beam.utils.subprocess_server:b'\tat org.apache.beam.runners.flink.FlinkPipelineRunner.runPipelineWithTranslator(FlinkPipelineRunner.java:116)'
INFO:apache_beam.utils.subprocess_server:b'\tat org.apache.beam.runners.flink.FlinkPipelineRunner.run(FlinkPipelineRunner.java:83)'
INFO:apache_beam.utils.subprocess_server:b'\tat org.apache.beam.runners.fnexecution.jobsubmission.JobInvocation.runPipeline(JobInvocation.java:83)'
INFO:apache_beam.utils.subprocess_server:b'\tat org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)'
INFO:apache_beam.utils.subprocess_server:b'\tat org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)'
INFO:apache_beam.utils.subprocess_server:b'\tat org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)'
INFO:apache_beam.utils.subprocess_server:b'\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)'
INFO:apache_beam.utils.subprocess_server:b'\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)'
INFO:apache_beam.utils.subprocess_server:b'\tat java.base/java.lang.Thread.run(Thread.java:834)'
INFO:apache_beam.utils.subprocess_server:b'Caused by: java.util.concurrent.ExecutionException: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph.'
INFO:apache_beam.utils.subprocess_server:b'\tat java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)'
INFO:apache_beam.utils.subprocess_server:b'\tat java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)'
INFO:apache_beam.utils.subprocess_server:b'\tat org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:947)'
INFO:apache_beam.utils.subprocess_server:b'\t... 11 more'
INFO:apache_beam.utils.subprocess_server:b'Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph.'
INFO:apache_beam.utils.subprocess_server:b'\tat org.apache.flink.client.program.rest.RestClusterClient.lambda$submitJob$7(RestClusterClient.java:359)'
INFO:apache_beam.utils.subprocess_server:b'\tat java.base/java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:986)'
INFO:apache_beam.utils.subprocess_server:b'\tat java.base/java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:970)'
INFO:apache_beam.utils.subprocess_server:b'\tat java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)'
INFO:apache_beam.utils.subprocess_server:b'\tat java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)'
INFO:apache_beam.utils.subprocess_server:b'\tat org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$8(FutureUtils.java:274)'
INFO:apache_beam.utils.subprocess_server:b'\tat java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)'
INFO:apache_beam.utils.subprocess_server:b'\tat java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)'
INFO:apache_beam.utils.subprocess_server:b'\tat java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)'
INFO:apache_beam.utils.subprocess_server:b'\tat java.base/java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:610)'
INFO:apache_beam.utils.subprocess_server:b'\tat java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1085)'
INFO:apache_beam.utils.subprocess_server:b'\tat java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478)'
INFO:apache_beam.utils.subprocess_server:b'\t... 3 more'
INFO:apache_beam.utils.subprocess_server:b'Caused by: org.apache.flink.runtime.rest.util.RestClientException: [Failed to deserialize JobGraph.]'
INFO:apache_beam.utils.subprocess_server:b'\tat org.apache.flink.runtime.rest.RestClient.parseResponse(RestClient.java:390)'
INFO:apache_beam.utils.subprocess_server:b'\tat org.apache.flink.runtime.rest.RestClient.lambda$submitRequest$3(RestClient.java:374)'
INFO:apache_beam.utils.subprocess_server:b'\tat java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1072)'
INFO:apache_beam.utils.subprocess_server:b'\t... 4 more'
ERROR:root:org.apache.flink.runtime.rest.util.RestClientException: [Failed to deserialize JobGraph.]
INFO:apache_beam.utils.subprocess_server:b'[flink-runner-job-invoker] INFO org.apache.beam.runners.fnexecution.artifact.AbstractLegacyArtifactRetrievalService - Manifest at /tmp/beam-tempht7lpipz/artifactsotk2otzl/job_364b1df0-7e66-4759-997f-91f87179932b/MANIFEST has 1 artifact locations'
INFO:apache_beam.runners.portability.portable_runner:Job state changed to FAILED
Traceback (most recent call last):
  File "examples/wordcount.py", line 152, in <module>
    run()
  File "examples/wordcount.py", line 132, in run
    result.wait_until_finish()
  File "/home/rjurczak/envs/env/lib/python3.7/site-packages/apache_beam/runners/portability/portable_runner.py", line 550, in wait_until_finish
    (self._job_id, self._state, self._last_error_message()))
RuntimeError: Pipeline BeamApp-rjurczak-0713164027-2a729669_d00db59c-cda9-46be-9bd8-1b8406d155a5 failed in state FAILED: org.apache.flink.runtime.rest.util.RestClientException: [Failed to deserialize JobGraph.]

1 Answer

0 votes
/ July 14, 2020

This is a fairly short answer. It looks like the same error as here.

Make sure the version of your Flink CLI matches the version of the Flink master running in Kubernetes.
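
One way to check for such a mismatch from the client side is sketched below. The log in the question shows Beam starting the job server jar beam-runners-flink-1.10-job-server-2.21.0.jar, so the client is targeting Flink 1.10. The sketch compares that with the version the cluster reports. It assumes the Flink REST API answers GET /config with a JSON object containing a "flink-version" field. The helper names (major_minor, job_server_flink_version, cluster_flink_version, versions_match) are made up for illustration and are not part of Beam or Flink.

```python
import json
import re
import urllib.request


def major_minor(version):
    """Return the 'major.minor' prefix of a version string such as '1.10.1'."""
    return ".".join(version.split(".")[:2])


def job_server_flink_version(jar_name):
    """Extract the Flink version from a Beam job server jar name or path.

    Example name, taken from the log in the question:
    beam-runners-flink-1.10-job-server-2.21.0.jar
    """
    match = re.search(r"beam-runners-flink-(\d+\.\d+)-job-server", jar_name)
    if match is None:
        raise ValueError("not a Beam Flink job server jar: %r" % jar_name)
    return match.group(1)


def cluster_flink_version(flink_master_url, timeout=10):
    """Ask the Flink master which version it runs.

    Assumption: GET <flink_master_url>/config returns JSON that
    includes a 'flink-version' field.
    """
    url = flink_master_url.rstrip("/") + "/config"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)["flink-version"]


def versions_match(jar_name, cluster_version):
    """True if the job server jar targets the cluster's major.minor version."""
    return job_server_flink_version(jar_name) == major_minor(cluster_version)
```

Calling versions_match("beam-runners-flink-1.10-job-server-2.21.0.jar", cluster_flink_version("http://<FLINK_MASTER_URL>:8081")) would then show whether the two sides agree. If they do not, the options are to deploy a Flink version that the installed Beam release supports, or, if your Beam release offers it, to pin the client side with the --flink_version pipeline option.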

...