У меня вопрос по поводу Флинка.Я запускаю приложение в локальном кластере с 1 TaskManager и 4 Taskslots.
После некоторого времени запуска приложения я получил сообщение об ошибке Timeout:
java.util.concurrent.TimeoutException: Heartbeat of TaskManager with id feea6a6702a0cf960ae2847b5bd25665 timed out.
Я видел некоторыесообщения с этой темой, но любой ответ на него.Не могли бы вы помочь мне увидеть основную причину или возможные проблемы?
Я использую Flink версии 1.5.3
Кажется, что докер-контейнер менеджеров задач и JobManager останавливается, когда это происходит.
Позвольте мне добавить трассировку ошибок изЖурналы контейнера JobManager:
2019-06-09 13:31:06,300 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job Socket Window NgsiEvent (ef3a860de48d54544d973754c6170d8b) switched from state FAILING to FAILED.
java.util.concurrent.TimeoutException: Heartbeat of TaskManager with id 63dbab620797b84da023b33578478238 timed out.
at org.apache.flink.runtime.jobmaster.JobMaster$TaskManagerHeartbeatListener.notifyHeartbeatTimeout(JobMaster.java:1609)
at org.apache.flink.runtime.heartbeat.HeartbeatManagerImpl$HeartbeatMonitor.run(HeartbeatManagerImpl.java:339)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at org.apache.flink.runtime.concurrent.akka.ActorSystemScheduledExecutorAdapter$ScheduledFutureTask.run(ActorSystemScheduledExecutorAdapter.java:154)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:415)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
2019-06-09 13:31:06,308 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Could not restart the job Socket Window NgsiEvent (ef3a860de48d54544d973754c6170d8b) because the restart strategy prevented it.
java.util.concurrent.TimeoutException: Heartbeat of TaskManager with id 63dbab620797b84da023b33578478238 timed out.
at org.apache.flink.runtime.jobmaster.JobMaster$TaskManagerHeartbeatListener.notifyHeartbeatTimeout(JobMaster.java:1609)
at org.apache.flink.runtime.heartbeat.HeartbeatManagerImpl$HeartbeatMonitor.run(HeartbeatManagerImpl.java:339)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at org.apache.flink.runtime.concurrent.akka.ActorSystemScheduledExecutorAdapter$ScheduledFutureTask.run(ActorSystemScheduledExecutorAdapter.java:154)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:415)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
2019-06-09 13:31:06,317 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Stopping checkpoint coordinator for job ef3a860de48d54544d973754c6170d8b.
2019-06-09 13:31:06,322 INFO org.apache.flink.runtime.checkpoint.StandaloneCompletedCheckpointStore - Shutting down
2019-06-09 13:31:06,331 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@16363182f31f:36715] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@16363182f31f:36715]] Caused by: [16363182f31f]
2019-06-09 13:31:06,351 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Job ef3a860de48d54544d973754c6170d8b reached globally terminal state FAILED.
2019-06-09 13:31:06,434 INFO org.apache.flink.runtime.jobmaster.JobMaster - Stopping the JobMaster for job Socket Window NgsiEvent(ef3a860de48d54544d973754c6170d8b).
2019-06-09 13:31:06,447 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPool - Suspending SlotPool.
2019-06-09 13:31:06,448 INFO org.apache.flink.runtime.jobmaster.JobMaster - Close ResourceManager connection 883e842633b0fd9a2e53ab45778581fe: JobManager is shutting down..
2019-06-09 13:31:06,449 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcActor - The rpc endpoint org.apache.flink.runtime.jobmaster.slotpool.SlotPool has not been started yet. Discarding message org.apache.flink.runtime.rpc.messages.LocalRpcInvocation until processing is started.
2019-06-09 13:31:06,457 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - Disconnect job manager 00000000000000000000000000000000@akka.tcp://flink@jobmanager:6123/user/jobmanager_2 for job ef3a860de48d54544d973754c6170d8b from the resource manager.
2019-06-09 13:31:06,459 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPool - Stopping SlotPool.
2019-06-09 13:31:06,460 INFO org.apache.flink.runtime.jobmaster.JobManagerRunner - JobManagerRunner already shutdown.
2019-06-09 13:31:16,304 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@16363182f31f:36715] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@16363182f31f:36715]] Caused by: [16363182f31f: Name or service not known]
2019-06-09 13:31:26,320 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@16363182f31f:36715] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@16363182f31f:36715]] Caused by: [16363182f31f: Name or service not known]
2019-06-09 13:31:36,286 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@16363182f31f:36715] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@16363182f31f:36715]] Caused by: [16363182f31f]
Заранее спасибо!