Flink HA standalone cluster failed
0 votes
/ December 20, 2018

I have two machines, 203 and 204. Each machine runs both a jobmanager and a taskmanager.

masters
hz203:9081
hz204:9081
slaves
hz203
hz204
flink-conf.yaml
jobmanager.rpc.port: 6123
rest.port: 9081
blob.server.port: 6124
query.server.port: 6125
web.tmpdir: /home/ctu/flink/deploy/webTmp
web.log.path: /home/ctu/flink/deploy/log
taskmanager.tmp.dirs: /home/ctu/flink/deploy/taskManagerTmp
high-availability: zookeeper
high-availability.storageDir: file:///home/ctu/flink/deploy/HA
high-availability.zookeeper.quorum: 10.0.1.79:2181
high-availability.zookeeper.path.root: /flink
high-availability.cluster-id: /flink
run ./start-cluster.sh
Starting HA cluster with 2 masters.
Starting standalonesession daemon on host hz203.
Starting standalonesession daemon on host hz204.
Starting taskexecutor daemon on host hz203.
Starting taskexecutor daemon on host hz204.
logs
2018-12-20 20:44:03,843 INFO  org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService  - Starting ZooKeeperLeaderElectionService ZooKeeperLeaderElectionService{leaderPath='/leader/rest_server_lock'}.
2018-12-20 20:44:03,864 INFO  org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint    - Web frontend listening at http://127.0.0.1:9081.
2018-12-20 20:44:03,875 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcService              - Starting RPC endpoint for org.apache.flink.runtime.resourcemanager.StandaloneResourceManager at akka://flink/user/resourcemanager .
2018-12-20 20:44:03,989 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcService              - Starting RPC endpoint for org.apache.flink.runtime.dispatcher.StandaloneDispatcher at akka://flink/user/dispatcher .
2018-12-20 20:44:03,999 INFO  org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService  - Starting ZooKeeperLeaderElectionService ZooKeeperLeaderElectionService{leaderPath='/leader/resource_manager_lock'}.
2018-12-20 20:44:04,008 INFO  org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService  - Starting ZooKeeperLeaderRetrievalService /leader/resource_manager_lock.
2018-12-20 20:44:04,009 INFO  org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService  - Starting ZooKeeperLeaderElectionService ZooKeeperLeaderElectionService{leaderPath='/leader/dispatcher_lock'}.
2018-12-20 20:44:04,010 INFO  org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService  - Starting ZooKeeperLeaderRetrievalService /leader/dispatcher_lock.
2018-12-20 20:44:04,206 WARN  akka.remote.transport.netty.NettyTransport                    - Remote connection to [null] failed with java.net.ConnectException: Connection refused: /127.0.0.1:43012
2018-12-20 20:44:04,221 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink@127.0.0.1:43012] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@127.0.0.1:43012]] Caused by: [Connection refused: /127.0.0.1:43012]
2018-12-20 20:44:04,301 WARN  akka.remote.transport.netty.NettyTransport                    - Remote connection to [null] failed with java.net.ConnectException: Connection refused: /127.0.0.1:43012
2018-12-20 20:44:04,301 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink@127.0.0.1:43012] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@127.0.0.1:43012]] Caused by: [Connection refused: /127.0.0.1:43012]
2018-12-20 20:44:04,378 WARN  akka.remote.transport.netty.NettyTransport                    - Remote connection to [null] failed with java.net.ConnectException: Connection refused: /127.0.0.1:43012
2018-12-20 20:44:04,378 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink@127.0.0.1:43012] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@127.0.0.1:43012]] Caused by: [Connection refused: /127.0.0.1:43012]
2018-12-20 20:44:04,451 WARN  akka.remote.transport.netty.NettyTransport                    - Remote connection to [null] failed with java.net.ConnectException: Connection refused: /127.0.0.1:43012
2018-12-20 20:44:04,451 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink@127.0.0.1:43012] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@127.0.0.1:43012]] Caused by: [Connection refused: /127.0.0.1:43012]
2018-12-20 20:44:04,520 WARN  akka.remote.transport.netty.NettyTransport                    - Remote connection to [null] failed with java.net.ConnectException: Connection refused: /127.0.0.1:43012
questions
Why does `akka.tcp://flink@127.0.0.1:33567/user/resourcemanager` use 127.0.0.1 instead of the `jobmanager` IP from the `masters` config file?

1 Answer

0 votes
/ December 21, 2018

The problem is a bug that we fixed in version 1.6.1. In version 1.6.0 we did not take the --host command-line option into account in the method ClusterEntrypoint#loadConfiguration, as a comparison of the 1.6.0 code with the 1.6.1 code shows.
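
To make the effect of that option concrete, here is a minimal Python sketch. It is not Flink's actual Java code. It assumes that the host passed on the command line is meant to override the value stored under `jobmanager.rpc.address`; the function `load_configuration`, its `respect_host` flag, and the dict-based configuration are hypothetical stand-ins for illustration only.

```python
from typing import Dict, Optional

# Real Flink configuration key for the JobManager address.
JOBMANAGER_ADDRESS_KEY = "jobmanager.rpc.address"


def load_configuration(file_config: Dict[str, str],
                       cli_host: Optional[str],
                       respect_host: bool = True) -> Dict[str, str]:
    """Hypothetical stand-in for ClusterEntrypoint#loadConfiguration.

    respect_host=False mimics the behavior where the --host value
    is not taken into account.
    """
    config = dict(file_config)  # copy so the caller's dict is untouched
    if respect_host and cli_host is not None:
        # The command-line host wins over whatever the file provided.
        config[JOBMANAGER_ADDRESS_KEY] = cli_host
    return config


# No jobmanager.rpc.address in the file, as in the question's flink-conf.yaml.
file_config = {"rest.port": "9081"}

buggy = load_configuration(file_config, "hz203", respect_host=False)
fixed = load_configuration(file_config, "hz203")

print(buggy.get(JOBMANAGER_ADDRESS_KEY))  # None: the host was dropped
print(fixed.get(JOBMANAGER_ADDRESS_KEY))  # hz203
```

When the host is dropped and the configuration file sets no address, the process can only fall back on a default, which would be consistent with the 127.0.0.1 addresses in the log.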

Upgrading to the latest 1.6.x release should therefore solve the problem. In general, I would always recommend upgrading to the latest bug-fix release if possible.
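
One way to check the result after an upgrade is to scan the JobManager log for `akka.tcp://` addresses that still point at a loopback interface. The following Python sketch does that. The helper name `loopback_addresses` and the second sample line are assumptions made for illustration; only the first sample line is taken from the log in the question.

```python
import ipaddress
import re
from typing import List

# Matches addresses such as akka.tcp://flink@127.0.0.1:43012
AKKA_ADDRESS = re.compile(r"akka\.tcp://flink@([^:/\]\s]+):(\d+)")


def loopback_addresses(log_text: str) -> List[str]:
    """Return the distinct host:port pairs in log_text whose host is a loopback address."""
    found: List[str] = []
    for host, port in AKKA_ADDRESS.findall(log_text):
        try:
            is_loopback = ipaddress.ip_address(host).is_loopback
        except ValueError:
            # Not an IP literal (for example a hostname such as hz203).
            is_loopback = host == "localhost"
        entry = f"{host}:{port}"
        if is_loopback and entry not in found:
            found.append(entry)
    return found


sample = (
    # Fragment of a line from the log in the question.
    "Association with remote system [akka.tcp://flink@127.0.0.1:43012] has failed\n"
    # Hypothetical line for contrast, not from the original log.
    "Connecting to akka.tcp://flink@hz203:6123/user/resourcemanager\n"
)
print(loopback_addresses(sample))  # ['127.0.0.1:43012']
```

An empty list for a fresh log would suggest that the components now advertise a reachable host name or IP rather than 127.0.0.1.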

...