I am setting up a three-node Elasticsearch cluster with Docker. This is my docker-compose file:
version: '2.0'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch-oss:6.3.0
    environment:
      - cluster.name=test-cluster
      - node.name=elastic_1
      - ES_JAVA_OPTS=-Xms512m -Xmx512m
      - bootstrap.memory_lock=true
      - discovery.zen.minimum_master_nodes=2
      - discovery.zen.ping.unicast.hosts=elasticsearch,elasticsearch2,elasticsearch3
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - test_es_cluster_data:/usr/share/elasticsearch/data
    networks:
      - esnet
  elasticsearch2:
    extends:
      file: ./docker-compose.yml
      service: elasticsearch
    environment:
      - node.name=elastic_2
    volumes:
      - test_es_cluster2_data:/usr/share/elasticsearch/data
  elasticsearch3:
    extends:
      file: ./docker-compose.yml
      service: elasticsearch
    environment:
      - node.name=elastic_3
    volumes:
      - test_es_cluster3_data:/usr/share/elasticsearch/data
volumes:
  test_es_cluster_data:
  test_es_cluster2_data:
  test_es_cluster3_data:
networks:
  esnet:
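For completeness, this is roughly how the cluster is started and checked before the test (the project name escluster matches the container name in the docker stop command further below; since no ports are published, curl is simply run inside a container):

# start the three nodes
docker-compose up -d

# verify that all three nodes have joined the cluster
docker exec escluster_elasticsearch_1 curl -s 'localhost:9200/_cat/nodes?v'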
Once the cluster is up, I kill the master (elastic_1) to test failover. I expect a new master to be elected and the cluster to keep serving read requests the whole time.
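The test itself is only a couple of commands; a rough sketch (the second container name is assumed to follow the default compose naming):

# stop the current master
docker stop escluster_elasticsearch_1

# keep polling a surviving node for cluster health
docker exec escluster_elasticsearch2_1 curl -s 'localhost:9200/_cat/health'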
A new master does get elected, but the cluster stops responding for quite a long time (~45 s).
Below are the logs from elastic_2 and elastic_3 after the master is stopped (docker stop escluster_elasticsearch_1):
elastic_2:
...
[2018-07-04T14:47:04,495][INFO ][o.e.d.z.ZenDiscovery ] [elastic_2] master_left [{elastic_1}{...}{172.24.0.3}{172.24.0.3:9300}], reason [shut_down]
...
[2018-07-04T14:47:04,509][WARN ][o.e.c.NodeConnectionsService] [elastic_2] failed to connect to node {elastic_1}{...}{172.24.0.3}{172.24.0.3:9300} (tried [1] times)
org.elasticsearch.transport.ConnectTransportException: [elastic_1][172.24.0.3:9300] connect_exception
...
[2018-07-04T14:47:07,565][INFO ][o.e.c.s.ClusterApplierService] [elastic_2] detected_master {elastic_3}{...}{172.24.0.4}{172.24.0.4:9300}, reason: apply cluster state (from master [master {elastic_3}{...}{172.24.0.4}{172.24.0.4:9300} committed version [4]])
[2018-07-04T14:47:35,301][WARN ][r.suppressed ] path: /_cat/health, params: {}
org.elasticsearch.discovery.MasterNotDiscoveredException: null
...
[2018-07-04T14:47:53,933][WARN ][o.e.c.s.ClusterApplierService] [elastic_2] cluster state applier task [apply cluster state (from master [master {elastic_3}{...}{172.24.0.4}{172.24.0.4:9300} committed version [4]])] took [46.3s] above the warn threshold of 30s
[2018-07-04T14:47:53,934][INFO ][o.e.c.s.ClusterApplierService] [elastic_2] removed {{elastic_1}{...}{172.24.0.3}{172.24.0.3:9300},}, reason: apply cluster state (from master [master {elastic_3}{...}{172.24.0.4}{172.24.0.4:9300} committed version [5]])
[2018-07-04T14:47:56,931][WARN ][o.e.t.TransportService ] [elastic_2] Received response for a request that has timed out, sent [48367ms] ago, timed out [18366ms] ago, action [internal:discovery/zen/fd/master_ping], node [{elastic_3}{...}{172.24.0.4}{172.24.0.4:9300}], id [1035]
elastic_3:
[2018-07-04T14:47:04,494][INFO ][o.e.d.z.ZenDiscovery ] [elastic_3] master_left [{elastic_1}{...}{172.24.0.3}{172.24.0.3:9300}], reason [shut_down]
...
[2018-07-04T14:47:04,519][WARN ][o.e.c.NodeConnectionsService] [elastic_3] failed to connect to node {elastic_1}{...}{172.24.0.3}{172.24.0.3:9300} (tried [1] times)
org.elasticsearch.transport.ConnectTransportException: [elastic_1][172.24.0.3:9300] connect_exception
...
[2018-07-04T14:47:07,550][INFO ][o.e.c.s.MasterService ] [elastic_3] zen-disco-elected-as-master ([1] nodes joined)[, ], reason: new_master {elastic_3}{...}{172.24.0.4}{172.24.0.4:9300}
[2018-07-04T14:47:35,026][WARN ][r.suppressed ] path: /_cat/nodes, params: {v=}
org.elasticsearch.discovery.MasterNotDiscoveredException: null
...
[2018-07-04T14:47:37,560][WARN ][o.e.d.z.PublishClusterStateAction] [elastic_3] timed out waiting for all nodes to process published state [4] (timeout [30s], pending nodes: [{elastic_2}{...}{172.24.0.2}{172.24.0.2:9300}])
[2018-07-04T14:47:37,561][INFO ][o.e.c.s.ClusterApplierService] [elastic_3] new_master {elastic_3}{...}{172.24.0.4}{172.24.0.4:9300}, reason: apply cluster state (from master [master {elastic_3}{...}{172.24.0.4}{172.24.0.4:9300} committed version [4] source [zen-disco-elected-as-master ([1] nodes joined)[, ]]])
[2018-07-04T14:47:41,021][WARN ][o.e.c.s.MasterService ] [elastic_3] cluster state update task [zen-disco-elected-as-master ([1] nodes joined)[, ]] took [33.4s] above the warn threshold of 30s
[2018-07-04T14:47:41,022][INFO ][o.e.c.s.MasterService ] [elastic_3] zen-disco-node-failed({elastic_1}{...}{172.24.0.3}{172.24.0.3:9300}), reason(transport disconnected), reason: removed {{elastic_1}{...}{172.24.0.3}{172.24.0.3:9300},}
[2018-07-04T14:47:56,929][INFO ][o.e.c.s.ClusterApplierService] [elastic_3] removed {{elastic_1}{...}{172.24.0.3}{172.24.0.3:9300},}, reason: apply cluster state (from master [master {elastic_3}{...}{172.24.0.4}{172.24.0.4:9300} committed version [5] source [zen-disco-node-failed({elastic_1}{...}{172.24.0.3}{172.24.0.3:9300}), reason(transport disconnected)]])
Why does the cluster take so long to stabilize and start answering requests again?
What puzzles me is that:
a) a new master (elastic_3) is elected:
[2018-07-04T14:47:07,550][INFO ] ... [elastic_3] zen-disco-elected-as-master ([1] nodes joined)[, ], reason: new_master {elastic_3}...
b) it is then detected by elastic_2:
[2018-07-04T14:47:07,565][INFO ] ... [elastic_2] detected_master {elastic_3}...
c) the new master then times out waiting for the published cluster state to be processed:
[2018-07-04T14:47:37,560][WARN ] ... [elastic_3] timed out waiting for all nodes to process published state [4] (timeout [30s], pending nodes: [{elastic_2}...])
d) elastic_2 applies the cluster state, but with a warning:
[2018-07-04T14:47:53,933][WARN ] ... [elastic_2] cluster state applier task [apply cluster state (from master [master {elastic_3}...])] took [46.3s] above the warn threshold of 30s
What could be causing the timeout(s)? Everything runs on a single local machine, so network problems should not be a factor. Am I missing something?
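If it matters, the 30 s waits in the logs look like the zen discovery defaults (discovery.zen.publish_timeout, discovery.zen.commit_timeout and discovery.zen.fd.ping_timeout all default to 30s). Lowering them would only be an experiment to see whether the stall tracks these settings (values below are arbitrary), and I would rather understand why the defaults are being hit at all:

environment:
  # defaults are 30s; lowered only to see whether the stall follows these timeouts
  - discovery.zen.publish_timeout=10s
  - discovery.zen.commit_timeout=10s
  - discovery.zen.fd.ping_timeout=10s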
In the meantime, querying either elastic_2 or elastic_3 fails with MasterNotDiscoveredException. According to the documentation, the cluster is still expected to answer read requests (https://www.elastic.co/guide/en/elasticsearch/reference/6.3/modules-discovery-zen.html#no-master-block).
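As I understand it, the setting referenced there is discovery.zen.no_master_block, whose default of write should reject writes but still serve reads while there is no master. I have left it at the default (shown here only for clarity):

environment:
  # default; "all" would block reads as well
  - discovery.zen.no_master_block=write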
Has anyone run into this? I would appreciate any advice on the matter.