Docker swarm will not restart after losing a recently added manager
0 votes
/ January 8, 2020

I have a single-node swarm (node A) running a few long-lived services. I recently added a new node (B), initially as a worker, and then promoted it to manager. Now, however, the swarm is broken and the Docker daemon on A simply will not restart. I have since recovered the newly added node B by deleting /var/lib/docker on it.
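For context, the join/promote sequence on B presumably looked roughly like this (a sketch only; the join token is elided, 192.168.1.148 is node A's manager address as it appears in the errors below, and the node name is a placeholder):

# on node B: join node A's swarm as a worker (token elided)
docker swarm join --token <join-token> 192.168.1.148:2377

# on node A: promote the new node to manager (node name is a placeholder)
docker node promote node-b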

The problem started when I ran this on node B:

# docker service create --name storage-2 --hostname minio-2.media-server --constraint 'node.labels.storage==true' --publish published=9020,target=9000 --replicas 1 minio/minio server /mnt/ssd-2/minio-2.media-server/storage/
owqwx29gtr5v2ozjrarmf3t08
overall progress: 0 out of 1 tasks 
1/1:   
Error response from daemon: rpc error: code = DeadlineExceeded desc = context deadline exceeded
root@media-server:~# docker service ls
Error response from daemon: rpc error: code = Unknown desc = The swarm does not have a leader. It's possible that too few managers are online. Make sure more than half of the managers are online.

I tried to join node B back to the swarm on A by re-applying the original join token, but I just get the following response:

Error response from daemon: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 192.168.1.148:2377: connect: connection refused"
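That "connection refused" just means nothing is listening on port 2377 on A, which is consistent with the daemon on A not starting. A quick way to confirm this from node A (assuming a typical systemd host with ss installed):

# is the docker unit running at all?
systemctl is-active docker
# is anything listening on the swarm manager port?
ss -tlnp | grep 2377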

Now let's move on to node A. This is my daemon.json on that node:

# cat /etc/docker/daemon.json
{
 "runtimes": {
   "nvidia": {
     "path": "/usr/bin/nvidia-container-runtime",
     "runtimeArgs": []
   }
 },
 "default-runtime": "nvidia",
 "userns-remap": "my-user",
 "data-root": "/opt/var/docker/"
}

When I run sudo systemctl restart docker, I get the typical: Job for docker.service failed because the control process exited with error code. See "systemctl status docker.service" and "journalctl -xe" for details.
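For completeness, the actual failure reason can be pulled out of the journal like this (nothing node-specific assumed here):

# show the most recent messages from the docker unit, including the failure
journalctl -u docker.service --no-pager -n 200
# or watch it live while retrying the restart in another terminal
journalctl -u docker.service -f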

If I run dockerd --debug, I get the following:

time="2020-01-07T16:07:47.940589246-05:00" level=info msg="User namespaces: ID ranges will be mapped to subuid/subgid ranges of: my-user:my-user"
time="2020-01-07T16:07:47.940649613-05:00" level=debug msg="Creating user namespaced daemon root: /opt/var/docker/1000.999"
time="2020-01-07T16:07:47.940969396-05:00" level=debug msg="Listener created for HTTP on unix (/var/run/docker.sock)"
time="2020-01-07T16:07:47.941193653-05:00" level=info msg="libcontainerd: started new docker-containerd process" pid=3731
time="2020-01-07T16:07:47-05:00" level=info msg="starting containerd" module=containerd revision=773c489c9c1b21a6d78b5c538cd395416ec50f88 version=v1.0.3 
time="2020-01-07T16:07:47-05:00" level=debug msg="changing OOM score to -500" module=containerd 
time="2020-01-07T16:07:47-05:00" level=info msg="loading plugin "io.containerd.content.v1.content"..." module=containerd type=io.containerd.content.v1 
time="2020-01-07T16:07:47-05:00" level=info msg="loading plugin "io.containerd.snapshotter.v1.btrfs"..." module=containerd type=io.containerd.snapshotter.v1 
time="2020-01-07T16:07:47-05:00" level=warning msg="failed to load plugin io.containerd.snapshotter.v1.btrfs" error="path /opt/var/docker/1000.999/containerd/daemon/io.containerd.snapshotter.v1.btrfs must be a btrfs filesystem to be used with the btrfs snapshotter" module=containerd 
time="2020-01-07T16:07:47-05:00" level=info msg="loading plugin "io.containerd.snapshotter.v1.overlayfs"..." module=containerd type=io.containerd.snapshotter.v1 
time="2020-01-07T16:07:47-05:00" level=info msg="loading plugin "io.containerd.metadata.v1.bolt"..." module=containerd type=io.containerd.metadata.v1 
time="2020-01-07T16:07:47-05:00" level=warning msg="could not use snapshotter btrfs in metadata plugin" error="path /opt/var/docker/1000.999/containerd/daemon/io.containerd.snapshotter.v1.btrfs must be a btrfs filesystem to be used with the btrfs snapshotter" module="containerd/io.containerd.metadata.v1.bolt" 
time="2020-01-07T16:07:47-05:00" level=info msg="loading plugin "io.containerd.differ.v1.walking"..." module=containerd type=io.containerd.differ.v1 
time="2020-01-07T16:07:47-05:00" level=info msg="loading plugin "io.containerd.gc.v1.scheduler"..." module=containerd type=io.containerd.gc.v1 
time="2020-01-07T16:07:47-05:00" level=info msg="loading plugin "io.containerd.grpc.v1.containers"..." module=containerd type=io.containerd.grpc.v1 
time="2020-01-07T16:07:47-05:00" level=info msg="loading plugin "io.containerd.grpc.v1.content"..." module=containerd type=io.containerd.grpc.v1 
time="2020-01-07T16:07:47-05:00" level=info msg="loading plugin "io.containerd.grpc.v1.diff"..." module=containerd type=io.containerd.grpc.v1 
time="2020-01-07T16:07:47-05:00" level=info msg="loading plugin "io.containerd.grpc.v1.events"..." module=containerd type=io.containerd.grpc.v1 
time="2020-01-07T16:07:47-05:00" level=info msg="loading plugin "io.containerd.grpc.v1.healthcheck"..." module=containerd type=io.containerd.grpc.v1 
time="2020-01-07T16:07:47-05:00" level=info msg="loading plugin "io.containerd.grpc.v1.images"..." module=containerd type=io.containerd.grpc.v1 
time="2020-01-07T16:07:47-05:00" level=info msg="loading plugin "io.containerd.grpc.v1.leases"..." module=containerd type=io.containerd.grpc.v1 
time="2020-01-07T16:07:47-05:00" level=info msg="loading plugin "io.containerd.grpc.v1.namespaces"..." module=containerd type=io.containerd.grpc.v1 
time="2020-01-07T16:07:47-05:00" level=info msg="loading plugin "io.containerd.grpc.v1.snapshots"..." module=containerd type=io.containerd.grpc.v1 
time="2020-01-07T16:07:47-05:00" level=info msg="loading plugin "io.containerd.monitor.v1.cgroups"..." module=containerd type=io.containerd.monitor.v1 
time="2020-01-07T16:07:47-05:00" level=info msg="loading plugin "io.containerd.runtime.v1.linux"..." module=containerd type=io.containerd.runtime.v1 
time="2020-01-07T16:07:47-05:00" level=debug msg="loading tasks in namespace" module="containerd/io.containerd.runtime.v1.linux" namespace=plugins.moby 
time="2020-01-07T16:07:47-05:00" level=info msg="loading plugin "io.containerd.grpc.v1.tasks"..." module=containerd type=io.containerd.grpc.v1 
time="2020-01-07T16:07:47-05:00" level=info msg="loading plugin "io.containerd.grpc.v1.version"..." module=containerd type=io.containerd.grpc.v1 
time="2020-01-07T16:07:47-05:00" level=info msg="loading plugin "io.containerd.grpc.v1.introspection"..." module=containerd type=io.containerd.grpc.v1 
time="2020-01-07T16:07:47-05:00" level=info msg=serving... address="/var/run/docker/containerd/docker-containerd-debug.sock" module="containerd/debug" 
time="2020-01-07T16:07:47-05:00" level=info msg=serving... address="/var/run/docker/containerd/docker-containerd.sock" module="containerd/grpc" 
time="2020-01-07T16:07:47-05:00" level=info msg="containerd successfully booted in 0.002536s" module=containerd 
time="2020-01-07T16:07:47.951820862-05:00" level=info msg="User namespaces: ID ranges will be mapped to subuid/subgid ranges of: my-user:my-user"
time="2020-01-07T16:07:47.951973737-05:00" level=debug msg="Golang's threads limit set to 230760"
time="2020-01-07T16:07:47.952395645-05:00" level=debug msg="processing event stream" module=libcontainerd namespace=plugins.moby
time="2020-01-07T16:07:47.961837504-05:00" level=debug msg="restored container" alive=true container=36d74b0ce14aef040e5994573e42909679ce20e6d8c7e21c9a5ccfa82b9ef38c module=libcontainerd namespace=plugins.moby pid=1966
time="2020-01-07T16:07:47.993831037-05:00" level=error msg="failed to enable plugin 'vieux/sshfs:latest': error creating containerd container: id already in use"
time="2020-01-07T16:07:47.993929990-05:00" level=debug msg="Using default logging driver json-file"
time="2020-01-07T16:07:47.994191355-05:00" level=debug msg="[graphdriver] priority list: [btrfs zfs overlay2 aufs overlay devicemapper vfs]"
time="2020-01-07T16:07:48.003990866-05:00" level=debug msg="backingFs=extfs,  projectQuotaSupported=false"
time="2020-01-07T16:07:48.004000013-05:00" level=info msg="[graphdriver] using prior storage driver: overlay2"
time="2020-01-07T16:07:48.004006714-05:00" level=debug msg="Initialized graph driver overlay2"
time="2020-01-07T16:07:48.015593752-05:00" level=debug msg="Failed to load mount 074d0e7ec2bbf9e234b77b9f94e5bdcb2d506be83937f4704d5546606e886f00: invalid mount id value"
time="2020-01-07T16:07:48.015693673-05:00" level=debug msg="Failed to load mount 0f4801970ae34e1b3573f3361fdb30e1ae1d0ff9909c4d938a8ac366a8f13162: invalid mount id value"
time="2020-01-07T16:07:48.015838872-05:00" level=debug msg="Failed to load mount 17d3e93832d5f14b6a64dafff6ace99aa823f12c5d014697cd96257269caf49f: invalid mount id value"
time="2020-01-07T16:07:48.015852094-05:00" level=debug msg="Failed to load mount 19a2396129ae436e154ab21f8a55933918e742381db7cf7cfd8263398afdeb4c: invalid mount id value"
time="2020-01-07T16:07:48.015889445-05:00" level=debug msg="Failed to load mount 1aca7d8b62b97b368c34bf01a428c6b45d37f28d8592e5e106f613c805ef4801: invalid mount id value"
time="2020-01-07T16:07:48.015901494-05:00" level=debug msg="Failed to load mount 1cf5677cd23bdfd158b2890f5e4e3b0a33dfe0215676ffc3fb36b261223567af: invalid mount id value"
time="2020-01-07T16:07:48.015966443-05:00" level=debug msg="Failed to load mount 2b7f467b2d9afad7399456a8bb3a5dd74ee1ed6dee7bc4bb7feaeda313d0fce5: invalid mount id value"
time="2020-01-07T16:07:48.016134312-05:00" level=debug msg="Failed to load mount 3b2a3257406cb76615cc0f33698a3a5d2332e91358b526e7858883ebb00054f2: invalid mount id value"
time="2020-01-07T16:07:48.016146780-05:00" level=debug msg="Failed to load mount 449ef233ea403394a1e9ea8d4deb1a74a966dd44256f1dd6833d14853fc3d2a9: invalid mount id value"
time="2020-01-07T16:07:48.016158344-05:00" level=debug msg="Failed to load mount 45369784a3382c4a520f478b8a71a148b79c7d01ad56d1e6af58ed4b17517dcf: invalid mount id value"
time="2020-01-07T16:07:48.016287912-05:00" level=debug msg="Failed to load mount 5b14ff5fe4cdaaa8f8b491d227bf7b4354b02a9b13bba05d9da880b4bc617b96: invalid checksum digest format"
time="2020-01-07T16:07:48.016327990-05:00" level=debug msg="Failed to load mount 6719acf4f4ec553baa73b30a33d1bab4fa7054d6d33d8f28b74e9c466fcfea81: invalid mount id value"
time="2020-01-07T16:07:48.016344990-05:00" level=debug msg="Failed to load mount 6e14a906fca9c1448199bb3626c66057609d7129d6fa5b365dcf759ecf4e4b0d: invalid mount id value"
time="2020-01-07T16:07:48.016538443-05:00" level=debug msg="Failed to load mount 824070e6c6437dcd6cb7dba111e6b294f8cbb477386165cf6f2aae18ec716cb5: invalid mount id value"
time="2020-01-07T16:07:48.016628599-05:00" level=debug msg="Failed to load mount 8acc1c7d16433ff26ae83ffa5a390ef76d731136501085c6ce66fa1ebf10c595: invalid mount id value"
time="2020-01-07T16:07:48.016868507-05:00" level=debug msg="Failed to load mount 9d771c8e19874f77d617335c7418baa5912a090e1ed5f208ba50faaea1f4e6bd: invalid mount id value"
time="2020-01-07T16:07:48.016882377-05:00" level=debug msg="Failed to load mount 9de56a13c2b9d5e2e0c167bba5e95a69b1181f29c00e2a0502867afa0d89a7bf: invalid mount id value"
time="2020-01-07T16:07:48.017022459-05:00" level=debug msg="Failed to load mount a504507cb86ddd1ea10ff86bd61a3b203fb19a0d1e66a1ea443e41b125a37850: invalid mount id value"
time="2020-01-07T16:07:48.017459238-05:00" level=debug msg="Failed to load mount d0499c8ceb86c9a06e6dcacadbe94bad3a9bd3d31a2ffe1f973e14df6eb3b970: invalid mount id value"
time="2020-01-07T16:07:48.018010790-05:00" level=debug msg="Max Concurrent Downloads: 3"
time="2020-01-07T16:07:48.018016142-05:00" level=debug msg="Max Concurrent Uploads: 5"
time="2020-01-07T16:07:48-05:00" level=debug msg="garbage collected" d=1.399144ms module="containerd/io.containerd.gc.v1.scheduler" 
time="2020-01-07T16:07:48.083758903-05:00" level=info msg="Graph migration to content-addressability took 0.00 seconds"
time="2020-01-07T16:07:48.083902282-05:00" level=warning msg="Your kernel does not support swap memory limit"
time="2020-01-07T16:07:48.083925855-05:00" level=warning msg="Your kernel does not support cgroup rt period"
time="2020-01-07T16:07:48.083931706-05:00" level=warning msg="Your kernel does not support cgroup rt runtime"
time="2020-01-07T16:07:48.084397055-05:00" level=info msg="Loading containers: start."
time="2020-01-07T16:07:48.084582669-05:00" level=debug msg="processing event stream" module=libcontainerd namespace=moby

... hundreds of similar lines follow ...

time="2020-01-07T16:07:48.110453451-05:00" level=debug msg="Loaded container ffc8dc10f49a4555b9da0479b071e1da55882d73ebb78dea35f601edf8d4db2a, isRunning: false"
time="2020-01-07T16:07:49.562933661-05:00" level=debug msg="restoring container" container=8c74783e68992ed9f6ca0ecb37eec49bcc17c0069c5e7fc73cf10dd78ce57e87 paused=false running=false
time="2020-01-07T16:07:49.562959806-05:00" level=debug msg="restoring container" container=a63d3e14ecaed0de71e3ca874d3946ea5379bf000ac5c97abb0e99be1d057359 paused=false running=false
time="2020-01-07T16:07:49.562967753-05:00" level=debug msg="restoring container" container=a078311c21fbe7fca358acaade5346bf32c642519bbd4b457f5c21d05d87c508 paused=false running=false
time="2020-01-07T16:07:49.563120965-05:00" level=debug msg="restoring container" container=33f9cc2d2b5feabbfdf10888f30d9a63e72fe9d04cc0895635bef974c8f73206 paused=false running=false
time="2020-01-07T16:07:49.581404277-05:00" level=debug msg="restoring container" container=07b7f22f20eeecb9b3ed548aaa931ffaa8ca4a37094b34061d9ac41a35574f74 paused=false running=false



... hundreds of similar lines follow ...

time="2020-01-07T16:07:49.733381464-05:00" level=debug msg="restoring container" container=4f4f97ff30217dc80916eebc9c100c6d799a6a035a73c5c01990062368253ebf paused=false running=false
time="2020-01-07T16:07:49.741439388-05:00" level=debug msg="restoring container" container=39c168eb717931505c00852b38233f4f6bfc3c2b61549e1cabc212a67a729686 paused=false running=false
time="2020-01-07T16:07:49.741561635-05:00" level=debug msg="restoring container" container=b8041f2cf71dd5939ebf91a9a23ba02b4d65915e7e2160bc32391820e320602d paused=false running=false
time="2020-01-07T16:07:49.742282264-05:00" level=debug msg="Option Experimental: false"
time="2020-01-07T16:07:49.742323616-05:00" level=debug msg="Option DefaultDriver: bridge"
time="2020-01-07T16:07:49.742342572-05:00" level=debug msg="Option DefaultNetwork: bridge"
time="2020-01-07T16:07:49.742360955-05:00" level=debug msg="Network Control Plane MTU: 1500"
time="2020-01-07T16:07:49.760484779-05:00" level=debug msg="/sbin/iptables, [--wait -t filter -C FORWARD -j DOCKER-ISOLATION]"
time="2020-01-07T16:07:49.762934318-05:00" level=debug msg="/sbin/iptables, [--wait -t nat -D PREROUTING -m addrtype --dst-type LOCAL -j DOCKER]"
time="2020-01-07T16:07:49.765664342-05:00" level=debug msg="/sbin/iptables, [--wait -t nat -D OUTPUT -m addrtype --dst-type LOCAL ! --dst 127.0.0.0/8 -j DOCKER]"
time="2020-01-07T16:07:49.768232163-05:00" level=debug msg="/sbin/iptables, [--wait -t nat -D OUTPUT -m addrtype --dst-type LOCAL -j DOCKER]"
time="2020-01-07T16:07:49.770920179-05:00" level=debug msg="/sbin/iptables, [--wait -t nat -D PREROUTING]"
time="2020-01-07T16:07:49.773264508-05:00" level=debug msg="/sbin/iptables, [--wait -t nat -D OUTPUT]"

... many lines of iptables/IP routing output follow ...

time="2020-01-07T16:07:50.011816215-05:00" level=debug msg="releasing IPv4 pools from network bridge (fbb914a007fc5a6fb39c1478503309bb03dea56479854613649657c528a44720)"
time="2020-01-07T16:07:50.011859810-05:00" level=debug msg="ReleaseAddress(LocalDefault/172.17.0.0/16, 172.17.0.1)"
time="2020-01-07T16:07:50.011894185-05:00" level=debug msg="Received set for ordinal 1, start 0, end 0, any false, release true, serial:false curr:0 \n"
time="2020-01-07T16:07:50.011917547-05:00" level=debug msg="ReleasePool(LocalDefault/172.17.0.0/16)"
time="2020-01-07T16:07:50.013635623-05:00" level=debug msg="cleanupServiceDiscovery for network:fbb914a007fc5a6fb39c1478503309bb03dea56479854613649657c528a44720"
time="2020-01-07T16:07:50.013670741-05:00" level=debug msg="cleanupServiceBindings for fbb914a007fc5a6fb39c1478503309bb03dea56479854613649657c528a44720"
time="2020-01-07T16:07:50.016796612-05:00" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
time="2020-01-07T16:07:50.016872068-05:00" level=debug msg="Allocating IPv4 pools for network bridge (04075f9c3581691ac86f78e25f4432eaff14e91ecd69eb1ce93240886b78e2fe)"
time="2020-01-07T16:07:50.016902902-05:00" level=debug msg="RequestPool(LocalDefault, 172.17.0.0/16, , map[], false)"
time="2020-01-07T16:07:50.016940532-05:00" level=debug msg="Received set for ordinal 0, start 0, end 0, any false, release false, serial:false curr:0 \n"
time="2020-01-07T16:07:50.016966550-05:00" level=debug msg="Received set for ordinal 65535, start 0, end 0, any false, release false, serial:false curr:0 \n"
time="2020-01-07T16:07:50.017002299-05:00" level=debug msg="RequestAddress(LocalDefault/172.17.0.0/16, 172.17.0.1, map[RequestAddressType:com.docker.network.gateway])"
time="2020-01-07T16:07:50.017035822-05:00" level=debug msg="Received set for ordinal 1, start 0, end 0, any false, release false, serial:false curr:0 \n"
time="2020-01-07T16:07:50.017381142-05:00" level=debug msg="/sbin/iptables, [--wait -t nat -C POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE]"
time="2020-01-07T16:07:50.019760527-05:00" level=debug msg="/sbin/iptables, [--wait -t nat -I POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE]"
time="2020-01-07T16:07:50.022422442-05:00" level=debug msg="/sbin/iptables, [--wait -t nat -C DOCKER -i docker0 -j RETURN]"
time="2020-01-07T16:07:50.024866360-05:00" level=debug msg="/sbin/iptables, [--wait -t nat -I DOCKER -i docker0 -j RETURN]"
time="2020-01-07T16:07:50.027363703-05:00" level=debug msg="/sbin/iptables, [--wait -D FORWARD -i docker0 -o docker0 -j DROP]"
time="2020-01-07T16:07:50.029951287-05:00" level=debug msg="/sbin/iptables, [--wait -t filter -C FORWARD -i docker0 -o docker0 -j ACCEPT]"
time="2020-01-07T16:07:50.032367948-05:00" level=debug msg="/sbin/iptables, [--wait -I FORWARD -i docker0 -o docker0 -j ACCEPT]"
time="2020-01-07T16:07:50.034909387-05:00" level=debug msg="/sbin/iptables, [--wait -t filter -C FORWARD -i docker0 ! -o docker0 -j ACCEPT]"
time="2020-01-07T16:07:50.037318217-05:00" level=debug msg="/sbin/iptables, [--wait -I FORWARD -i docker0 ! -o docker0 -j ACCEPT]"
time="2020-01-07T16:07:50.039875811-05:00" level=debug msg="/sbin/iptables, [--wait -t nat -C PREROUTING -m addrtype --dst-type LOCAL -j DOCKER]"
time="2020-01-07T16:07:50.042530975-05:00" level=debug msg="/sbin/iptables, [--wait -t nat -C PREROUTING -m addrtype --dst-type LOCAL -j DOCKER]"
time="2020-01-07T16:07:50.045230805-05:00" level=debug msg="/sbin/iptables, [--wait -t nat -C OUTPUT -m addrtype --dst-type LOCAL -j DOCKER ! --dst 127.0.0.0/8]"
time="2020-01-07T16:07:50.047785924-05:00" level=debug msg="/sbin/iptables, [--wait -t nat -C OUTPUT -m addrtype --dst-type LOCAL -j DOCKER ! --dst 127.0.0.0/8]"
time="2020-01-07T16:07:50.050394918-05:00" level=debug msg="/sbin/iptables, [--wait -t filter -C FORWARD -o docker0 -j DOCKER]"
time="2020-01-07T16:07:50.052717342-05:00" level=debug msg="/sbin/iptables, [--wait -I FORWARD -o docker0 -j DOCKER]"
time="2020-01-07T16:07:50.062268276-05:00" level=debug msg="/sbin/iptables, [--wait -D FORWARD -j DOCKER-ISOLATION-STAGE-1]"
time="2020-01-07T16:07:50.064657407-05:00" level=debug msg="/sbin/iptables, [--wait -I FORWARD -j DOCKER-ISOLATION-STAGE-1]"
time="2020-01-07T16:07:50.066993453-05:00" level=debug msg="/sbin/iptables, [--wait -t filter -C DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2]"
time="2020-01-07T16:07:50.069279455-05:00" level=debug msg="/sbin/iptables, [--wait -t filter -I DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2]"
time="2020-01-07T16:07:50.071533312-05:00" level=debug msg="/sbin/iptables, [--wait -t filter -C DOCKER-ISOLATION-STAGE-2 -o docker0 -j DROP]"
time="2020-01-07T16:07:50.073783603-05:00" level=debug msg="/sbin/iptables, [--wait -t filter -I DOCKER-ISOLATION-STAGE-2 -o docker0 -j DROP]"
time="2020-01-07T16:07:50.081163012-05:00" level=debug msg="/sbin/iptables, [--wait -t filter -n -L DOCKER-USER]"
time="2020-01-07T16:07:50.083363010-05:00" level=debug msg="/sbin/iptables, [--wait -t filter -C DOCKER-USER -j RETURN]"
time="2020-01-07T16:07:50.085669384-05:00" level=debug msg="/sbin/iptables, [--wait -t filter -C FORWARD -j DOCKER-USER]"
time="2020-01-07T16:07:50.087828138-05:00" level=debug msg="/sbin/iptables, [--wait -D FORWARD -j DOCKER-USER]"
time="2020-01-07T16:07:50.090030604-05:00" level=debug msg="/sbin/iptables, [--wait -I FORWARD -j DOCKER-USER]"
time="2020-01-07T16:07:50.809775582-05:00" level=info msg="Loading containers: done."
time="2020-01-07T16:07:50.955485174-05:00" level=info msg="Docker daemon" commit=9ee9f40 graphdriver(s)=overlay2 version=18.03.1-ce
time="2020-01-07T16:07:50.967234275-05:00" level=debug msg="loaded node credentials" module=node/tls node.id=k55prdk5rxorfnwu5j6xpjpp1 node.role=swarm-manager
time="2020-01-07T16:07:50.974618962-05:00" level=debug msg="(*Agent).run" module=node/agent node.id=k55prdk5rxorfnwu5j6xpjpp1
time="2020-01-07T16:07:50.974758665-05:00" level=debug msg="next certificate renewal scheduled for 691h39m9.025246469s from now" module=node/tls node.id=k55prdk5rxorfnwu5j6xpjpp1 node.role=swarm-manager time="2020-02-05 11:47:00.000000137 -0500 EST m=+2489952.067407376"
time="2020-01-07T16:07:50.976189332-05:00" level=info msg="Listening for connections" addr="[::]:2377" module=node node.id=k55prdk5rxorfnwu5j6xpjpp1 proto=tcp
time="2020-01-07T16:07:50.976195878-05:00" level=info msg="Listening for local connections" addr=/var/run/docker/swarm/control.sock module=node node.id=k55prdk5rxorfnwu5j6xpjpp1 proto=unix
2020-01-07 16:07:50.976335 W | wal: ignored file 0000000000000014-00000000000cdd34.wal.broken in wal
time="2020-01-07T16:07:51.046891166-05:00" level=info msg="20ef9e0a7b6041dc became follower at term 117" module=raft node.id=k55prdk5rxorfnwu5j6xpjpp1
time="2020-01-07T16:07:51.046928313-05:00" level=info msg="newRaft 20ef9e0a7b6041dc [peers: [20ef9e0a7b6041dc], term: 117, commit: 2194765, applied: 2193235, lastindex: 2194765, lastterm: 117]" module=raft node.id=k55prdk5rxorfnwu5j6xpjpp1
time="2020-01-07T16:07:51.050413757-05:00" level=info msg="manager selected by agent for new session: {k55prdk5rxorfnwu5j6xpjpp1 192.168.1.148:2377}" module=node/agent node.id=k55prdk5rxorfnwu5j6xpjpp1
time="2020-01-07T16:07:51.050458302-05:00" level=error msg="error removing task " error="bucket not found" module=node/agent/worker node.id=k55prdk5rxorfnwu5j6xpjpp1
time="2020-01-07T16:07:51.050469991-05:00" level=error msg="error removing task " error="bucket not found" module=node/agent/worker node.id=k55prdk5rxorfnwu5j6xpjpp1



... many hundreds of similar debug lines follow ...


time="2020-01-07T16:07:51.067790438-05:00" level=error msg="error removing task " error="bucket not found" module=node/agent/worker node.id=k55prdk5rxorfnwu5j6xpjpp1
time="2020-01-07T16:07:51.067801184-05:00" level=error msg="error removing task " error="bucket not found" module=node/agent/worker node.id=k55prdk5rxorfnwu5j6xpjpp1
time="2020-01-07T16:07:51.067809512-05:00" level=error msg="error removing task " error="bucket not found" module=node/agent/worker node.id=k55prdk5rxorfnwu5j6xpjpp1
time="2020-01-07T16:07:51.089348209-05:00" level=debug msg="(*Agent).UpdateTaskStatus" module=node/agent node.id=k55prdk5rxorfnwu5j6xpjpp1 task.id=0o21qh02tutcemqcxp9a1pnfp
time="2020-01-07T16:07:51.169918243-05:00" level=debug msg="transport: add peer 5db22c04be0bcbfc with address 192.168.1.113:2377"
time="2020-01-07T16:07:51.174166324-05:00" level=debug msg="stop transport"
panic: Copy for *types.BoolValue not implemented

goroutine 499 [running]:
github.com/docker/docker/vendor/github.com/docker/swarmkit/api/deepcopy.Copy(0x55a4645b9e20, 0xc422b78ff8, 0x55a4645b9e20, 0xc422b78fd6)
    /go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/api/deepcopy/copy.go:50 +0x3cb
github.com/docker/docker/vendor/github.com/docker/swarmkit/api.(*ContainerSpec).CopyFrom(0xc421b2c2c0, 0x55a46457e100, 0xc421b2c160)
    /go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/api/specs.pb.go:1134 +0x957
github.com/docker/docker/vendor/github.com/docker/swarmkit/api/deepcopy.Copy(0x55a46457e100, 0xc421b2c2c0, 0x55a46457e100, 0xc421b2c160)
    /go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/api/deepcopy/copy.go:48 +0x251
github.com/docker/docker/vendor/github.com/docker/swarmkit/api.(*TaskSpec).CopyFrom(0xc420956058, 0x55a464607460, 0xc420959d18)
    /go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/api/specs.pb.go:1026 +0x277
github.com/docker/docker/vendor/github.com/docker/swarmkit/api/deepcopy.Copy(0x55a464607460, 0xc420956058, 0x55a464607460, 0xc420959d18)
    /go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/api/deepcopy/copy.go:48 +0x251
github.com/docker/docker/vendor/github.com/docker/swarmkit/api.(*ServiceSpec).CopyFrom(0xc420956028, 0x55a4645c5940, 0xc420959ce8)
    /go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/api/specs.pb.go:905 +0xd4
github.com/docker/docker/vendor/github.com/docker/swarmkit/api/deepcopy.Copy(0x55a4645c5940, 0xc420956028, 0x55a4645c5940, 0xc420959ce8)
    /go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/api/deepcopy/copy.go:48 +0x251
github.com/docker/docker/vendor/github.com/docker/swarmkit/api.(*Service).CopyFrom(0xc420956000, 0x55a464619560, 0xc420959cc0)
    /go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/api/objects.pb.go:435 +0xcd
github.com/docker/docker/vendor/github.com/docker/swarmkit/api.(*Service).Copy(0xc420959cc0, 0x19)
    /go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/api/objects.pb.go:426 +0x62
github.com/docker/docker/vendor/github.com/docker/swarmkit/api.(*Service).CopyStoreObject(0xc420959cc0, 0x55a463bad7c9, 0x7)
    /go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/api/objects.pb.go:2323 +0x2d
github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/state/store.(*tx).create(0xc421df22d0, 0x55a463bad7c9, 0x7, 0x55a465165ac0, 0xc420959cc0, 0xc422b78fa7, 0x9)
    /go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/state/store/memory.go:557 +0xca
github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/state/store.CreateService(0x55a465164020, 0xc421df22d0, 0xc420959cc0, 0x28, 0xc421df22d0)
    /go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/state/store/services.go:93 +0xfb
github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/state/store.init.8.func3(0x55a465164020, 0xc421df22d0, 0xc400000001, 0x55a4651583a0, 0xc422596b70, 0x55a46513c2a0, 0xc420400320)
    /go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/state/store/services.go:73 +0x12a
github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/state/store.applyStoreAction(0x55a465164020, 0xc421df22d0, 0xc400000001, 0x55a4651583a0, 0xc422596b70, 0xc4221a3350, 0x150)
    /go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/state/store/memory.go:310 +0xa9
github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/state/store.(*MemoryStore).ApplyStoreActions(0xc4202b10c0, 0xc421b2e9c0, 0x1, 0x1, 0xc421b2e900, 0x0)
    /go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/state/store/memory.go:289 +0x165
github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/state/raft.(*Node).processEntry(0xc420424a00, 0x7f371c8c35c0, 0xc420ff87c0, 0x75, 0x217d4d, 0x0, 0xc421646160, 0x150, 0x150, 0x0, ...)
    /go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/state/raft/raft.go:1906 +0x131
github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/state/raft.(*Node).processCommitted(0xc420424a00, 0x7f371c8c35c0, 0xc420ff87c0, 0x75, 0x217d4d, 0x0, 0xc421646160, 0x150, 0x150, 0x0, ...)
    /go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/state/raft/raft.go:1867 +0x139
github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/state/raft.(*Node).Run(0xc420424a00, 0x7f371c8c35c0, 0xc420ff87c0, 0x0, 0x0)
    /go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/state/raft/raft.go:655 +0xfbd
github.com/docker/docker/vendor/github.com/docker/swarmkit/manager.(*Manager).Run.func6(0xc4202a3d40, 0x7f371c8c35c0, 0xc4202b1900)
    /go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/manager.go:546 +0x4c
created by github.com/docker/docker/vendor/github.com/docker/swarmkit/manager.(*Manager).Run
    /go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/manager.go:545 +0x167e

Is there any way to recover the swarm on node A? I'm not sure how to proceed.

# docker version
Client:
 Version:      18.03.1-ce
 API version:  1.37
 Go version:   go1.9.5
 Git commit:   9ee9f40
 Built:        Thu Apr 26 07:17:20 2018
 OS/Arch:      linux/amd64
 Experimental: false
 Orchestrator: swarm
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

1 Answer

1 vote
/ January 8, 2020

I'm afraid the situation you're in is pretty critical.

To understand why this happened, you need to know how Docker Swarm works and how the Raft consensus algorithm works.

Swarm uses Raft to manage all the managers of your cluster. Here is a very quick summary of what that means:
Every time you add or remove a manager, an election is held among all the managers. The first node to get a majority of the votes becomes the leader.
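On a healthy cluster you can see the outcome of that election from any manager; a sketch:

# list the nodes; the MANAGER STATUS column shows Leader / Reachable /
# Unavailable for each manager node
docker node ls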

Now, in your case, you had a single manager, which was the leader. You added another one, so a new election was held and one of your nodes became the new leader. At that point you had only two managers, which is a really bad configuration.
Then you lost node B from the cluster (probably when you deleted /var/lib/docker on it), and since you just lost a manager, a new election is in progress. Node A nominates itself as a candidate, but the cluster still believes there are two managers, so node A keeps waiting for a vote from the other node. As you can guess, that vote will never arrive, because B is gone, and the cluster stays down as long as there is no leader.
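To make the quorum arithmetic concrete (Raft needs a strict majority of managers, i.e. quorum = floor(N/2) + 1):

1 manager  -> quorum 1 -> tolerates the loss of 0 managers
2 managers -> quorum 2 -> tolerates the loss of 0 managers
3 managers -> quorum 2 -> tolerates the loss of 1 manager
5 managers -> quorum 3 -> tolerates the loss of 2 managers

So going from one manager to two does not add any fault tolerance; it adds a second machine whose loss (node B here) takes the whole control plane down.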

That is (partly) why you need an odd number of managers and why having exactly two of them is a bad idea. In fact, Docker even warns you about this indirectly:

You can use the --force option on a manager to remove it from the swarm. However, this does not reconfigure the swarm to ensure that there are enough managers to maintain the quorum in the swarm.

Here, using --force or simply losing a manager node leads to the same situation.

So I don't know whether there is a solution to your problem, but I suspect there isn't one (I had the same problem and had to reinstall the daemon).
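For readers who land here later: the command Docker documents as the last resort after losing quorum is run on a surviving manager and rebuilds a single-manager swarm from its local Raft state. It requires the daemon to be running, and in this case the daemon panics while replaying that very state, so it will most likely not help here, but it is worth knowing about:

# last-resort recovery on the surviving manager, per the Docker swarm docs;
# recreates a one-manager cluster from the local Raft data
docker swarm init --force-new-cluster --advertise-addr 192.168.1.148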

I hope this helps you in the future!
