After building a kops cluster on AWS, our kube-system pods initially come up, but after 5-10 minutes the kubedns containers in the kube-dns pods start failing their readiness probes:
kube-dns-55c9b74794-cmn5n 2/3 Running 0 10m
kube-dns-55c9b74794-qb2jb 2/3 Running 0 10m
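The listing above is just the kube-dns pods in kube-system; a label-based listing along these lines reproduces it (the selector matches the deployment shown further down, the exact invocation is illustrative):
kubectl -n kube-system get pods -l k8s-app=kube-dns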
We have an existing cluster running the same configuration in the same AWS account and VPC that is not affected - the problem only hits newly built clusters.
- k8s version 1.11.7
- kops version 1.12.1
- kube-dns image: k8s.gcr.io/k8s-dns-kube-dns-amd64:1.14.10
The pod events are as follows:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 22m default-scheduler Successfully assigned kube-system/kube-dns-67964b9cfb-rdsks to ip-10-16-19-163.eu-west-2.compute.internal
Normal Pulling 22m kubelet, ip-10-16-19-163.eu-west-2.compute.internal pulling image "k8s.gcr.io/k8s-dns-kube-dns-amd64:1.14.10"
Normal Pulled 22m kubelet, ip-10-16-19-163.eu-west-2.compute.internal Successfully pulled image "k8s.gcr.io/k8s-dns-kube-dns-amd64:1.14.10"
Normal Created 22m kubelet, ip-10-16-19-163.eu-west-2.compute.internal Created container
Normal Started 22m kubelet, ip-10-16-19-163.eu-west-2.compute.internal Started container
Normal Pulling 22m kubelet, ip-10-16-19-163.eu-west-2.compute.internal pulling image "k8s.gcr.io/k8s-dns-dnsmasq-nanny-amd64:1.14.10"
Normal Pulled 22m kubelet, ip-10-16-19-163.eu-west-2.compute.internal Successfully pulled image "k8s.gcr.io/k8s-dns-dnsmasq-nanny-amd64:1.14.10"
Normal Created 22m kubelet, ip-10-16-19-163.eu-west-2.compute.internal Created container
Normal Started 22m kubelet, ip-10-16-19-163.eu-west-2.compute.internal Started container
Normal Pulling 22m kubelet, ip-10-16-19-163.eu-west-2.compute.internal pulling image "k8s.gcr.io/k8s-dns-sidecar-amd64:1.14.10"
Normal Started 22m kubelet, ip-10-16-19-163.eu-west-2.compute.internal Started container
Normal Pulled 22m kubelet, ip-10-16-19-163.eu-west-2.compute.internal Successfully pulled image "k8s.gcr.io/k8s-dns-sidecar-amd64:1.14.10"
Normal Created 22m kubelet, ip-10-16-19-163.eu-west-2.compute.internal Created container
Warning Unhealthy 22m kubelet, ip-10-16-19-163.eu-west-2.compute.internal Readiness probe failed: Get http://100.115.188.194:8081/readiness: read tcp 10.16.19.163:60150->100.115.188.194:8081: read: connection reset by peer
Warning Unhealthy 22m kubelet, ip-10-16-19-163.eu-west-2.compute.internal Readiness probe failed: Get http://100.115.188.194:8081/readiness: read tcp 10.16.19.163:60176->100.115.188.194:8081: read: connection reset by peer
Warning Unhealthy 22m kubelet, ip-10-16-19-163.eu-west-2.compute.internal Readiness probe failed: Get http://100.115.188.194:8081/readiness: read tcp 10.16.19.163:60198->100.115.188.194:8081: read: connection reset by peer
Warning Unhealthy 21m kubelet, ip-10-16-19-163.eu-west-2.compute.internal Readiness probe failed: Get http://100.115.188.194:8081/readiness: read tcp 10.16.19.163:60216->100.115.188.194:8081: read: connection reset by peer
Warning Unhealthy 21m kubelet, ip-10-16-19-163.eu-west-2.compute.internal Readiness probe failed: Get http://100.115.188.194:8081/readiness: read tcp 10.16.19.163:60234->100.115.188.194:8081: read: connection reset by peer
Warning Unhealthy 21m kubelet, ip-10-16-19-163.eu-west-2.compute.internal Readiness probe failed: Get http://100.115.188.194:8081/readiness: read tcp 10.16.19.163:60254->100.115.188.194:8081: read: connection reset by peer
Warning Unhealthy 21m kubelet, ip-10-16-19-163.eu-west-2.compute.internal Readiness probe failed: Get http://100.115.188.194:8081/readiness: read tcp 10.16.19.163:60276->100.115.188.194:8081: read: connection reset by peer
Warning Unhealthy 21m kubelet, ip-10-16-19-163.eu-west-2.compute.internal Readiness probe failed: Get http://100.115.188.194:8081/readiness: read tcp 10.16.19.163:60300->100.115.188.194:8081: read: connection reset by peer
Warning Unhealthy 21m kubelet, ip-10-16-19-163.eu-west-2.compute.internal Readiness probe failed: Get http://100.115.188.194:8081/readiness: read tcp 10.16.19.163:60326->100.115.188.194:8081: read: connection reset by peer
Warning Unhealthy 2m35s (x111 over 20m) kubelet, ip-10-16-19-163.eu-west-2.compute.internal (combined from similar events): Readiness probe failed: Get http://100.115.188.194:8081/readiness: read tcp 10.16.19.163:63086->100.115.188.194:8081: read: connection reset by peer
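(The events above were captured with kubectl describe against one of the failing pods, roughly as below; shown here for completeness rather than as a verbatim transcript:)
kubectl -n kube-system describe pod kube-dns-67964b9cfb-rdsks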
The kube-dns deployment YAML:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
labels:
k8s-addon: kube-dns.addons.k8s.io
k8s-app: kube-dns
kubernetes.io/cluster-service: "true"
name: kube-dns
namespace: kube-system
spec:
progressDeadlineSeconds: 2147483647
replicas: 2
revisionHistoryLimit: 10
selector:
matchLabels:
k8s-app: kube-dns
strategy:
rollingUpdate:
maxSurge: 10%
maxUnavailable: 0
type: RollingUpdate
template:
metadata:
annotations:
prometheus.io/port: "10055"
prometheus.io/scrape: "true"
scheduler.alpha.kubernetes.io/critical-pod: ""
scheduler.alpha.kubernetes.io/tolerations: '[{"key":"CriticalAddonsOnly",
"operator":"Exists"}]'
creationTimestamp: null
labels:
k8s-app: kube-dns
spec:
containers:
- args:
- --config-dir=/kube-dns-config
- --dns-port=10053
- --domain=cluster.local.
- --v=2
env:
- name: PROMETHEUS_PORT
value: "10055"
image: k8s.gcr.io/k8s-dns-kube-dns-amd64:1.14.10
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 5
httpGet:
path: /healthcheck/kubedns
port: 10054
scheme: HTTP
initialDelaySeconds: 60
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 5
name: kubedns
ports:
- containerPort: 10053
name: dns-local
protocol: UDP
- containerPort: 10053
name: dns-tcp-local
protocol: TCP
- containerPort: 10055
name: metrics
protocol: TCP
readinessProbe:
failureThreshold: 3
httpGet:
path: /readiness
port: 8081
scheme: HTTP
initialDelaySeconds: 3
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 5
resources:
limits:
memory: 170Mi
requests:
cpu: 100m
memory: 70Mi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /kube-dns-config
name: kube-dns-config
- args:
- -v=2
- -logtostderr
- -configDir=/etc/k8s/dns/dnsmasq-nanny
- -restartDnsmasq=true
- --
- -k
- --cache-size=1000
- --dns-forward-max=150
- --no-negcache
- --log-facility=-
- --server=/cluster.local/127.0.0.1#10053
- --server=/in-addr.arpa/127.0.0.1#10053
- --server=/in6.arpa/127.0.0.1#10053
image: k8s.gcr.io/k8s-dns-dnsmasq-nanny-amd64:1.14.10
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 5
httpGet:
path: /healthcheck/dnsmasq
port: 10054
scheme: HTTP
initialDelaySeconds: 60
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 5
name: dnsmasq
ports:
- containerPort: 53
name: dns
protocol: UDP
- containerPort: 53
name: dns-tcp
protocol: TCP
resources:
requests:
cpu: 150m
memory: 20Mi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /etc/k8s/dns/dnsmasq-nanny
name: kube-dns-config
- args:
- --v=2
- --logtostderr
- --probe=kubedns,127.0.0.1:10053,kubernetes.default.svc.cluster.local,5,A
- --probe=dnsmasq,127.0.0.1:53,kubernetes.default.svc.cluster.local,5,A
image: k8s.gcr.io/k8s-dns-sidecar-amd64:1.14.10
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 5
httpGet:
path: /metrics
port: 10054
scheme: HTTP
initialDelaySeconds: 60
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 5
name: sidecar
ports:
- containerPort: 10054
name: metrics
protocol: TCP
resources:
requests:
cpu: 10m
memory: 20Mi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
dnsPolicy: Default
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: kube-dns
serviceAccountName: kube-dns
terminationGracePeriodSeconds: 30
volumes:
- configMap:
defaultMode: 420
name: kube-dns
optional: true
name: kube-dns-config
status:
conditions:
- lastTransitionTime: "2019-09-24T13:22:06Z"
lastUpdateTime: "2019-09-24T13:22:06Z"
message: Deployment does not have minimum availability.
reason: MinimumReplicasUnavailable
status: "False"
type: Available
observedGeneration: 7
replicas: 3
unavailableReplicas: 3
updatedReplicas: 2
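(The manifest above, including the status block, is an export of the live object; a get with -o yaml produces this form, e.g.:)
kubectl -n kube-system get deployment kube-dns -o yaml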
This problem affects any new cluster build across all of our AWS accounts - yet each account also has running clusters with the same configuration that work without any issues.
Pods can reach the readiness endpoint themselves (curl is not installed in the kubedns container, so wget is used instead):
kubectl -n kube-system exec -it kube-dns-55c9b74794-cmn5n -c kubedns -- wget http://100.125.236.68:8081/readiness
Connecting to 100.125.236.68:8081 (100.125.236.68:8081)
readiness 100% |*******************************|
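For contrast, the failing probes originate from the kubelet on the node (10.16.19.163 in the events above), not from inside the pod network. A sketch of the same check run from a shell on that node - how you reach the node (e.g. SSH) and whether curl is present there are assumptions on our side, this output has not been captured here:
# on ip-10-16-19-163.eu-west-2.compute.internal
curl -v http://100.115.188.194:8081/readiness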