kube-dns pods fail health checks after passing initial kops cluster validation
asked 24 September 2019

After a kops cluster build on AWS, our kube-system pods initially start up, but after 5-10 minutes the kubedns containers in the kube-dns pods begin failing their readiness probe:

kube-dns-55c9b74794-cmn5n                                           2/3     Running   0          10m
kube-dns-55c9b74794-qb2jb                                           2/3     Running   0          10m
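
The view above (and the events further down) can be reproduced with standard commands along these lines; the pod names are from our cluster and will differ:

kubectl -n kube-system get pods -l k8s-app=kube-dns -w
kubectl -n kube-system describe pod kube-dns-55c9b74794-cmn5n
kubectl -n kube-system logs kube-dns-55c9b74794-cmn5n -c kubedns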

We have an existing cluster running with the same configuration in the same AWS account and VPC that is unaffected - the problem only affects newly built clusters.
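
Since the working cluster and the new one share the account and VPC, one way to rule out spec drift is to diff the kops cluster specs directly (a sketch; the cluster names are placeholders and KOPS_STATE_STORE is assumed to be exported):

kops get cluster existing.example.com -o yaml > existing.yaml
kops get cluster new.example.com -o yaml > new.yaml
diff existing.yaml new.yaml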

  1. k8s version 1.11.7
  2. kops version 1.12.1
  3. kube-dns image: k8s.gcr.io/k8s-dns-kube-dns-amd64:1.14.10
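
Those versions can be confirmed against the running cluster with standard commands (output omitted here):

kops version
kubectl version --short
kubectl -n kube-system get deployment kube-dns -o jsonpath='{.spec.template.spec.containers[*].image}'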

The pod events are as follows:

Events:
  Type     Reason     Age                    From                                                 Message
  ----     ------     ----                   ----                                                 -------
  Normal   Scheduled  22m                    default-scheduler                                    Successfully assigned kube-system/kube-dns-67964b9cfb-rdsks to ip-10-16-19-163.eu-west-2.compute.internal
  Normal   Pulling    22m                    kubelet, ip-10-16-19-163.eu-west-2.compute.internal  pulling image "k8s.gcr.io/k8s-dns-kube-dns-amd64:1.14.10"
  Normal   Pulled     22m                    kubelet, ip-10-16-19-163.eu-west-2.compute.internal  Successfully pulled image "k8s.gcr.io/k8s-dns-kube-dns-amd64:1.14.10"
  Normal   Created    22m                    kubelet, ip-10-16-19-163.eu-west-2.compute.internal  Created container
  Normal   Started    22m                    kubelet, ip-10-16-19-163.eu-west-2.compute.internal  Started container
  Normal   Pulling    22m                    kubelet, ip-10-16-19-163.eu-west-2.compute.internal  pulling image "k8s.gcr.io/k8s-dns-dnsmasq-nanny-amd64:1.14.10"
  Normal   Pulled     22m                    kubelet, ip-10-16-19-163.eu-west-2.compute.internal  Successfully pulled image "k8s.gcr.io/k8s-dns-dnsmasq-nanny-amd64:1.14.10"
  Normal   Created    22m                    kubelet, ip-10-16-19-163.eu-west-2.compute.internal  Created container
  Normal   Started    22m                    kubelet, ip-10-16-19-163.eu-west-2.compute.internal  Started container
  Normal   Pulling    22m                    kubelet, ip-10-16-19-163.eu-west-2.compute.internal  pulling image "k8s.gcr.io/k8s-dns-sidecar-amd64:1.14.10"
  Normal   Started    22m                    kubelet, ip-10-16-19-163.eu-west-2.compute.internal  Started container
  Normal   Pulled     22m                    kubelet, ip-10-16-19-163.eu-west-2.compute.internal  Successfully pulled image "k8s.gcr.io/k8s-dns-sidecar-amd64:1.14.10"
  Normal   Created    22m                    kubelet, ip-10-16-19-163.eu-west-2.compute.internal  Created container
  Warning  Unhealthy  22m                    kubelet, ip-10-16-19-163.eu-west-2.compute.internal  Readiness probe failed: Get http://100.115.188.194:8081/readiness: read tcp 10.16.19.163:60150->100.115.188.194:8081: read: connection reset by peer
  Warning  Unhealthy  22m                    kubelet, ip-10-16-19-163.eu-west-2.compute.internal  Readiness probe failed: Get http://100.115.188.194:8081/readiness: read tcp 10.16.19.163:60176->100.115.188.194:8081: read: connection reset by peer
  Warning  Unhealthy  22m                    kubelet, ip-10-16-19-163.eu-west-2.compute.internal  Readiness probe failed: Get http://100.115.188.194:8081/readiness: read tcp 10.16.19.163:60198->100.115.188.194:8081: read: connection reset by peer
  Warning  Unhealthy  21m                    kubelet, ip-10-16-19-163.eu-west-2.compute.internal  Readiness probe failed: Get http://100.115.188.194:8081/readiness: read tcp 10.16.19.163:60216->100.115.188.194:8081: read: connection reset by peer
  Warning  Unhealthy  21m                    kubelet, ip-10-16-19-163.eu-west-2.compute.internal  Readiness probe failed: Get http://100.115.188.194:8081/readiness: read tcp 10.16.19.163:60234->100.115.188.194:8081: read: connection reset by peer
  Warning  Unhealthy  21m                    kubelet, ip-10-16-19-163.eu-west-2.compute.internal  Readiness probe failed: Get http://100.115.188.194:8081/readiness: read tcp 10.16.19.163:60254->100.115.188.194:8081: read: connection reset by peer
  Warning  Unhealthy  21m                    kubelet, ip-10-16-19-163.eu-west-2.compute.internal  Readiness probe failed: Get http://100.115.188.194:8081/readiness: read tcp 10.16.19.163:60276->100.115.188.194:8081: read: connection reset by peer
  Warning  Unhealthy  21m                    kubelet, ip-10-16-19-163.eu-west-2.compute.internal  Readiness probe failed: Get http://100.115.188.194:8081/readiness: read tcp 10.16.19.163:60300->100.115.188.194:8081: read: connection reset by peer
  Warning  Unhealthy  21m                    kubelet, ip-10-16-19-163.eu-west-2.compute.internal  Readiness probe failed: Get http://100.115.188.194:8081/readiness: read tcp 10.16.19.163:60326->100.115.188.194:8081: read: connection reset by peer
  Warning  Unhealthy  2m35s (x111 over 20m)  kubelet, ip-10-16-19-163.eu-west-2.compute.internal  (combined from similar events): Readiness probe failed: Get http://100.115.188.194:8081/readiness: read tcp 10.16.19.163:63086->100.115.188.194:8081: read: connection reset by peer
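
The failed GETs come from the kubelet on the node (10.16.19.163) to the pod IP (100.115.188.194). The closest manual reproduction of what the probe does is to issue the same request from the node itself; a sketch, assuming SSH access to the node (the admin user depends on the kops AMI):

ssh admin@10.16.19.163
curl -v --max-time 5 http://100.115.188.194:8081/readiness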

kube-dns deployment YAML:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    k8s-addon: kube-dns.addons.k8s.io
    k8s-app: kube-dns
    kubernetes.io/cluster-service: "true"
  name: kube-dns
  namespace: kube-system
spec:
  progressDeadlineSeconds: 2147483647
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      k8s-app: kube-dns
  strategy:
    rollingUpdate:
      maxSurge: 10%
      maxUnavailable: 0
    type: RollingUpdate
  template:
    metadata:
      annotations:
        prometheus.io/port: "10055"
        prometheus.io/scrape: "true"
        scheduler.alpha.kubernetes.io/critical-pod: ""
        scheduler.alpha.kubernetes.io/tolerations: '[{"key":"CriticalAddonsOnly",
          "operator":"Exists"}]'
      creationTimestamp: null
      labels:
        k8s-app: kube-dns
    spec:
      containers:
      - args:
        - --config-dir=/kube-dns-config
        - --dns-port=10053
        - --domain=cluster.local.
        - --v=2
        env:
        - name: PROMETHEUS_PORT
          value: "10055"
        image: k8s.gcr.io/k8s-dns-kube-dns-amd64:1.14.10
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 5
          httpGet:
            path: /healthcheck/kubedns
            port: 10054
            scheme: HTTP
          initialDelaySeconds: 60
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        name: kubedns
        ports:
        - containerPort: 10053
          name: dns-local
          protocol: UDP
        - containerPort: 10053
          name: dns-tcp-local
          protocol: TCP
        - containerPort: 10055
          name: metrics
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /readiness
            port: 8081
            scheme: HTTP
          initialDelaySeconds: 3
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        resources:
          limits:
            memory: 170Mi
          requests:
            cpu: 100m
            memory: 70Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /kube-dns-config
          name: kube-dns-config
      - args:
        - -v=2
        - -logtostderr
        - -configDir=/etc/k8s/dns/dnsmasq-nanny
        - -restartDnsmasq=true
        - --
        - -k
        - --cache-size=1000
        - --dns-forward-max=150
        - --no-negcache
        - --log-facility=-
        - --server=/cluster.local/127.0.0.1#10053
        - --server=/in-addr.arpa/127.0.0.1#10053
        - --server=/in6.arpa/127.0.0.1#10053
        image: k8s.gcr.io/k8s-dns-dnsmasq-nanny-amd64:1.14.10
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 5
          httpGet:
            path: /healthcheck/dnsmasq
            port: 10054
            scheme: HTTP
          initialDelaySeconds: 60
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        name: dnsmasq
        ports:
        - containerPort: 53
          name: dns
          protocol: UDP
        - containerPort: 53
          name: dns-tcp
          protocol: TCP
        resources:
          requests:
            cpu: 150m
            memory: 20Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/k8s/dns/dnsmasq-nanny
          name: kube-dns-config
      - args:
        - --v=2
        - --logtostderr
        - --probe=kubedns,127.0.0.1:10053,kubernetes.default.svc.cluster.local,5,A
        - --probe=dnsmasq,127.0.0.1:53,kubernetes.default.svc.cluster.local,5,A
        image: k8s.gcr.io/k8s-dns-sidecar-amd64:1.14.10
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 5
          httpGet:
            path: /metrics
            port: 10054
            scheme: HTTP
          initialDelaySeconds: 60
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        name: sidecar
        ports:
        - containerPort: 10054
          name: metrics
          protocol: TCP
        resources:
          requests:
            cpu: 10m
            memory: 20Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: Default
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: kube-dns
      serviceAccountName: kube-dns
      terminationGracePeriodSeconds: 30
      volumes:
      - configMap:
          defaultMode: 420
          name: kube-dns
          optional: true
        name: kube-dns-config
status:
  conditions:
  - lastTransitionTime: "2019-09-24T13:22:06Z"
    lastUpdateTime: "2019-09-24T13:22:06Z"
    message: Deployment does not have minimum availability.
    reason: MinimumReplicasUnavailable
    status: "False"
    type: Available
  observedGeneration: 7
  replicas: 3
  unavailableReplicas: 3
  updatedReplicas: 2
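
For clarity, the readinessProbe on the kubedns container above amounts to the kubelet issuing a plain HTTP GET against the pod IP on port 8081 every 10 seconds, with a 5 second timeout and 3 allowed failures - roughly equivalent to the following (a sketch; the kubelet uses its own HTTP client, not curl, and <pod-ip> is a placeholder):

curl --max-time 5 http://<pod-ip>:8081/readiness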

This problem affects any new cluster build in all of our AWS accounts - yet each account has running clusters with identical configuration that work without any issues.

The pods themselves can connect to the readiness endpoint (curl is not installed in the kubedns container, so wget is used):

kubectl -n kube-system exec -it kube-dns-55c9b74794-cmn5n -c kubedns -- wget http://100.125.236.68:8081/readiness
Connecting to 100.125.236.68:8081 (100.125.236.68:8081)
readiness            100% |*******************************|     
...
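
To check whether pod-to-pod traffic from another node also works, the same endpoint can be hit from a throwaway pod as well (a sketch; the busybox image and the pod IP above are illustrative):

kubectl run -it --rm dns-test --image=busybox:1.28 --restart=Never -- wget -qO- http://100.125.236.68:8081/readiness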