Kubernetes pod crash with failed container restart
0 votes
/ 28 April 2020

I am trying to set up Prometheus logging. When I deploy the YAMLs below, the pod crashes with "Back-off restarting failed container".

Full description:

Name:         prometheus-75dd748df4-wrwlr
Namespace:    monitoring
Priority:     0
Node:         kbs-vm-02/172.16.1.8
Start Time:   Tue, 28 Apr 2020 06:13:22 +0000
Labels:       app=prometheus
              pod-template-hash=75dd748df4
Annotations:  <none>
Status:       Running
IP:           10.44.0.7
IPs:
  IP:           10.44.0.7
Controlled By:  ReplicaSet/prometheus-75dd748df4
Containers:
  prom:
    Container ID:  docker://50fb273836c5522bbbe01d8db36e18688e0f673bc54066f364290f0f6854a74f
    Image:         quay.io/prometheus/prometheus:v2.4.3
    Image ID:      docker-pullable://quay.io/prometheus/prometheus@sha256:8e0e85af45fc2bcc18bd7221b8c92fe4bb180f6bd5e30aa2b226f988029c2085
    Port:          9090/TCP
    Host Port:     0/TCP
    Args:
      --config.file=/prometheus-cfg/prometheus.yml
      --storage.tsdb.path=/data
      --storage.tsdb.retention=$(STORAGE_LOCAL_RETENTION)
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Tue, 28 Apr 2020 06:14:08 +0000
      Finished:     Tue, 28 Apr 2020 06:14:08 +0000
    Ready:          False
    Restart Count:  3
    Limits:
      memory:  1Gi
    Requests:
      cpu:     200m
      memory:  500Mi
    Environment Variables from:
      prometheus-config-flags  ConfigMap  Optional: false
    Environment:               <none>
    Mounts:
      /data from storage (rw)
      /prometheus-cfg from config-file (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from prometheus-token-bt7dw (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  config-file:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      prometheus-config-file
    Optional:  false
  storage:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  prometheus-storage-claim
    ReadOnly:   false
  prometheus-token-bt7dw:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  prometheus-token-bt7dw
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                From                Message
  ----     ------            ----               ----                -------
  Warning  FailedScheduling  76s (x3 over 78s)  default-scheduler   running "VolumeBinding" filter plugin for pod "prometheus-75dd748df4-wrwlr": pod has unbound immediate PersistentVolumeClaims
  Normal   Scheduled         73s                default-scheduler   Successfully assigned monitoring/prometheus-75dd748df4-wrwlr to kbs-vm-02
  Normal   Pulled            28s (x4 over 72s)  kubelet, kbs-vm-02  Container image "quay.io/prometheus/prometheus:v2.4.3" already present on machine
  Normal   Created           28s (x4 over 72s)  kubelet, kbs-vm-02  Created container prom
  Normal   Started           27s (x4 over 71s)  kubelet, kbs-vm-02  Started container prom
  Warning  BackOff           13s (x6 over 69s)  kubelet, kbs-vm-02  Back-off restarting failed container

Deployment file:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: monitoring
  labels:
    app: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      securityContext:
        fsGroup: 1000
      serviceAccountName: prometheus
      containers:
      - image: quay.io/prometheus/prometheus:v2.4.3
        name: prom
        args:
        - '--config.file=/prometheus-cfg/prometheus.yml'
        - '--storage.tsdb.path=/data'
        - '--storage.tsdb.retention=$(STORAGE_LOCAL_RETENTION)'
        envFrom:
        - configMapRef:
            name: prometheus-config-flags
        ports:
        - containerPort: 9090
          name: prom-port
        resources:
          limits:
            memory: 1Gi
          requests:
            cpu: 200m
            memory: 500Mi
        volumeMounts:
        - name: config-file
          mountPath: /prometheus-cfg
        - name: storage
          mountPath: /data
      volumes:
      - name: config-file
        configMap:
          name: prometheus-config-file
      - name: storage
        persistentVolumeClaim:
          claimName: prometheus-storage-claim

PV YAML:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-storage
  namespace: monitoring
  labels:
    app: prometheus
spec:
  capacity:
    storage: 12Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/data"

PVC YAML:

[vidya@KBS-VM-01 7-1_prometheus]$ cat prometheus/prom-pvc.yml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-storage-claim
  namespace: monitoring
  labels:
    app: prometheus
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi

Do you know what the problem is and how to fix it? Please also let me know if any more files are needed.

My guess is that something is wrong with the storage configuration, judging by the event log:

Warning  FailedScheduling  76s (x3 over 78s)  default-scheduler  running "VolumeBinding" filter plugin for pod "prometheus-75dd748df4-wrwlr": pod has unbound immediate PersistentVolumeClaims

I am using local storage.

[vidya@KBS-VM-01 7-1_prometheus]$ kubectl describe pvc prometheus-storage-claim -n monitoring
Name:          prometheus-storage-claim
Namespace:     monitoring
StorageClass:
Status:        Bound
Volume:        prometheus-storage
Labels:        app=prometheus
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      12Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Mounted By:    prometheus-75dd748df4-wrwlr
Events:
  Type    Reason         Age   From                         Message
  ----    ------         ----  ----                         -------
  Normal  FailedBinding  37m   persistentvolume-controller  no persistent volumes available for this claim and no storage class is set



[vidya@KBS-VM-01 7-1_prometheus]$ kubectl logs prometheus-75dd748df4-zlncv -n monitoring
level=info ts=2020-04-28T07:49:07.885529914Z caller=main.go:238 msg="Starting Prometheus" version="(version=2.4.3, branch=HEAD, revision=167a4b4e73a8eca8df648d2d2043e21bdb9a7449)"
level=info ts=2020-04-28T07:49:07.885635014Z caller=main.go:239 build_context="(go=go1.11.1, user=root@1e42b46043e9, date=20181004-08:42:02)"
level=info ts=2020-04-28T07:49:07.885812014Z caller=main.go:240 host_details="(Linux 3.10.0-1062.1.1.el7.x86_64 #1 SMP Fri Sep 13 22:55:44 UTC 2019 x86_64 prometheus-75dd748df4-zlncv (none))"
level=info ts=2020-04-28T07:49:07.885833214Z caller=main.go:241 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2020-04-28T07:49:07.885849614Z caller=main.go:242 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2020-04-28T07:49:07.888695413Z caller=main.go:554 msg="Starting TSDB ..."
level=info ts=2020-04-28T07:49:07.889017612Z caller=main.go:423 msg="Stopping scrape discovery manager..."
level=info ts=2020-04-28T07:49:07.889033512Z caller=main.go:437 msg="Stopping notify discovery manager..."
level=info ts=2020-04-28T07:49:07.889041112Z caller=main.go:459 msg="Stopping scrape manager..."
level=info ts=2020-04-28T07:49:07.889048812Z caller=main.go:433 msg="Notify discovery manager stopped"
level=info ts=2020-04-28T07:49:07.889071612Z caller=main.go:419 msg="Scrape discovery manager stopped"
level=info ts=2020-04-28T07:49:07.889083112Z caller=main.go:453 msg="Scrape manager stopped"
level=info ts=2020-04-28T07:49:07.889098012Z caller=manager.go:638 component="rule manager" msg="Stopping rule manager..."
level=info ts=2020-04-28T07:49:07.889109912Z caller=manager.go:644 component="rule manager" msg="Rule manager stopped"
level=info ts=2020-04-28T07:49:07.889124912Z caller=notifier.go:512 component=notifier msg="Stopping notification manager..."
level=info ts=2020-04-28T07:49:07.889137812Z caller=main.go:608 msg="Notifier manager stopped"
level=info ts=2020-04-28T07:49:07.889169012Z caller=web.go:397 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=error ts=2020-04-28T07:49:07.889653412Z caller=main.go:617 err="opening storage failed: lock DB directory: open /data/lock: permission denied"

1 Answer

1 vote
/ 28 April 2020

The problem here is that the PVC is not bound to the PV: first, there is no storage class to bind the PV to the PVC, and second, the capacity of the PV (12Gi) and the request in the PVC (10Gi) do not match. As a result, Kubernetes could not work out which PV the PVC should be bound to.

  1. Add storageClassName: manual to the spec of both the PV and the PVC.
  2. Make the capacity of the PV and the request in the PVC the same, i.e. 10Gi.

PV

apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-storage
  namespace: monitoring
  labels:
    app: prometheus
spec:
  storageClassName: manual
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/data"

PVC

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-storage-claim
  namespace: monitoring
  labels:
    app: prometheus
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
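
As a sanity check after fixing the manifests, you can recreate the objects and confirm they bind. Note that the file names below are assumed from the question's directory listing, and that a PVC's storage class cannot be changed while it is bound, so the old objects have to be deleted first:

```shell
# Recreate the PV and PVC so the new storageClassName takes effect
kubectl delete pvc prometheus-storage-claim -n monitoring
kubectl delete pv prometheus-storage
kubectl apply -f prometheus/prom-pv.yml -f prometheus/prom-pvc.yml

# Both should now report STATUS "Bound"
kubectl get pv prometheus-storage
kubectl get pvc prometheus-storage-claim -n monitoring
```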

Update:

Running the pod as root by adding runAsUser: 0 should fix the open /data/lock: permission denied error.
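
A minimal sketch of where that field goes in the Deployment's pod spec from the question (only the securityContext part is shown):

```yaml
spec:
  template:
    spec:
      securityContext:
        fsGroup: 1000
        runAsUser: 0   # run as root so Prometheus can create /data/lock on the hostPath volume
```

Alternatively, instead of running as root, you could chown the hostPath directory on the node to the UID the Prometheus image runs as.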

...