I deployed Prometheus on a GKE cluster using click-to-deploy - "https://github.com/GoogleCloudPlatform/click-to-deploy/blob/master/k8s/prometheus/manifest/alertmanager-statefulset.yaml".
When I modify the alertmanager ConfigMap and restart the alertmanager pods, I don't get any alerts in Alertmanager. This is the Alertmanager ConfigMap:
apiVersion: v1
kind: ConfigMap
metadata:
  name: $APP_INSTANCE_NAME-alertmanager-config
  labels:
    app.kubernetes.io/name: $APP_INSTANCE_NAME
    app.kubernetes.io/component: alertmanager
data:
  alertmanager.yml: |
    global:
      smtp_smarthost: 'XXXX:25'
      smtp_from: 'XX@XX.com'
    templates:
    - '/etc/alertmanager/template/*.tmpl'
    route:
      group_by: ['alertname', 'cluster', 'service']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 3h
      receiver: team-X-mails
    inhibit_rules:
    - source_match:
        severity: 'critical'
      target_match:
        severity: 'warning'
      equal: ['alertname', 'cluster', 'service']
    receivers:
    - name: 'team-X-mails'
      email_configs:
      - to: 'YY@YY.com'
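This is roughly how I pick up the change after editing the ConfigMap (a sketch; the file name is mine, the pod name matches the listing further down):

kubectl apply -f alertmanager-config.yaml      # re-apply the edited ConfigMap (file name is mine)
kubectl delete pod prometheus-alertmanager-0   # the StatefulSet recreates the pod, which re-reads the config
kubectl get pods -w                            # wait until prometheus-alertmanager-0 is Running again
# (Alertmanager can also reload its config in place via a POST to its /-/reload endpoint.)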
When I fire a test alert directly against Alertmanager with curl, I can see the alert there and I receive the email notification.
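The test alert is posted roughly like this (a sketch; X.X.X.X stands for the Alertmanager LoadBalancer IP from the service list below, and the labels are arbitrary):

curl -X POST http://X.X.X.X:9093/api/v1/alerts \
  -H 'Content-Type: application/json' \
  -d '[{"labels": {"alertname": "TestAlert", "severity": "critical"},
        "annotations": {"summary": "manual test alert"}}]'
# /api/v1/alerts works on older Alertmanager releases; newer ones expose the same call at /api/v2/alerts.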
The Prometheus ConfigMap with the alerting rules:
alerts.yaml: |-
  "groups":
  - "name": "general.rules"
    "rules":
    - "alert": "TargetDown"
      "annotations":
        "description": "{{ $value }}% of {{ $labels.job }} targets are down."
        "summary": "Targets are down"
      "expr": "100 * (count(up == 0) BY (job) / count(up) BY (job)) > 10"
      "for": "10m"
      "labels":
        "severity": "critical"
    - "alert": "DeadMansSwitch"
      "annotations":
        "description": "This is a DeadMansSwitch meant to ensure that the entire Alerting pipeline is functional."
        "summary": "Alerting DeadMansSwitch"
      "expr": "vector(1)"
      "labels":
        "severity": "none"
I can see the alert firing in the Prometheus UI. The pods:
prometheus-alertmanager-0 1/1 Running 0 67m
prometheus-deployer-7vl74 0/1 Completed 0 4h17m
prometheus-grafana-0 1/1 Running 0 4h17m
prometheus-kube-state-metrics-7674795b44-4z6zr 2/2 Running 0 4h17m
prometheus-node-exporter-7bdvj 1/1 Running 0 4h17m
prometheus-prometheus-0 1/1 Running 0 77m
and the services:
prometheus-alertmanager LoadBalancer X.X.X.X X.X.X.X 9093:31157/TCP 4h18m
prometheus-alertmanager-operated ClusterIP None <none> 6783/TCP,9093/TCP 112m
prometheus-kube-state-metrics ClusterIP X.X.X.X <none> 8080/TCP,8081/TCP 4h18m
prometheus-prometheus LoadBalancer X.X.X.X X.X.X.X 9090:31257/TCP 4h18m
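As far as I understand, Prometheus only forwards firing alerts if its configuration contains an alerting section pointing at the Alertmanager service. A sketch of what I expect it to look like (the target is the headless prometheus-alertmanager-operated service from the list above, port 9093):

alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - prometheus-alertmanager-operated:9093

To see which Alertmanagers Prometheus is actually talking to, its API lists the active ones (X.X.X.X being the Prometheus LoadBalancer IP above):

curl http://X.X.X.X:9090/api/v1/alertmanagers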