Наша служба origin-node.service на главном узле дает сбой:
root@master> systemctl start origin-node.service
Job for origin-node.service failed because the control process exited with error code. See "systemctl status origin-node.service" and "journalctl -xe" for details.
root@master> systemctl status origin-node.service -l
[...]
May 05 07:17:47 master origin-node[44066]: bootstrap.go:195] Part of the existing bootstrap client certificate is expired: 2020-02-20 13:14:27 +0000 UTC
May 05 07:17:47 master origin-node[44066]: bootstrap.go:56] Using bootstrap kubeconfig to generate TLS client cert, key and kubeconfig file
May 05 07:17:47 master origin-node[44066]: certificate_store.go:131] Loading cert/key pair from "/etc/origin/node/certificates/kubelet-client-current.pem".
May 05 07:17:47 master origin-node[44066]: server.go:262] failed to run Kubelet: cannot create certificate signing request: Post https://lb.openshift-cluster.mydomain.com:8443/apis/certificates.k8s.io/v1beta1/certificatesigningrequests: EOF
Таким образом, похоже, что kubelet-client-current.pem и / или kubelet-server-current.pem содержат просроченный сертификат и служба пытается создать CSR, используя конечную точку, которая, вероятно, еще недоступна (потому что мастер не работает). Мы попытались повторно развернуть сертификаты в соответствии с документацией OpenShift Повторное развертывание сертификатов , но это не удалось при обнаружении сертификата с истекшим сроком действия:
root@master> ansible-playbook -i /etc/ansible/hosts openshift-master/redeploy-openshift-ca.yml
[...]
TASK [openshift_certificate_expiry : Fail when certs are near or already expired] *******************************************************************************************************************************************
fatal: [master.openshift-cluster.mydomain.com]: FAILED! => {"changed": false, "msg": "Cluster certificates found to be expired or within 60 days of expiring. You may view the report at /root/cert-expiry-report.20200505T042754.html or /root/cert-expiry-report.20200505T042754.json.\n"}
[...]
root@master> cat /root/cert-expiry-report.20200505T042754.json
[...]
"kubeconfigs": [
{
"cert_cn": "O:system:cluster-admins, CN:system:admin",
"days_remaining": -75,
"expiry": "2020-02-20 13:14:27",
"health": "expired",
"issuer": "CN=openshift-signer@1519045219 ",
"path": "/etc/origin/node/node.kubeconfig",
"serial": 27,
"serial_hex": "0x1b"
},
{
"cert_cn": "O:system:cluster-admins, CN:system:admin",
"days_remaining": -75,
"expiry": "2020-02-20 13:14:27",
"health": "expired",
"issuer": "CN=openshift-signer@1519045219 ",
"path": "/etc/origin/node/node.kubeconfig",
"serial": 27,
"serial_hex": "0x1b"
},
[...]
"summary": {
"expired": 2,
"ok": 22,
"total": 24,
"warning": 0
}
}
Существует руководство для OpenShift 4.4 для Восстановление с сертификаты уровня управления с истекшим сроком действия , но это не относится к 3.11, и мы не нашли такого руководства для нашей версии.
Возможно ли воссоздать просроченные сертификаты без работающего главного узла для 3.11? Спасибо за любую помощь.
OpenShift Ansible: https://github.com/openshift/openshift-ansible/releases/tag/openshift-ansible-3.11.153-2
Обновление 2020-05-06: Я также выполнил повторное развертывание сертификатов .yml, но он не работает при той же ЗАДАЧЕ:
root@master> ansible-playbook -i /etc/ansible/hosts playbooks/redeploy-certificates.yml
[...]
TASK [openshift_certificate_expiry : Fail when certs are near or already expired] ******************************************************************************
Wednesday 06 May 2020 04:07:06 -0400 (0:00:00.909) 0:01:07.582 *********
fatal: [master.openshift-cluster.mydomain.com]: FAILED! => {"changed": false, "msg": "Cluster certificates found to be expired or within 60 days of expiring. You may view the report at /root/cert-expiry-report.20200506T040603.html or /root/cert-expiry-report.20200506T040603.json.\n"}
Обновление 2020-05-11: Запуск с -e openshift_certificate_expiry_fail_on_warn=False
приводит к:
root@master> ansible-playbook -i /etc/ansible/hosts -e openshift_certificate_expiry_fail_on_warn=False playbooks/redeploy-certificates.yml
[...]
TASK [Wait for master API to come back online] *****************************************************************************************************************
Monday 11 May 2020 03:48:56 -0400 (0:00:00.111) 0:02:25.186 ************
skipping: [master.openshift-cluster.mydomain.com]
TASK [openshift_control_plane : restart master] ****************************************************************************************************************
Monday 11 May 2020 03:48:56 -0400 (0:00:00.257) 0:02:25.444 ************
changed: [master.openshift-cluster.mydomain.com] => (item=api)
changed: [master.openshift-cluster.mydomain.com] => (item=controllers)
RUNNING HANDLER [openshift_control_plane : verify API server] **************************************************************************************************
Monday 11 May 2020 03:48:57 -0400 (0:00:00.945) 0:02:26.389 ************
FAILED - RETRYING: verify API server (120 retries left).
FAILED - RETRYING: verify API server (119 retries left).
[...]
FAILED - RETRYING: verify API server (1 retries left).
fatal: [master.openshift-cluster.mydomain.com]: FAILED! => {"attempts": 120, "changed": false, "cmd": ["curl", "--silent", "--tlsv1.2", "--max-time", "2", "--cacert", "/etc/origin/master/ca-bundle.crt", "https://lb.openshift-cluster.mydomain.com:8443/healthz/ready"], "delta": "0:00:00.182367", "end": "2020-05-11 03:51:52.245644", "msg": "non-zero return code", "rc": 35, "start": "2020-05-11 03:51:52.063277", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
root@master> systemctl status origin-node.service -l
[...]
May 11 04:23:28 master.openshift-cluster.mydomain.com origin-node[109972]: E0511 04:23:28.077964 109972 bootstrap.go:195] Part of the existing bootstrap client certificate is expired: 2020-02-20 13:14:27 +0000 UTC
May 11 04:23:28 master.openshift-cluster.mydomain.com origin-node[109972]: I0511 04:23:28.078001 109972 bootstrap.go:56] Using bootstrap kubeconfig to generate TLS client cert, key and kubeconfig file
May 11 04:23:28 master.openshift-cluster.mydomain.com origin-node[109972]: I0511 04:23:28.080555 109972 certificate_store.go:131] Loading cert/key pair from "/etc/origin/node/certificates/kubelet-client-current.pem".
May 11 04:23:28 master.openshift-cluster.mydomain.com origin-node[109972]: F0511 04:23:28.130968 109972 server.go:262] failed to run Kubelet: cannot create certificate signing request: Post https://lb.openshift-cluster.mydomain.com:8443/apis/certificates.k8s.io/v1beta1/certificatesigningrequests: EOF
[...]