Add instructions for monitoring of certificate status (#765)
* Add Prometheus alerts configuration

Signed-off-by: JenTing Hsiao <jenting.hsiao@suse.com>

* Add anchor for subdomains and subpaths

Signed-off-by: JenTing Hsiao <jenting.hsiao@suse.com>

* Add more Grafana dashboard resource

Signed-off-by: JenTing Hsiao <jenting.hsiao@suse.com>

* Add monitoring certificate status

Signed-off-by: JenTing Hsiao <jenting.hsiao@suse.com>

* Add monitoring custom certificates

Signed-off-by: JenTing Hsiao <jenting.hsiao@suse.com>

* Add skuba addon labeling

Signed-off-by: JenTing Hsiao <jenting.hsiao@suse.com>

* The flags are optionals

Signed-off-by: JenTing Hsiao <jenting.hsiao@suse.com>

* Remove monitoring custom secret certs

Signed-off-by: JenTing Hsiao <jenting.hsiao@suse.com>

* Apply suggestions from code review

Co-Authored-By: David Ko <dko@suse.com>

* Monitoring custom secret certificates

Signed-off-by: JenTing Hsiao <jenting.hsiao@suse.com>

* Add minor wording and formatting fixes

* Slight wording review for certificate section

* Change wording according to suggestion

Co-Authored-By: c3y1huang <chin-ya.huang@suse.com>

* Move create namespace before

Signed-off-by: JenTing Hsiao <jenting.hsiao@suse.com>

Co-authored-by: David Ko <dko@suse.com>
Co-authored-by: Markus Napp <mnapp@suse.com>
Co-authored-by: c3y1huang <chin-ya.huang@suse.com>
4 people authored Apr 30, 2020
1 parent cea7adb commit 706182f
Showing 2 changed files with 201 additions and 34 deletions.
135 changes: 101 additions & 34 deletions adoc/admin-monitoring-stack.adoc
@@ -1,3 +1,4 @@
[[monitoring_stack]]
= Monitoring Stack

[IMPORTANT]
@@ -37,8 +38,7 @@ Grafana is an open-source system for querying, analysing and visualizing metrics

. NGINX Ingress Controller
+
Please refer to <<nginx-ingress>> on how to congifure ingress in your cluster. Deploying NGINX Ingress Controller also allows us to provide TLS termination to our services and to provide basic authentication to the Prometheus Expression browser/API.

Please refer to <<nginx-ingress>> on how to configure ingress in your cluster. Deploying NGINX Ingress Controller also allows us to provide TLS termination to our services and to provide basic authentication to the Prometheus Expression browser/API.

. Create DNS entries
+
@@ -82,11 +82,20 @@ Or add this entry to `/etc/hosts`
10.86.4.158 example.com
----

. Monitoring namespace
+
We will deploy our monitoring stack in its own namespace and therefore create one.
+
[source,bash]
----
kubectl create namespace monitoring
----

. Configure Authentication
+
We need to create a `basic-auth` secret so the NGINX Ingress Controller can perform authentication.
+
Install apache2-utils, which contains htpasswd, on your local workstation.
Install `apache2-utils`, which contains `htpasswd`, on your local workstation.
+
[source,bash]
----
@@ -117,15 +126,6 @@ Create secret in {kube} cluster
kubectl create secret generic -n monitoring prometheus-basic-auth --from-file=auth
----

. Monitoring namespace
+
We will deploy our monitoring stack in its own namespace and therefore create one.
+
[source,bash]
----
kubectl create namespace monitoring
----

. TLS
+
You must configure your certificates for the components as secrets in the {kube} cluster.
@@ -176,15 +176,16 @@ kubectl create -n monitoring secret tls monitoring-tls \
----

== Installation
.Two different examples of using ingress
[NOTE]
====

There will be two different ways of using ingress for accessing the monitoring system.
One will be using `subdomains` such as `+prometheus.example.com+`, `+prometheus-alertmanager.example.com+`, and `+grafana.example.com+`.
Another deployment will be using `subpaths` for accessing monitoring system such as `example.com/prometheus`, `example.com/alertmanager`, and `example.com/grafana`.
====

- <<installation_for_subdomains>>: Using `subdomains` for accessing the monitoring system, such as `+prometheus.example.com+`, `+prometheus-alertmanager.example.com+`, and `+grafana.example.com+`.

- <<installation_for_subpaths>>: Using `subpaths` for accessing the monitoring system, such as `example.com/prometheus`, `example.com/alertmanager`, and `example.com/grafana`.

[[installation_for_subdomains]]
=== Installation For Subdomains

[NOTE]
====
This installation example shows how to install and configure Prometheus and Grafana using subdomains such as `prometheus.example.com`, `prometheus-alertmanager.example.com`, and `grafana.example.com`.
@@ -346,22 +347,23 @@ prometheus-server prometheus.example.com 80, 44
[[alertmanager_configuration_example]]
==== Alertmanager Configuration Example

The configuration sets one "receiver" to get notified by email when a node meets one of these conditions:
The configuration example sets one "receiver" to get notified by email when one of the conditions below is met:

* Node is unschedulable
* Node runs out of disk space
* Node has memory pressure
* Node has disk pressure
* Node is unschedulable: severity is `critical` because the node cannot accept new pods
* Node runs out of disk space: severity is `critical` because the node cannot accept new pods
* Node has memory pressure: severity is `warning`
* Node has disk pressure: severity is `warning`
* Certificate is going to expire in 7 days: severity is `critical`
* Certificate is going to expire in 30 days: severity is `warning`
* Certificate is going to expire in 3 months: severity is `info`

The first two are critical because the node cannot accept new pods, the last two are just warnings.

The Alertmanager configuration can be added to `prometheus-config-values.yaml` by adding the `alertmanagerFiles` section.

For more information on how to configure Alertmanager, refer to link:https://prometheus.io/docs/alerting/configuration[Prometheus: Alerting - Configuration].

. Configuring Alertmanager
. Configure alerting receiver in Alertmanager
+
The Alertmanager handles alerts sent by the Prometheus server. It takes care of deduplicating, grouping, and routing them to the correct receiver integration, such as email. It also takes care of silencing and inhibiting alerts.
+
Add the `alertmanagerFiles` section to your Prometheus configuration.
Add the `alertmanagerFiles` section to your Prometheus configuration file `prometheus-config-values.yaml`.
+
For more information on how to configure Alertmanager, refer to link:https://prometheus.io/docs/alerting/configuration[Prometheus: Alerting - Configuration].
+
----
alertmanagerFiles:
@@ -401,7 +403,9 @@ alertmanagerFiles:
email_configs:
- to: 'admin@example.com'
----
. Replace the empty set of rules `rules: {}` in the `serverFiles` section of the configuration file.
. Configure alerting rules in the Prometheus server
+
Replace the `serverFiles` section of the Prometheus configuration file `prometheus-config-values.yaml`.
+
For more information on how to configure alerts, refer to: link:https://prometheus.io/docs/alerting/notification_examples/[Prometheus: Alerting - Notification Template Examples]
+
@@ -437,6 +441,62 @@ serverFiles:
severity: warning
annotations:
description: '{{ $labels.node }} has insufficient available memory'
- name: caasp.certs.rules
rules:
- alert: KubernetesCertificateExpiry3Months
expr: (cert_exporter_cert_expires_in_seconds / 86400) < 90
labels:
severity: info
annotations:
description: 'The cert for {{ $labels.filename }} on {{ $labels.nodename }} node is going to expire in 3 months'
- alert: KubernetesCertificateExpiry30Days
expr: (cert_exporter_cert_expires_in_seconds / 86400) < 30
labels:
severity: warning
annotations:
description: 'The cert for {{ $labels.filename }} on {{ $labels.nodename }} node is going to expire in 30 days'
- alert: KubernetesCertificateExpiry7Days
expr: (cert_exporter_cert_expires_in_seconds / 86400) < 7
labels:
severity: critical
annotations:
description: 'The cert for {{ $labels.filename }} on {{ $labels.nodename }} node is going to expire in 7 days'
- alert: KubeconfigCertificateExpiry3Months
expr: (cert_exporter_kubeconfig_expires_in_seconds / 86400) < 90
labels:
severity: info
annotations:
description: 'The cert for {{ $labels.filename }} on {{ $labels.nodename }} node is going to expire in 3 months'
- alert: KubeconfigCertificateExpiry30Days
expr: (cert_exporter_kubeconfig_expires_in_seconds / 86400) < 30
labels:
severity: warning
annotations:
description: 'The cert for {{ $labels.filename }} on {{ $labels.nodename }} node is going to expire in 30 days'
- alert: KubeconfigCertificateExpiry7Days
expr: (cert_exporter_kubeconfig_expires_in_seconds / 86400) < 7
labels:
severity: critical
annotations:
description: 'The cert for {{ $labels.filename }} on {{ $labels.nodename }} node is going to expire in 7 days'
- alert: AddonCertificateExpiry3Months
expr: (cert_exporter_secret_expires_in_seconds / 86400) < 90
labels:
severity: info
annotations:
description: 'The cert for {{ $labels.secret_name }} is going to expire in 3 months'
- alert: AddonCertificateExpiry30Days
expr: (cert_exporter_secret_expires_in_seconds / 86400) < 30
labels:
severity: warning
annotations:
description: 'The cert for {{ $labels.secret_name }} is going to expire in 30 days'
- alert: AddonCertificateExpiry7Days
expr: (cert_exporter_secret_expires_in_seconds / 86400) < 7
labels:
severity: critical
annotations:
description: 'The cert for {{ $labels.secret_name }} is going to expire in 7 days'
----
. To apply the changed configuration, run:
+
@@ -646,10 +706,16 @@ After pasting in the url, the view will change to another form.
----
# monitor SUSE CaaS Platform cluster
kubectl apply -f https://raw.githubusercontent.com/SUSE/caasp-monitoring/master/grafana-dashboards-caasp-cluster.yaml
# monitor etcd
# monitor SUSE CaaS Platform etcd cluster
kubectl apply -f https://raw.githubusercontent.com/SUSE/caasp-monitoring/master/grafana-dashboards-caasp-etcd-cluster.yaml
# monitor namespaces
# monitor SUSE CaaS Platform nodes
kubectl apply -f https://raw.githubusercontent.com/SUSE/caasp-monitoring/master/grafana-dashboards-caasp-nodes.yaml
# monitor SUSE CaaS Platform namespaces
kubectl apply -f https://raw.githubusercontent.com/SUSE/caasp-monitoring/master/grafana-dashboards-caasp-namespaces.yaml
# monitor SUSE CaaS Platform pods
kubectl apply -f https://raw.githubusercontent.com/SUSE/caasp-monitoring/master/grafana-dashboards-caasp-pods.yaml
# monitor SUSE CaaS Platform certificates
kubectl apply -f https://raw.githubusercontent.com/SUSE/caasp-monitoring/master/grafana-dashboards-caasp-certificates.yaml
----

* Build your own dashboard
@@ -694,6 +760,7 @@ continues with definition of dashboard JSON
kubectl apply -f grafana-dashboards-caasp-cluster.yaml
----

[[installation_for_subpaths]]
=== Installation For Subpaths

[NOTE]
100 changes: 100 additions & 0 deletions adoc/admin-security-certificates.adoc
@@ -145,6 +145,98 @@ The following certificates are created by `skuba`:
|Client
|===

== Monitoring Certificates

We use cert-exporter to monitor the on-host certificates of nodes and the secret certificates of addons. The cert-exporter collects certificate expiration metrics periodically (every hour by default) and exposes them through the `/metrics` endpoint. The Prometheus server can then scrape these metrics from the endpoint periodically.

[source,bash]
----
helm repo add suse https://kubernetes-charts.suse.com
helm install suse/cert-exporter --name ${RELEASE_NAME}
----
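As a rough illustration of what the exposed metrics contain, the following sketch converts the expiry values from seconds to whole days. It assumes the Prometheus text exposition format without trailing timestamps and is fed on standard input. The metric names are the ones used by the alert rules in this change. The function name `cert_expiry_days` and the address placeholder in the usage comment are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: read Prometheus text-format metrics on stdin and print
# "<days> <series>" for each cert-exporter expiry sample.
# Assumption: samples carry no trailing timestamp, so the last
# whitespace-separated field is the value in seconds.
cert_expiry_days() {
  awk '
    /^cert_exporter_(cert|kubeconfig|secret)_expires_in_seconds/ {
      value = $NF                        # sample value in seconds
      series = $0
      sub(/[ \t]+[^ \t]+$/, "", series)  # drop the value, keep name and labels
      printf "%d %s\n", value / 86400, series
    }
  '
}

# Usage (the address is a placeholder, not taken from this guide):
#   curl -s http://<cert-exporter-address>/metrics | cert_expiry_days
```

Comment and `# HELP` / `# TYPE` lines do not match the pattern and are skipped.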

=== Prerequisites

. To monitor certificates, first set up the monitoring stack by following <<monitoring_stack>>.
. Label the skuba addon certificates
+
[source,bash]
----
kubectl label --overwrite secret oidc-dex-cert -n kube-system caasp.suse.com/skuba-addon=true
kubectl label --overwrite secret oidc-gangway-cert -n kube-system caasp.suse.com/skuba-addon=true
kubectl label --overwrite secret metrics-server-cert -n kube-system caasp.suse.com/skuba-addon=true
----

=== Prometheus Alerts

Use Prometheus alerts to be notified reactively about the status of the certificates. Follow <<alertmanager_configuration_example>> to configure the Prometheus Alertmanager and the Prometheus server.
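The certificate alert rules in the Alertmanager configuration example divide the remaining lifetime in seconds by 86400 and compare the result against 90, 30, and 7 days. The sketch below mirrors that arithmetic for a single integer value. It is an illustration only: the function name is hypothetical, and unlike Prometheus, where all matching rules fire at once, it reports only the most severe matching level.

```shell
#!/usr/bin/env bash
# Sketch: map remaining certificate lifetime (integer seconds) to the
# most severe alert level that would match the example rules.
cert_alert_severity() {
  local seconds="$1"
  local days=$(( seconds / 86400 ))   # whole days remaining
  if [ "$days" -lt 7 ]; then
    echo critical
  elif [ "$days" -lt 30 ]; then
    echo warning
  elif [ "$days" -lt 90 ]; then
    echo info
  else
    echo none
  fi
}
```

For example, `cert_alert_severity 518400` (6 days) falls under the 7-day rule, while a value of exactly 7 days no longer does because the rules use a strict `<` comparison.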

=== Grafana Dashboards

Use Grafana to proactively monitor the status of the certificates. Follow <<adding_grafana_dashboards>> to install the Grafana dashboard that monitors certificates.

=== Monitor Custom Secret Certificates

You can monitor custom secret TLS certificates that you created manually or using link:https://cert-manager.io/[cert-manager].

For example:

. Monitor cert-manager issued certificates in the `cert-manager-test` namespace.
+
[source,bash]
----
helm install suse/cert-exporter \
--name ${RELEASE_NAME} \
--set customSecret.enabled=true \
--set customSecret.certs[0].name=cert-manager \
--set customSecret.certs[0].namespace=cert-manager-test \
--set customSecret.certs[0].includeKeys="{ca.crt,tls.crt}" \
--set customSecret.certs[0].annotationSelector="{cert-manager.io/certificate-name}"
----
. Monitor certificates in all namespaces filtered by label selector.
+
[source,bash]
----
helm install suse/cert-exporter \
--name ${RELEASE_NAME} \
--set customSecret.enabled=true \
--set customSecret.certs[0].name=self-signed-cert \
--set customSecret.certs[0].includeKeys="{ca.crt,tls.crt}" \
--set customSecret.certs[0].labelSelector="{key=value}"
----
. Deploy examples 1 and 2 together.
+
[source,bash]
----
helm install suse/cert-exporter \
--name ${RELEASE_NAME} \
--set customSecret.enabled=true \
--set customSecret.certs[0].name=cert-manager \
--set customSecret.certs[0].namespace=cert-manager-test \
--set customSecret.certs[0].includeKeys="{ca.crt,tls.crt}" \
--set customSecret.certs[0].annotationSelector="{cert-manager.io/certificate-name}" \
--set customSecret.certs[1].name=self-signed-cert \
--set customSecret.certs[1].includeKeys="{ca.crt,tls.crt}" \
--set customSecret.certs[1].labelSelector="{key=value}"
----
. Monitor custom certificates only, disregarding node and addon certificates.
+
[source,bash]
----
helm install suse/cert-exporter \
--name ${RELEASE_NAME} \
--set node.enabled=false \
--set addon.enabled=false \
--set customSecret.enabled=true \
--set customSecret.certs[0].name=cert-manager \
--set customSecret.certs[0].namespace=cert-manager-test \
--set customSecret.certs[0].includeKeys="{ca.crt,tls.crt}" \
--set customSecret.certs[0].annotationSelector="{cert-manager.io/certificate-name}" \
--set customSecret.certs[1].name=self-signed-cert \
--set customSecret.certs[1].includeKeys="{ca.crt,tls.crt}" \
--set customSecret.certs[1].labelSelector="{key=value}"
----
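Long `--set` lists like the ones above can become hard to read. As an alternative sketch, the same values from the combined example (certs `0` and `1`) can be written to a values file and passed to Helm with `-f`. The YAML layout below is derived from how Helm expands `--set` keys and `{a,b}` lists. The function name and the default file name are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: write the customSecret settings from the combined example
# into a Helm values file instead of passing many --set flags.
write_cert_exporter_values() {
  local out="${1:-cert-exporter-values.yaml}"
  cat > "$out" <<'EOF'
customSecret:
  enabled: true
  certs:
    - name: cert-manager
      namespace: cert-manager-test
      includeKeys:
        - ca.crt
        - tls.crt
      annotationSelector:
        - cert-manager.io/certificate-name
    - name: self-signed-cert
      includeKeys:
        - ca.crt
        - tls.crt
      labelSelector:
        - key=value
EOF
}

# Usage:
#   write_cert_exporter_values values.yaml
#   helm install suse/cert-exporter --name "${RELEASE_NAME}" -f values.yaml
```

Keeping the settings in a file also makes it easier to review changes to the monitored secrets over time.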

== Deployment with a Custom CA Certificate

[WARNING]
@@ -280,6 +372,8 @@ kind: Secret
metadata:
name: oidc-dex-cert
namespace: kube-system
labels:
caasp.suse.com/skuba-addon: "true"
type: kubernetes.io/tls
data:
ca.crt: cat <TRUSTED_CA_CERT_PATH> | base64 | awk '{print}' ORS='' && echo
@@ -331,6 +425,8 @@ kind: Secret
metadata:
name: oidc-gangway-cert
namespace: kube-system
labels:
caasp.suse.com/skuba-addon: "true"
type: kubernetes.io/tls
data:
ca.crt: cat <TRUSTED_CA_CERT_PATH> | base64 | awk '{print}' ORS='' && echo
@@ -445,6 +541,8 @@ kind: Secret
metadata:
name: oidc-dex-cert
namespace: kube-system
labels:
caasp.suse.com/skuba-addon: "true"
type: kubernetes.io/tls
data:
ca.crt: cat <CA_CERT_PATH> | base64 | awk '{print}' ORS='' && echo
@@ -501,6 +599,8 @@ kind: Secret
metadata:
name: oidc-gangway-cert
namespace: kube-system
labels:
caasp.suse.com/skuba-addon: "true"
type: kubernetes.io/tls
data:
ca.crt: cat <CA_CERT_PATH> | base64 | awk '{print}' ORS='' && echo