
Commit 1fd62b7

imjasonh authored and vdemeester committed
Mark webhook and controller as safe-to-evict
The safe-to-evict annotation tells the cluster autoscaler whether the pod can be evicted to allow the node it's on to scale down.

This was set to false (by me!) 2 years ago in tektoncd@fc6ef39 to prevent service unreliability during scale-down events. If no webhook replicas are available, users can't create/update/delete Tekton objects; if no controller replicas are available, status updates from Pod events, etc., won't be processed.

Unfortunately, blocking node eviction means the node that the pod(s) get scheduled to can't be scaled down. Furthermore, the nodes can't be fully drained when updating the cluster. This can leave a cluster in a mid-upgrade state that can make issues difficult to diagnose and reason about.

With this change, a cluster scale-down event might cause temporary service unreliability with the default single-replica configuration. As with tektoncd#3787, if a user/operator wants to prevent this, they should configure more replicas for HA.

(cherry picked from commit 5350069)

Signed-off-by: Vincent Demeester <vdemeest@redhat.com>
1 parent 8d31e08 commit 1fd62b7

3 files changed: +4 -4 lines changed


config/controller.yaml

-2
@@ -37,8 +37,6 @@ spec:
       app.kubernetes.io/part-of: tekton-pipelines
   template:
     metadata:
-      annotations:
-        cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
       labels:
         app.kubernetes.io/name: controller
         app.kubernetes.io/component: controller

config/webhook.yaml

-2
@@ -40,8 +40,6 @@ spec:
       app.kubernetes.io/part-of: tekton-pipelines
   template:
     metadata:
-      annotations:
-        cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
       labels:
         app.kubernetes.io/name: webhook
         app.kubernetes.io/component: webhook
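
Note for operators who prefer the previous behavior: the removed annotation can be restored locally with a patch. The following is a minimal sketch of a strategic-merge patch body, not part of this commit. It only re-adds the two lines deleted above; the target Deployment name and namespace used when applying it are assumptions about your installation.

```yaml
# Hypothetical patch file (e.g. safe-to-evict-patch.yaml); not part of this commit.
# Re-adds the annotation that this commit removes from the pod template.
spec:
  template:
    metadata:
      annotations:
        cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
```

Assuming the Deployment is named `tekton-pipelines-webhook` in the `tekton-pipelines` namespace, something like `kubectl patch deployment tekton-pipelines-webhook -n tekton-pipelines --patch-file safe-to-evict-patch.yaml` should apply it. As the commit message points out, this again prevents the autoscaler from scaling down the node hosting that pod, so running more replicas is usually the better option.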

docs/enabling-ha.md

+4
@@ -88,6 +88,10 @@ spec:
   minReplicas: 1
 ```
 
+By default, the Webhook deployment is _not_ configured to block a [Cluster Autoscaler](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler) from scaling down the node that's running the only replica of the deployment using the `cluster-autoscaler.kubernetes.io/safe-to-evict` annotation.
+This means that during node drains, the Webhook might be unavailable temporarily, during which time Tekton resources can't be created, updated or deleted.
+To avoid this, you can add the `safe-to-evict` annotation set to `false` to block node drains during autoscaling, or, better yet, configure multiple replicas of the Webhook deployment.
+
 ### Avoiding Disruptions
 
 To avoid the Webhook Service becoming unavailable during node unavailability (e.g., during node upgrades), you can ensure that a minimum number of Webhook replicas are available at time by defining a [`PodDisruptionBudget`](https://kubernetes.io/docs/tasks/run-application/configure-pdb/) which sets a `minAvailable` greater than zero:
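
The hunk is cut off before the example that the last context line introduces. As a rough illustration only (not the text of the file), a `PodDisruptionBudget` of the kind described might look like the sketch below. The selector labels are taken from the webhook pod template shown in `config/webhook.yaml` above; the resource name, namespace, and `policy/v1` API version are assumptions.

```yaml
# Illustrative sketch; name, namespace and apiVersion are assumptions.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: tekton-pipelines-webhook      # assumed name
  namespace: tekton-pipelines         # assumed namespace
spec:
  # Keep at least one webhook pod running during voluntary disruptions.
  minAvailable: 1
  selector:
    matchLabels:
      # Labels as they appear on the webhook pod template in this diff.
      app.kubernetes.io/name: webhook
      app.kubernetes.io/component: webhook
      app.kubernetes.io/part-of: tekton-pipelines
```

With the default single replica, `minAvailable: 1` would block voluntary eviction of that pod altogether, which brings back the drain problem this commit addresses. A budget like this is therefore most useful once the Webhook deployment runs two or more replicas.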
