Commit 96d095d

Update PersistentVolumeClaimPolicy to approved api
1 parent a20d3a6 commit 96d095d

1 file changed: +82 -65 lines changed

keps/sig-apps/1847-autoremove-statefulset-pvcs/README.md

```diff
@@ -10,17 +10,19 @@
 - [Background](#background)
 - [Changes required](#changes-required)
 - [User Stories](#user-stories)
+- [Story 0](#story-0)
 - [Story 1](#story-1)
 - [Story 2](#story-2)
+- [Story 3](#story-3)
 - [Notes/Constraints/Caveats (optional)](#notesconstraintscaveats-optional)
 - [Risks and Mitigations](#risks-and-mitigations)
 - [Design Details](#design-details)
 - [Objects Associated with the StatefulSet](#objects-associated-with-the-statefulset)
 - [Volume delete policy for the StatefulSet created PVCs](#volume-delete-policy-for-the-statefulset-created-pvcs)
-- [<code>DeleteOnScaledown</code>](#)
-- [<code>DeleteOnStatefulSetDeletion</code>](#-1)
+- [<code>OnScaleDown</code> policy of <code>Delete</code>.](#-policy-of-)
+- [<code>OnSetDeletion</code> policy of <code>Delete</code>.](#-policy-of--1)
 - [Non-Cascading Deletion](#non-cascading-deletion)
-- [Mutating <code>PersistentVolumeClaimDeletePolicy</code>](#mutating-)
+- [Mutating <code>PersistentVolumeClaimPolicy</code>](#mutating-)
 - [Cluster role change for statefulset controller](#cluster-role-change-for-statefulset-controller)
 - [Test Plan](#test-plan)
 - [Graduation Criteria](#graduation-criteria)
```
```diff
@@ -90,8 +92,8 @@ That functionality will continue to be governed by the ReclaimPolicy of the stor
 
 ### Background
 
-The `garbagecollector` controller is responsible for ensuring that when a statefulset set
-is deleted the corresponding pods spawned from the StatefulSet is deleted. The
+The `garbagecollector` controller is responsible for ensuring that when a StatefulSet
+is deleted, the corresponding pods spawned from the StatefulSet are deleted as well. The
 `garbagecollector` uses an `OwnerReference` added to the `Pod` by the StatefulSet
 controller to delete the Pod. This proposal leverages a similar mechanism to automatically
 delete the PVCs created by the controller from the StatefulSet's VolumeClaimTemplate.
```
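To make the mechanism concrete, here is a toy Go model of ownerReference-based garbage collection. It is a hypothetical simplification, not the `garbagecollector` controller's code: objects are keyed by name rather than UID, and foreground/background deletion and finalizers are ignored. It keeps the one rule that matters for this proposal: an object that has owner references is collected only once none of its owners exist.

```go
package main

import (
	"fmt"
	"sort"
)

// object is a toy stand-in for a Kubernetes object. Only the names listed in
// its ownerReferences are modelled; UIDs, kinds and finalizers are omitted.
type object struct {
	owners []string
}

// collectGarbage deletes, until nothing more changes, every object that has
// owner references but none of whose owners still exist. Objects without
// owner references are never collected, and an object with at least one
// surviving owner is kept.
func collectGarbage(store map[string]object) {
	for changed := true; changed; {
		changed = false
		for name, obj := range store {
			if len(obj.owners) == 0 {
				continue
			}
			alive := false
			for _, owner := range obj.owners {
				if _, ok := store[owner]; ok {
					alive = true
					break
				}
			}
			if !alive {
				delete(store, name)
				changed = true
			}
		}
	}
}

// sortedNames lists the remaining objects in a stable order for printing.
func sortedNames(store map[string]object) []string {
	names := make([]string, 0, len(store))
	for name := range store {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	store := map[string]object{
		"statefulset/web": {},
		"pod/web-0":       {owners: []string{"statefulset/web"}},
		"pod/web-1":       {owners: []string{"statefulset/web"}},
		// Existing behaviour: the VolumeClaimTemplate PVC has no ownerReference.
		"pvc/data-web-0": {},
	}
	delete(store, "statefulset/web") // the user deletes the StatefulSet
	collectGarbage(store)
	fmt.Println(sortedNames(store)) // [pvc/data-web-0]
}
```

With no ownerReference on the PVC, deleting the StatefulSet leaves the claim behind; giving the PVC an owner is what lets the same collector reclaim it.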
```diff
@@ -100,21 +102,29 @@ delete the PVCs created by the controller from the StatefulSet's VolumeClaimTemp
 
 The following changes are required:
 
-1. Add `PersistentVolumeClaimDeletePolicy` to the StatefulSet spec with the following policies.
-* `Retain` - this is the default policy and is considered in cases where no policy is
-specified. This would be the existing behaviour - when a StatefulSet is deleted, no
-action is taken with respect to the PVCs created by the StatefulSet.
-* `DeleteOnStatefulSetDeletion` - PVCs corresponding to the StatefulSet are deleted when StatefulSet
-themselves get deleted. When Pods are deleted as part of a scale down, PVCs are not
-deleted. Thus there may be PVCs owned by the StatefulSet that are not attached to a Pod.
-* `DeleteOnScaledown` - When a pod is deleted on scale down, the corresponding PVC is deleted as well.
-On a scale down followed by a scale up, will wait until the old PVC for the removed Pod is deleted and ensure
-that the PVC used is a freshly created one. This policy also implies the
-former, that is, PVCs will also be deleted when the StatefulSet is deleted.
+1. Add `PersistentVolumeClaimPolicy` to the StatefulSet spec with the following fields.
+* `OnSetDeletion` - specifies whether the VolumeClaimTemplate PVCs are deleted when
+their StatefulSet is deleted.
+* `OnScaleDown` - specifies whether the VolumeClaimTemplate PVCs are deleted when
+their corresponding pod is deleted on a StatefulSet scale-down, that is,
+when the number of pods in a StatefulSet is reduced via the Replicas field.
+
+These fields may be set to the following values.
+* `Retain` - the default policy, which is also used when no policy is
+specified. This specifies the existing behaviour: when a StatefulSet is
+deleted or scaled down, no action is taken with respect to the PVCs
+created by the StatefulSet.
+* `Delete` - the PVCs described above are deleted in the corresponding
+scenario, either on StatefulSet deletion or on scale-down.
 2. Add `patch` to the statefulset controller rbac cluster role for `persistentvolumeclaims`.
 
 ### User Stories
 
+#### Story 0
+The user is happy with the legacy behavior of a StatefulSet. They leave all fields
+of `PersistentVolumeClaimPolicy` set to `Retain`. Traditional StatefulSet
+behavior does not change, either on set deletion or on scale-down.
+
 #### Story 1
 The user is running a StatefulSet as part of an application with a finite lifetime. During
 the application's existence the StatefulSet maintains per-pod state, even across scale-up
```
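A minimal Go sketch of the API shape described in the changes above, assuming the field and value names from this revision of the KEP. The type and constant names and the `withDefaults` and `validate` helpers are hypothetical illustrations, not the published `k8s.io/api` definitions. It shows how an unset policy resolves to `Retain` for both fields, which is exactly Story 0.

```go
package main

import "fmt"

// PersistentVolumeClaimPolicyType is a hypothetical Go rendering of the two
// values this KEP revision allows; it is not the published k8s.io/api type.
type PersistentVolumeClaimPolicyType string

const (
	RetainPersistentVolumeClaimPolicyType PersistentVolumeClaimPolicyType = "Retain"
	DeletePersistentVolumeClaimPolicyType PersistentVolumeClaimPolicyType = "Delete"
)

// StatefulSetPersistentVolumeClaimPolicy carries the two fields named above.
type StatefulSetPersistentVolumeClaimPolicy struct {
	OnSetDeletion PersistentVolumeClaimPolicyType
	OnScaleDown   PersistentVolumeClaimPolicyType
}

// withDefaults returns a copy of p with every unset field set to Retain, so a
// missing policy keeps the existing StatefulSet behaviour (Story 0).
func withDefaults(p *StatefulSetPersistentVolumeClaimPolicy) StatefulSetPersistentVolumeClaimPolicy {
	var out StatefulSetPersistentVolumeClaimPolicy
	if p != nil {
		out = *p
	}
	if out.OnSetDeletion == "" {
		out.OnSetDeletion = RetainPersistentVolumeClaimPolicyType
	}
	if out.OnScaleDown == "" {
		out.OnScaleDown = RetainPersistentVolumeClaimPolicyType
	}
	return out
}

// validate is a hypothetical check that each field holds Retain or Delete.
func validate(p StatefulSetPersistentVolumeClaimPolicy) error {
	fields := []struct {
		name  string
		value PersistentVolumeClaimPolicyType
	}{{"onSetDeletion", p.OnSetDeletion}, {"onScaleDown", p.OnScaleDown}}
	for _, f := range fields {
		if f.value != RetainPersistentVolumeClaimPolicyType && f.value != DeletePersistentVolumeClaimPolicyType {
			return fmt.Errorf("%s: unsupported value %q", f.name, f.value)
		}
	}
	return nil
}

func main() {
	fmt.Printf("%+v\n", withDefaults(nil)) // {OnSetDeletion:Retain OnScaleDown:Retain}
	story3 := StatefulSetPersistentVolumeClaimPolicy{OnSetDeletion: "Delete", OnScaleDown: "Delete"}
	fmt.Println(validate(story3)) // <nil>
}
```

Defaulting each field independently means a user can opt in to one deletion scenario without touching the other.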
```diff
@@ -123,23 +133,35 @@ so that scale-up can leverage the existing volumes. When the application is fini
 volumes created by the StatefulSet are no longer needed and can be automatically
 reclaimed.
 
-The user would set the `PersistentVolumeClaimDeletePolicy` as `DeleteOnStatefulSetDelete`
-which would ensure that the PVCs created automatically during the StatefulSet activation
-is deleted once the StatefulSet is deleted.
+The user would set `PersistentVolumeClaimPolicy.OnSetDeletion` to `Delete`, which
+would ensure that the PVCs created automatically during the StatefulSet
+activation are deleted once the StatefulSet is deleted.
 
 #### Story 2
-User is very cost conscious and can sustain slower scale-up speeds, even after a
-scale-down. The user does not want to pay for volumes that are not in use, and
-so wants them to be reclaimed as soon as possible, including during
-scale-down. On scale-up a new volume will be provisioned and the new pod will
-have to re-intitialize. However, for short-lived interruptions when a pod is
-killed & recreated, like a rolling update or node disruptions, the data on
-volumes is persisted. This is a key property that ephemeral storage, like
-emptyDir, cannot provide.
-
-User would set the `PersistentVolumeClaimDeletePolicy` as `DeleteOnScaledown` ensuring
-PVCs are deleted when corresponding Pods are deleted. New Pods created during scale-up
-followed by a scale-down will wait for freshly created PVCs.
+The user is cost conscious and can sustain slower scale-up speeds even after a
+scale-down, because scaling events are rare and volume data can be
+reconstructed, albeit slowly, during a scale-up. However, the user sometimes
+needs to bring the StatefulSet down temporarily by deleting it, and then bring
+it back up reusing the same volumes. This is accomplished by setting
+`PersistentVolumeClaimPolicy.OnScaleDown` to `Delete`, and leaving
+`PersistentVolumeClaimPolicy.OnSetDeletion` at `Retain`.
+
+#### Story 3
+The user is very cost conscious and can sustain slower scale-up speeds even after a
+scale-down. The user does not want to pay for volumes that are not in use in any
+circumstance, and so wants them to be reclaimed as soon as possible. On scale-up
+a new volume will be provisioned and the new pod will have to
+re-initialize. However, for short-lived interruptions when a pod is killed and
+recreated, like a rolling update or node disruptions, the data on volumes is
+persisted. This is a key property that ephemeral storage, like emptyDir, cannot
+provide.
+
+The user would set both `PersistentVolumeClaimPolicy.OnScaleDown` and
+`PersistentVolumeClaimPolicy.OnSetDeletion` to `Delete`, ensuring PVCs are
+deleted when corresponding Pods are deleted. New Pods created during a scale-up
+that follows a scale-down will wait for freshly created PVCs. PVCs are also
+deleted when the set is deleted, reclaiming volumes as quickly as possible and
+minimizing expense.
 
 ### Notes/Constraints/Caveats (optional)
 
```
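The four stories differ only in how the two fields are combined. This hypothetical Go sketch tabulates, for each story, whether a VolumeClaimTemplate PVC is removed on scale-down, on (cascading) StatefulSet deletion, or when a pod is merely killed and recreated, as in a rolling update or node disruption. The `policy` struct and the `event` labels are illustrative names, not part of the API.

```go
package main

import "fmt"

// policy is a hypothetical mirror of PersistentVolumeClaimPolicy; fields hold
// "Retain" or "Delete".
type policy struct {
	OnSetDeletion string
	OnScaleDown   string
}

// event labels the lifecycle events the stories discuss (illustrative only).
type event int

const (
	scaleDown   event = iota // replicas reduced
	setDeletion              // StatefulSet deleted with cascading
	podRestart               // pod killed and recreated: rolling update, node disruption
)

// deletesPVC reports whether a VolumeClaimTemplate PVC is removed for event e.
func deletesPVC(p policy, e event) bool {
	switch e {
	case scaleDown:
		return p.OnScaleDown == "Delete"
	case setDeletion:
		return p.OnSetDeletion == "Delete"
	default:
		// A recreated pod keeps its PVC under every policy.
		return false
	}
}

func main() {
	stories := []struct {
		name string
		p    policy
	}{
		{"Story 0", policy{OnSetDeletion: "Retain", OnScaleDown: "Retain"}},
		{"Story 1", policy{OnSetDeletion: "Delete", OnScaleDown: "Retain"}},
		{"Story 2", policy{OnSetDeletion: "Retain", OnScaleDown: "Delete"}},
		{"Story 3", policy{OnSetDeletion: "Delete", OnScaleDown: "Delete"}},
	}
	for _, s := range stories {
		fmt.Printf("%s: scale-down=%t set-deletion=%t pod-restart=%t\n", s.name,
			deletesPVC(s.p, scaleDown), deletesPVC(s.p, setDeletion), deletesPVC(s.p, podRestart))
	}
}
```

For Story 2 this prints `Story 2: scale-down=true set-deletion=false pod-restart=false`. The `pod-restart` column is `false` for every story, which is the property Story 3 contrasts with emptyDir.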
```diff
@@ -152,12 +174,12 @@ VolumeClaimTemplate it will be deleted according to the deletion policy.
 
 ### Risks and Mitigations
 
-Currently the PVCs created by StatefulSet are not deleted automatically. Using the
-`DeleteOnScaledown` or `DeleteOnStatefulSetDeletion` would delete the PVCs
-automatically. Since this involves persistent data being deleted, users should take
-appropriate care using this feature. Having the `Retain` behaviour as default will ensure
-that the PVCs remain intact by default and only a conscious choice made by user will
-involve any persistent data being deleted.
+Currently the PVCs created by a StatefulSet are not deleted automatically. Setting
+`OnScaleDown` or `OnSetDeletion` to `Delete` would delete the PVCs
+automatically. Since this involves persistent data being deleted, users should
+take appropriate care when using this feature. Having `Retain` as the default
+behaviour ensures that PVCs remain intact unless the user makes a conscious
+choice that involves persistent data being deleted.
 
 This proposed API causes the PVCs associated with the StatefulSet to have
 behavior close to, but not the same as, ephemeral volumes, such as emptyDir or
```
```diff
@@ -189,12 +211,13 @@ are affected.
 
 ### Volume delete policy for the StatefulSet created PVCs
 
-A new field named `PersistentVolumeClaimDeletePolicy` of the type
-`StatefulSetPersistentVolumeClaimDeletePolicy` will be added to the StatefulSet. This
-field will represent the user indication on whether the associated PVCs can be
-automatically deleted or not. The default policy would be `Retain`.
+A new field named `PersistentVolumeClaimPolicy` of the type
+`StatefulSetPersistentVolumeClaimPolicy` will be added to the StatefulSet. This
+field indicates the circumstances, as described above, under which the
+associated PVCs may be automatically deleted. The default policy is to retain
+PVCs in all cases.
 
-The `PersistentVolumeClaimDeletePolicy` field will be mutable. The deletion
+The `PersistentVolumeClaimPolicy` object will be mutable. The deletion
 mechanism will be based on reconciliation, so as long as the field is changed
 far from StatefulSet deletion or scale-down, the policy will work as
 expected. Mutability does introduce race conditions if it is changed while a
```
```diff
@@ -207,14 +230,12 @@ manually deleting PVCs. The latter case will result in lost data, but only in
 PVCs that were originally declared to have been deleted. Life does not always
 have an undo button.
 
-#### `DeleteOnScaledown`
+#### `OnScaleDown` policy of `Delete`.
 
-If `PersistentVolumeClaimDeletePolicy` is set to `DeleteOnScaledown`, the Pod will be set
-as the owner of the PVCs created from the `VolumeClaimTemplates` just before the
-scale-down is performed by the StatefulSet controller. When a Pod is deleted, the PVC
-owned by the Pod is also deleted. When `DeleteOnScaledown` policy is set and the
-Statefulset gets deleted the PVCs also will get deleted (similar to
-`DeleteOnStatefulSetDeletion` policy).
+If `PersistentVolumeClaimPolicy.OnScaleDown` is set to `Delete`, the Pod will be
+set as the owner of the PVCs created from the `VolumeClaimTemplates` just before
+the scale-down is performed by the StatefulSet controller. When a Pod is
+deleted, the PVC owned by the Pod is also deleted.
 
 The current StatefulSet controller implementation ensures that the manually deleted pods
 are restored before the scale-down logic is run. This combined with the fact that the
```
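A sketch of which ownerReferences the controller would add just before a scale-down when `OnScaleDown` is `Delete`. It assumes the usual StatefulSet naming scheme (pods `<set>-<ordinal>`, claims `<template>-<set>-<ordinal>`), ordinals starting at 0, and that scaling from `current` to `target` replicas removes ordinals `target` through `current-1`. The helper names are hypothetical; the real controller would patch each PVC through the API server, which is why the cluster role needs `patch` on `persistentvolumeclaims`.

```go
package main

import (
	"fmt"
	"sort"
)

// claimName follows the StatefulSet claim naming scheme <template>-<set>-<ordinal>.
func claimName(template, set string, ordinal int) string {
	return fmt.Sprintf("%s-%s-%d", template, set, ordinal)
}

// podName follows the StatefulSet pod naming scheme <set>-<ordinal>.
func podName(set string, ordinal int) string {
	return fmt.Sprintf("%s-%d", set, ordinal)
}

// scaleDownOwners returns the PVC -> Pod ownerReferences to add before the
// condemned pods are deleted. With any OnScaleDown value other than "Delete",
// or when the replica count is not shrinking, nothing is added.
func scaleDownOwners(set string, templates []string, current, target int, onScaleDown string) map[string]string {
	owners := map[string]string{}
	if onScaleDown != "Delete" {
		return owners
	}
	for ord := target; ord < current; ord++ {
		for _, t := range templates {
			owners[claimName(t, set, ord)] = podName(set, ord)
		}
	}
	return owners
}

func main() {
	owners := scaleDownOwners("web", []string{"data", "logs"}, 5, 3, "Delete")
	keys := make([]string, 0, len(owners))
	for k := range owners {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s -> owned by pod %s\n", k, owners[k])
	}
}
```

Once a claim lists the pod as its owner, deleting the pod lets the garbage collector remove the claim; claims of the surviving ordinals are untouched.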
```diff
@@ -226,21 +247,17 @@ the PVC was referred to by the deleted Pod and is in the process of getting
 deleted. The controller will skip the reconcile loop until PVC deletion finishes, avoiding
 a race condition.
 
-In addition, on PVC creation an OwnerRef is added for when the StatefulSet is
-deleted. See the `DeleteOnStatefulSetDeletion` policy below for further details
-how this will be handled.
-
-#### `DeleteOnStatefulSetDeletion`
+#### `OnSetDeletion` policy of `Delete`.
 
-When `PersistentVolumeClaimDeletePolicy` is set to
-`DeleteOnStatefulSetDeletion`, when a VolumeClaimTemplate PVC is created, an
-owner reference in PVC will be added to point to the StatefulSet. When a
-scale-up or scale-down occurs, the PVC is unchanged. PVCs previously in use
-before scale-down will be used again when the scale-up occurs.
+When `PersistentVolumeClaimPolicy.OnSetDeletion` is set to `Delete`, an owner
+reference pointing to the StatefulSet is added to each VolumeClaimTemplate PVC
+when it is created. When a scale-up or scale-down occurs, the PVC is
+unchanged. PVCs previously in use before a scale-down will be used again when
+the scale-up occurs.
 
 In the existing StatefulSet reconcile loop, the associated VolumeClaimTemplate
 PVCs will be checked to see if the ownerRef is correct according to the
-`PersistentVolumeClaimDeletePolicy` and updated accordingly. This includes PVCs
+`PersistentVolumeClaimPolicy` and updated accordingly. This includes PVCs
 that have been manually provisioned. It will be most consistent and easy
 to reason about if all VolumeClaimTemplate PVCs are treated uniformly rather
 than trying to guess at their provenance.
```
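The reconcile step described above can be sketched as an idempotent function that adds or removes the StatefulSet ownerReference according to `OnSetDeletion`, together with the scale-up guard that waits for a terminating PVC. This is a hypothetical Go model (`claim`, `reconcileSetOwner` and `readyForPod` are illustrative names, and owners are tracked by name rather than UID), not the controller's implementation. Because the check runs on every reconcile, it treats manually provisioned PVCs the same way and also picks up later changes to the policy.

```go
package main

import "fmt"

// claim is a hypothetical, simplified view of a VolumeClaimTemplate PVC: the
// names of its owners (real ownerReferences carry UIDs) and whether it is
// already terminating, for example while held back by PVC protection.
type claim struct {
	owners      map[string]bool
	terminating bool
}

// reconcileSetOwner makes the StatefulSet ownerReference on c agree with
// onSetDeletion: present for "Delete", absent for "Retain". Other owners,
// such as a Pod added before a scale-down, are left alone. It reports
// whether c changed, i.e. whether the controller would need to patch it.
func reconcileSetOwner(c *claim, set, onSetDeletion string) bool {
	if c.owners == nil {
		c.owners = map[string]bool{}
	}
	want := onSetDeletion == "Delete"
	if c.owners[set] == want {
		return false
	}
	if want {
		c.owners[set] = true
	} else {
		delete(c.owners, set)
	}
	return true
}

// readyForPod reports whether the controller may go on to create the pod for
// this ordinal. A missing claim will simply be created fresh, but a claim that
// is still terminating must finish first so the new pod never binds to it.
func readyForPod(c *claim) bool {
	return c == nil || !c.terminating
}

func main() {
	c := &claim{}
	fmt.Println(reconcileSetOwner(c, "web", "Delete")) // true: ownerReference added
	fmt.Println(reconcileSetOwner(c, "web", "Delete")) // false: already correct
	fmt.Println(reconcileSetOwner(c, "web", "Retain")) // true: removed after the policy is mutated
	c.terminating = true
	fmt.Println(readyForPod(c)) // false: wait for the deletion to finish
}
```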
```diff
@@ -253,16 +270,16 @@ ensures that PVC deletion happens only after the StatefulSet is deleted. This is
 necessary because of PVC protection which does not allow PVC deletion until all
 pods referencing it are deleted.
 
-`Retain` `PersistentVolumeClaimDeletePolicy` will ensure the current behaviour: no PVC
-deletion is performed as part of StatefulSet controller.
+The deletion policies may be combined to get the delete behavior both on set
+deletion and on scale-down.
 
 #### Non-Cascading Deletion
 
 When StatefulSet is deleted without cascading, eg `kubectl delete --cascade=false`, then
 existing behavior is retained and no PVC will be deleted. Only the StatefulSet resource
 will be affected.
 
-#### Mutating `PersistentVolumeClaimDeletePolicy`
+#### Mutating `PersistentVolumeClaimPolicy`
 
 Recall that as defined above, the PVCs associated with a StatefulSet are found
 by the StatefulSet volumeClaimTemplate static naming scheme. The Pods associated
```
```diff
@@ -339,13 +356,13 @@ In order to update the PVC ownerReference, the `buildControllerRoles` will be up
 This features adds a new field to the StatefulSet. The default value for the new field
 maintains the existing behavior of StatefulSets.
 
-On a downgrade, the `PersistentVolumeClaimReclaimPolicy` field will be hidden on
+On a downgrade, the `PersistentVolumeClaimPolicy` field will be hidden on
 any StatefulSets. The behavior in this case will be identical to mutating they
 policy field to `Retain`, as described above, including the edge cases
 introduced if this is done during a scale-down or StatefulSet deletion.
 
 ### Version Skew Strategy
-There are only kubecontroller manager changes involved (in addition to the
+There are only kube-controller-manager changes involved (in addition to the
 apiserver changes for dealing with the new StatefulSet field). Node components
 are not involved so there is no version skew between nodes and the control plane.
 
```