10
10
- [ Background] ( #background )
11
11
- [ Changes required] ( #changes-required )
12
12
- [ User Stories] ( #user-stories )
13
+ - [ Story 0] ( #story-0 )
13
14
- [ Story 1] ( #story-1 )
14
15
- [ Story 2] ( #story-2 )
16
+ - [ Story 3] ( #story-3 )
15
17
- [ Notes/Constraints/Caveats (optional)] ( #notesconstraintscaveats-optional )
16
18
- [ Risks and Mitigations] ( #risks-and-mitigations )
17
19
- [ Design Details] ( #design-details )
18
20
- [ Objects Associated with the StatefulSet] ( #objects-associated-with-the-statefulset )
19
21
- [ Volume delete policy for the StatefulSet created PVCs] ( #volume-delete-policy-for-the-statefulset-created-pvcs )
20
- - [ <code >DeleteOnScaledown </code >] ( # )
21
- - [ <code >DeleteOnStatefulSetDeletion </code >] ( #-1 )
22
+ - [ <code >OnScaleDown </code > policy of <code >Delete </code >. ] ( #-policy-of- )
23
+ - [ <code >OnSetDeletion </code > policy of <code >Delete </code >. ] ( #-policy-of--1 )
22
24
- [ Non-Cascading Deletion] ( #non-cascading-deletion )
23
- - [ Mutating <code >PersistentVolumeClaimDeletePolicy </code >] ( #mutating- )
25
+ - [ Mutating <code >PersistentVolumeClaimPolicy </code >] ( #mutating- )
24
26
- [ Cluster role change for statefulset controller] ( #cluster-role-change-for-statefulset-controller )
25
27
- [ Test Plan] ( #test-plan )
26
28
- [ Graduation Criteria] ( #graduation-criteria )
@@ -90,8 +92,8 @@ That functionality will continue to be governed by the ReclaimPolicy of the stor
90
92
91
93
### Background
92
94
93
- The ` garbagecollector ` controller is responsible for ensuring that when a statefulset set
94
- is deleted the corresponding pods spawned from the StatefulSet is deleted. The
95
+ The ` garbagecollector ` controller is responsible for ensuring that when a StatefulSet
96
+ is deleted, the corresponding pods spawned from the StatefulSet are deleted as well. The
95
97
` garbagecollector ` uses an ` OwnerReference ` added to the ` Pod ` by the StatefulSet
96
98
controller to delete the Pod. This proposal leverages a similar mechanism to automatically
97
99
delete the PVCs created by the controller from the StatefulSet's VolumeClaimTemplate.
@@ -100,21 +102,29 @@ delete the PVCs created by the controller from the StatefulSet's VolumeClaimTemp
100
102
101
103
The following changes are required:
102
104
103
- 1 . Add ` PersistentVolumeClaimDeletePolicy ` to the StatefulSet spec with the following policies.
104
- * ` Retain ` - this is the default policy and is considered in cases where no policy is
105
- specified. This would be the existing behaviour - when a StatefulSet is deleted, no
106
- action is taken with respect to the PVCs created by the StatefulSet.
107
- * ` DeleteOnStatefulSetDeletion ` - PVCs corresponding to the StatefulSet are deleted when StatefulSet
108
- themselves get deleted. When Pods are deleted as part of a scale down, PVCs are not
109
- deleted. Thus there may be PVCs owned by the StatefulSet that are not attached to a Pod.
110
- * ` DeleteOnScaledown ` - When a pod is deleted on scale down, the corresponding PVC is deleted as well.
111
- On a scale down followed by a scale up, will wait until the old PVC for the removed Pod is deleted and ensure
112
- that the PVC used is a freshly created one. This policy also implies the
113
- former, that is, PVCs will also be deleted when the StatefulSet is deleted.
105
+ 1 . Add ` PersistentVolumeClaimPolicy ` to the StatefulSet spec with the following fields.
106
+ * ` OnSetDeletion ` - specifies if the VolumeClaimTemplate PVCs are deleted when
107
+ their StatefulSet is deleted.
108
+ * ` OnScaleDown ` - specifies if VolumeClaimTemplate PVCs are deleted when
109
+ their corresponding pod is deleted on a StatefulSet scale-down, that is,
110
+ when the number of pods in a StatefulSet is reduced via the Replicas field.
111
+
112
+ These fields may be set to the following values.
113
+ * ` Retain ` - the default policy, which is also used when no policy is
114
+ specified. This specifies the existing behaviour: when a StatefulSet is
115
+ deleted or scaled down, no action is taken with respect to the PVCs
116
+ created by the StatefulSet.
117
+ * ` Delete ` - specifies that the appropriate PVCs as described above will be
118
+ deleted in the corresponding scenario, either on StatefulSet deletion or scale-down.
114
119
2 . Add ` patch ` to the StatefulSet controller RBAC cluster role for ` persistentvolumeclaims ` .
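
The fields and values listed above could be modelled roughly as follows. This is a minimal sketch, not the actual API definition: the struct name `StatefulSetPersistentVolumeClaimPolicy` is taken from this proposal, while the value type `PersistentVolumeClaimPolicyType`, the constant names, and the `WithDefaults` helper are hypothetical names chosen for illustration.

```go
package main

// PersistentVolumeClaimPolicyType is a hypothetical name for the value type
// shared by both policy fields.
type PersistentVolumeClaimPolicyType string

const (
	// RetainPolicy keeps the existing behaviour: PVCs are left untouched.
	RetainPolicy PersistentVolumeClaimPolicyType = "Retain"
	// DeletePolicy deletes the PVCs in the scenario named by the field.
	DeletePolicy PersistentVolumeClaimPolicyType = "Delete"
)

// StatefulSetPersistentVolumeClaimPolicy groups the two fields described in
// this proposal.
type StatefulSetPersistentVolumeClaimPolicy struct {
	// OnSetDeletion applies when the StatefulSet itself is deleted.
	OnSetDeletion PersistentVolumeClaimPolicyType
	// OnScaleDown applies when a pod is removed by reducing Replicas.
	OnScaleDown PersistentVolumeClaimPolicyType
}

// WithDefaults returns a copy in which every unset field is Retain, matching
// the rule that Retain is used when no policy is specified.
func (p StatefulSetPersistentVolumeClaimPolicy) WithDefaults() StatefulSetPersistentVolumeClaimPolicy {
	if p.OnSetDeletion == "" {
		p.OnSetDeletion = RetainPolicy
	}
	if p.OnScaleDown == "" {
		p.OnScaleDown = RetainPolicy
	}
	return p
}
```

Under these assumptions, a policy that sets only `OnScaleDown` to `Delete` still retains PVCs when the StatefulSet is deleted.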
115
120
116
121
### User Stories
117
122
123
+ #### Story 0
124
+ The user is happy with the legacy behavior of a StatefulSet. They leave all fields
125
+ of ` PersistentVolumeClaimPolicy ` set to ` Retain ` . Traditional
126
+ StatefulSet behavior changes neither on set deletion nor on scale-down.
127
+
118
128
#### Story 1
119
129
The user is running a StatefulSet as part of an application with a finite lifetime. During
120
130
the application's existence the StatefulSet maintains per-pod state, even across scale-up
@@ -123,23 +133,35 @@ so that scale-up can leverage the existing volumes. When the application is fini
123
133
volumes created by the StatefulSet are no longer needed and can be automatically
124
134
reclaimed.
125
135
126
- The user would set the ` PersistentVolumeClaimDeletePolicy ` as ` DeleteOnStatefulSetDelete `
127
- which would ensure that the PVCs created automatically during the StatefulSet activation
128
- is deleted once the StatefulSet is deleted.
136
+ The user would set ` PersistentVolumeClaimPolicy.OnSetDeletion ` to ` Delete ` , which
137
+ would ensure that the PVCs created automatically during the StatefulSet
138
+ activation are deleted once the StatefulSet is deleted.
129
139
130
140
#### Story 2
131
- User is very cost conscious and can sustain slower scale-up speeds, even after a
132
- scale-down. The user does not want to pay for volumes that are not in use, and
133
- so wants them to be reclaimed as soon as possible, including during
134
- scale-down. On scale-up a new volume will be provisioned and the new pod will
135
- have to re-intitialize. However, for short-lived interruptions when a pod is
136
- killed & recreated, like a rolling update or node disruptions, the data on
137
- volumes is persisted. This is a key property that ephemeral storage, like
138
- emptyDir, cannot provide.
139
-
140
- User would set the ` PersistentVolumeClaimDeletePolicy ` as ` DeleteOnScaledown ` ensuring
141
- PVCs are deleted when corresponding Pods are deleted. New Pods created during scale-up
142
- followed by a scale-down will wait for freshly created PVCs.
141
+ The user is cost-conscious, and can sustain slower scale-up speeds even after a
142
+ scale-down, because scaling events are rare, and volume data can be
143
+ reconstructed, albeit slowly, during a scale-up. However, it is necessary to
144
+ bring down the StatefulSet temporarily by deleting it, and then bring it back up
145
+ by reusing the volumes. This is accomplished by setting
146
+ ` PersistentVolumeClaimPolicy.OnScaleDown ` to ` Delete ` , and leaving
147
+ ` PersistentVolumeClaimPolicy.OnSetDeletion ` at ` Retain ` .
148
+
149
+ #### Story 3
150
+ The user is very cost-conscious, and can sustain slower scale-up speeds even after a
151
+ scale-down. The user does not want to pay for volumes that are not in use in any
152
+ circumstance, and so wants them to be reclaimed as soon as possible. On scale-up
153
+ a new volume will be provisioned and the new pod will have to
154
+ re-initialize. However, for short-lived interruptions when a pod is killed &
155
+ recreated, like a rolling update or node disruptions, the data on volumes is
156
+ persisted. This is a key property that ephemeral storage, like emptyDir, cannot
157
+ provide.
158
+
159
+ The user would set both ` PersistentVolumeClaimPolicy.OnScaleDown ` and
160
+ ` PersistentVolumeClaimPolicy.OnSetDeletion ` to ` Delete ` , ensuring PVCs are
161
+ deleted when corresponding Pods are deleted. New Pods created during a scale-up
162
+ that follows a scale-down will wait for freshly created PVCs. PVCs are deleted as
163
+ well when the set is deleted, reclaiming volumes as quickly as possible and
164
+ minimizing expense.
143
165
144
166
### Notes/Constraints/Caveats (optional)
145
167
@@ -152,12 +174,12 @@ VolumeClaimTemplate it will be deleted according to the deletion policy.
152
174
153
175
### Risks and Mitigations
154
176
155
- Currently the PVCs created by StatefulSet are not deleted automatically. Using the
156
- ` DeleteOnScaledown ` or ` DeleteOnStatefulSetDeletion ` would delete the PVCs
157
- automatically. Since this involves persistent data being deleted, users should take
158
- appropriate care using this feature. Having the ` Retain ` behaviour as default will ensure
159
- that the PVCs remain intact by default and only a conscious choice made by user will
160
- involve any persistent data being deleted.
177
+ Currently the PVCs created by StatefulSet are not deleted automatically. Using
178
+ ` OnScaleDown ` or ` OnSetDeletion ` set to ` Delete ` would delete the PVCs
179
+ automatically. Since this involves persistent data being deleted, users should
180
+ take appropriate care using this feature. Having the ` Retain ` behaviour as
181
+ default will ensure that the PVCs remain intact by default and only a conscious
182
+ choice made by the user will involve any persistent data being deleted.
161
183
162
184
This proposed API causes the PVCs associated with the StatefulSet to have
163
185
behavior close to, but not the same as, ephemeral volumes, such as emptyDir or
@@ -189,12 +211,13 @@ are affected.
189
211
190
212
### Volume delete policy for the StatefulSet created PVCs
191
213
192
- A new field named ` PersistentVolumeClaimDeletePolicy ` of the type
193
- ` StatefulSetPersistentVolumeClaimDeletePolicy ` will be added to the StatefulSet. This
194
- field will represent the user indication on whether the associated PVCs can be
195
- automatically deleted or not. The default policy would be ` Retain ` .
214
+ A new field named ` PersistentVolumeClaimPolicy ` of the type
215
+ ` StatefulSetPersistentVolumeClaimPolicy ` will be added to the StatefulSet. This
216
+ field expresses the circumstances under which the associated PVCs
217
+ are automatically deleted, as described above. The default policy
218
+ is to retain PVCs in all cases.
196
219
197
- The ` PersistentVolumeClaimDeletePolicy ` field will be mutable. The deletion
220
+ The ` PersistentVolumeClaimPolicy ` object will be mutable. The deletion
198
221
mechanism will be based on reconciliation, so as long as the field is changed
199
222
far from StatefulSet deletion or scale-down, the policy will work as
200
223
expected. Mutability does introduce race conditions if it is changed while a
@@ -207,14 +230,12 @@ manually deleting PVCs. The latter case will result in lost data, but only in
207
230
PVCs that were originally declared to have been deleted. Life does not always
208
231
have an undo button.
209
232
210
- #### ` DeleteOnScaledown `
233
+ #### ` OnScaleDown ` policy of ` Delete ` .
211
234
212
- If ` PersistentVolumeClaimDeletePolicy ` is set to ` DeleteOnScaledown ` , the Pod will be set
213
- as the owner of the PVCs created from the ` VolumeClaimTemplates ` just before the
214
- scale-down is performed by the StatefulSet controller. When a Pod is deleted, the PVC
215
- owned by the Pod is also deleted. When ` DeleteOnScaledown ` policy is set and the
216
- Statefulset gets deleted the PVCs also will get deleted (similar to
217
- ` DeleteOnStatefulSetDeletion ` policy).
235
+ If ` PersistentVolumeClaimPolicy.OnScaleDown ` is set to ` Delete ` , the Pod will be
236
+ set as the owner of the PVCs created from the ` VolumeClaimTemplates ` just before
237
+ the scale-down is performed by the StatefulSet controller. When a Pod is
238
+ deleted, the PVC owned by the Pod is also deleted.
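
The ownership step described above can be sketched as follows. This is an illustration under stated assumptions, not the controller's code: `OwnerReference` and `Claim` are simplified stand-ins for the real Kubernetes types, and `setPodOwner` is a hypothetical helper name.

```go
package main

// OwnerReference is a simplified stand-in for the Kubernetes owner reference
// type; only the fields needed for this sketch are included.
type OwnerReference struct {
	Kind string
	Name string
	UID  string
}

// Claim is a simplified stand-in for a PersistentVolumeClaim.
type Claim struct {
	Name   string
	Owners []OwnerReference
}

// hasOwner reports whether the claim already lists an owner with the same
// kind and UID.
func hasOwner(c *Claim, ref OwnerReference) bool {
	for _, o := range c.Owners {
		if o.Kind == ref.Kind && o.UID == ref.UID {
			return true
		}
	}
	return false
}

// setPodOwner (hypothetical) records the Pod that is about to be removed by
// a scale-down as an owner of the claim, so that deleting the Pod lets the
// garbage collector delete the claim too. It returns true when the claim was
// modified and would therefore need to be patched.
func setPodOwner(c *Claim, podName, podUID string) bool {
	ref := OwnerReference{Kind: "Pod", Name: podName, UID: podUID}
	if hasOwner(c, ref) {
		return false
	}
	c.Owners = append(c.Owners, ref)
	return true
}
```

Because the helper is idempotent, calling it again on a later reconcile pass leaves the claim unchanged.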
218
239
219
240
The current StatefulSet controller implementation ensures that the manually deleted pods
220
241
are restored before the scale-down logic is run. This combined with the fact that the
@@ -226,21 +247,17 @@ the PVC was referred to by the deleted Pod and is in the process of getting
226
247
deleted. The controller will skip the reconcile loop until PVC deletion finishes, avoiding
227
248
a race condition.
228
249
229
- In addition, on PVC creation an OwnerRef is added for when the StatefulSet is
230
- deleted. See the ` DeleteOnStatefulSetDeletion ` policy below for further details
231
- how this will be handled.
232
-
233
- #### ` DeleteOnStatefulSetDeletion `
250
+ #### ` OnSetDeletion ` policy of ` Delete ` .
234
251
235
- When ` PersistentVolumeClaimDeletePolicy ` is set to
236
- ` DeleteOnStatefulSetDeletion ` , when a VolumeClaimTemplate PVC is created, an
237
- owner reference in PVC will be added to point to the StatefulSet. When a
238
- scale-up or scale-down occurs, the PVC is unchanged. PVCs previously in use
239
- before scale-down will be used again when the scale-up occurs.
252
+ When ` PersistentVolumeClaimPolicy.OnSetDeletion ` is set to ` Delete ` , an owner
253
+ reference pointing to the StatefulSet is added to each VolumeClaimTemplate PVC
254
+ when it is created. When a scale-up or scale-down occurs, the PVC is
255
+ unchanged. PVCs previously in use before scale-down will be used again when the
256
+ scale-up occurs.
240
257
241
258
In the existing StatefulSet reconcile loop, the associated VolumeClaimTemplate
242
259
PVCs will be checked to see if the ownerRef is correct according to the
243
- ` PersistentVolumeClaimDeletePolicy ` and updated accordingly. This includes PVCs
260
+ ` PersistentVolumeClaimPolicy ` and updated accordingly. This includes PVCs
244
261
that have been manually provisioned. It will be most consistent and easy
245
262
to reason about if all VolumeClaimTemplate PVCs are treated uniformly rather
246
263
than trying to guess at their provenance.
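
One possible reading of the reconcile check described above is sketched below. All names here (`Policy`, `ClaimPolicy`, `desiredOwnerKinds`, `podCondemned`) are hypothetical, and the way the two policies combine is an assumption based only on the descriptions in this section.

```go
package main

// Policy is a hypothetical value type for the two policy fields.
type Policy string

const (
	Retain Policy = "Retain"
	Delete Policy = "Delete"
)

// ClaimPolicy mirrors the two fields of PersistentVolumeClaimPolicy.
type ClaimPolicy struct {
	OnSetDeletion Policy
	OnScaleDown   Policy
}

// desiredOwnerKinds returns the kinds of owner a VolumeClaimTemplate PVC
// should carry under the given policy. podCondemned is true when the PVC's
// Pod is about to be removed by a scale-down. The same rule is applied to
// every VolumeClaimTemplate PVC, including manually provisioned ones.
func desiredOwnerKinds(p ClaimPolicy, podCondemned bool) []string {
	var kinds []string
	if p.OnSetDeletion == Delete {
		// Owned by the StatefulSet so it is collected on set deletion.
		kinds = append(kinds, "StatefulSet")
	}
	if p.OnScaleDown == Delete && podCondemned {
		// Owned by the Pod just before the scale-down removes it.
		kinds = append(kinds, "Pod")
	}
	return kinds
}
```

A reconcile pass would compare this result with the owner references currently on each PVC and patch only the PVCs that differ.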
@@ -253,16 +270,16 @@ ensures that PVC deletion happens only after the StatefulSet is deleted. This is
253
270
necessary because of PVC protection which does not allow PVC deletion until all
254
271
pods referencing it are deleted.
255
272
256
- ` Retain ` ` PersistentVolumeClaimDeletePolicy ` will ensure the current behaviour: no PVC
257
- deletion is performed as part of StatefulSet controller .
273
+ The deletion policies may be combined in order to get the delete behavior both
274
+ on set deletion and on scale-down.
258
275
259
276
#### Non-Cascading Deletion
260
277
261
278
When a StatefulSet is deleted without cascading, e.g. ` kubectl delete --cascade=false ` , then
262
279
existing behavior is retained and no PVC will be deleted. Only the StatefulSet resource
263
280
will be affected.
264
281
265
- #### Mutating ` PersistentVolumeClaimDeletePolicy `
282
+ #### Mutating ` PersistentVolumeClaimPolicy `
266
283
267
284
Recall that as defined above, the PVCs associated with a StatefulSet are found
268
285
by the StatefulSet volumeClaimTemplate static naming scheme. The Pods associated
@@ -339,13 +356,13 @@ In order to update the PVC ownerReference, the `buildControllerRoles` will be up
339
356
This feature adds a new field to the StatefulSet. The default value for the new field
340
357
maintains the existing behavior of StatefulSets.
341
358
342
- On a downgrade, the ` PersistentVolumeClaimReclaimPolicy ` field will be hidden on
359
+ On a downgrade, the ` PersistentVolumeClaimPolicy ` field will be hidden on
343
360
any StatefulSets. The behavior in this case will be identical to mutating the
344
361
policy field to ` Retain ` , as described above, including the edge cases
345
362
introduced if this is done during a scale-down or StatefulSet deletion.
346
363
347
364
### Version Skew Strategy
348
- There are only kubecontroller manager changes involved (in addition to the
365
+ There are only kube-controller-manager changes involved (in addition to the
349
366
apiserver changes for dealing with the new StatefulSet field). Node components
350
367
are not involved so there is no version skew between nodes and the control plane.
351
368