@@ -98,10 +98,6 @@ tags, and then generate with `hack/update-toc.sh`.
- [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
- [Version Skew Strategy](#version-skew-strategy)
- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
- - [Feature enablement and rollback](#feature-enablement-and-rollback)
- - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)
- - [Monitoring requirements](#monitoring-requirements)
- - [Dependencies](#dependencies)
- [Scalability](#scalability)
- [Troubleshooting](#troubleshooting)
- [Implementation History](#implementation-history)
@@ -312,11 +308,18 @@ to leave the PVCs as is during the StatefulSet deletion.

If `VolumeReclaimPolicy` is set to `RemoveOnScaledown`, the Pod is set as the owner of the PVCs created
from the `VolumeClaimTemplates`. When a Pod is deleted, the PVC owned by the Pod is
- also deleted. During scale-up, if a PVC has an OwnerRef that does not match the Pod, it
+ also deleted.
+
+ During scale-up, if a PVC has an OwnerRef that does not match the Pod, it
potentially indicates that the PVC is referred to by the deleted Pod and is in the process of
getting deleted. The controller will exit the current reconcile loop and attempt to reconcile in the
next iteration. This avoids a race with PVC deletion.

+ The current StatefulSet controller implementation ensures that manually deleted Pods are restored
+ before the scale-down logic runs. The Pod owner reference is added to the PVC only just before
+ the controller scales down, so manual Pod deletions do not automatically delete the PVCs in
+ question.
+
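+ For illustration, under `RemoveOnScaledown` a PVC created from a `VolumeClaimTemplate` could carry
+ an owner reference to its Pod along the lines of the following sketch. This is an assumption-laden
+ example: the claim and Pod names, the UID, and the exact shape of the owner reference are
+ placeholders, not the final API or controller behavior.
+
+ ```yaml
+ # Sketch only: a PVC owned by its Pod under RemoveOnScaledown, so garbage
+ # collection removes the claim once the Pod is deleted during scale-down.
+ apiVersion: v1
+ kind: PersistentVolumeClaim
+ metadata:
+   name: www-web-1                # hypothetical claim for pod web-1
+   ownerReferences:
+   - apiVersion: v1
+     kind: Pod
+     name: web-1                  # hypothetical owning Pod
+     uid: <pod-uid>               # placeholder; must match the live Pod's UID
+ spec:
+   accessModes: ["ReadWriteOnce"]
+   resources:
+     requests:
+       storage: 1Gi
+ ```
+
+ Under `RemoveOnStatefulSetDeletion`, described below, the owner reference would instead point at
+ the StatefulSet rather than at an individual Pod.
+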
When `VolumeReclaimPolicy` is set to `RemoveOnStatefulSetDeletion`, the owner reference in the
PVC points to the StatefulSet. When a scale up or down occurs, the PVC remains unchanged.
PVCs previously in use before a scale down will be used again when the scale up occurs. The PVC deletion
@@ -334,6 +337,28 @@ In order to update the PVC owner reference, the `buildControllerRoles` will be updated

### Test Plan

+ 1. Unit tests
+
+ 1. e2e tests (the two-pod fixture these cases assume is sketched after this list)
+    - RemoveOnScaledown
+      1. Create 2 pod stateful set, scale to 1 pod, confirm PV deleted
+      1. Create 2 pod stateful set, add data to PVs, scale to 1 pod, scale back to 2, confirm PV empty
+      1. Create 2 pod stateful set, delete stateful set, confirm PVs deleted
+      1. Create 2 pod stateful set, add data to PVs, manually delete one pod, confirm pod comes back and PV has data (PV not deleted)
+      1. As above, but manually delete all pods in stateful set
+      1. Create 2 pod stateful set, add data to PVs, manually delete one pod, immediately scale down to one pod, confirm PV is deleted
+      1. Create 2 pod stateful set, add data to PVs, manually delete one pod, immediately scale down to one pod, scale back to two pods, confirm PV is empty
+    - RemoveOnStatefulSetDeletion
+      1. Create 2 pod stateful set, scale to 1 pod, confirm PV still exists
+      1. Create 2 pod stateful set, add data to PVs, scale to 1 pod, scale back to 2, confirm PV has data (PV not deleted)
+      1. Create 2 pod stateful set, delete stateful set, confirm PVs deleted
+      1. Create 2 pod stateful set, add data to PVs, manually delete one pod, confirm pod comes back and PV has data (PV not deleted)
+      1. As above, but manually delete all pods in stateful set
+      1. Create 2 pod stateful set, add data to PVs, manually delete one pod, immediately scale down to one pod, confirm PV exists
+      1. Create 2 pod stateful set, add data to PVs, manually delete one pod, immediately scale down to one pod, scale back to two pods, confirm PV has data
+    - Retain
+      1. Same tests as above, but PVs not removed in any case
+
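+ The e2e cases above assume a small two-replica fixture. A minimal sketch is shown below; the
+ `volumeReclaimPolicy` field name and placement follow this KEP draft and do not exist in any
+ released API, and the object names and image are placeholders.
+
+ ```yaml
+ # Sketch of the hypothetical 2 pod StatefulSet fixture used by the e2e cases.
+ apiVersion: apps/v1
+ kind: StatefulSet
+ metadata:
+   name: web                                  # placeholder name
+ spec:
+   replicas: 2
+   serviceName: web                           # assumes a matching headless Service
+   selector:
+     matchLabels:
+       app: web
+   volumeReclaimPolicy: RemoveOnScaledown     # proposed field; swap per test group
+   template:
+     metadata:
+       labels:
+         app: web
+     spec:
+       containers:
+       - name: web
+         image: registry.k8s.io/nginx-slim:0.8   # placeholder image
+         volumeMounts:
+         - name: www
+           mountPath: /usr/share/nginx/html
+   volumeClaimTemplates:
+   - metadata:
+       name: www
+     spec:
+       accessModes: ["ReadWriteOnce"]
+       resources:
+         requests:
+           storage: 1Gi
+ ```
+
+ Scaling this fixture from two replicas to one and checking whether `www-web-1` (and its PV)
+ survives exercises the first scale-down case in each policy group.
+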
<!--
**Note:** *Not required until targeted at a release.*
@@ -463,186 +488,6 @@ you need any help or guidance.

-->

- ### Feature enablement and rollback
-
- _This section must be completed when targeting alpha to a release._
-
- * **How can this feature be enabled / disabled in a live cluster?**
-   - [ ] Feature gate (also fill in values in `kep.yaml`)
-     - Feature gate name:
-     - Components depending on the feature gate:
-   - [ ] Other
-     - Describe the mechanism:
-     - Will enabling / disabling the feature require downtime of the control
-       plane?
-     - Will enabling / disabling the feature require downtime or reprovisioning
-       of a node? (Do not assume `Dynamic Kubelet Config` feature is enabled).
-
- * **Does enabling the feature change any default behavior?**
-   Any change of default behavior may be surprising to users or break existing
-   automations, so be extremely careful here.
-
- * **Can the feature be disabled once it has been enabled (i.e. can we roll back
-   the enablement)?**
-   Also set `disable-supported` to `true` or `false` in `kep.yaml`.
-   Describe the consequences on existing workloads (e.g. if this is a runtime
-   feature, can it break the existing applications?).
-
- * **What happens if we reenable the feature if it was previously rolled back?**
-
- * **Are there any tests for feature enablement/disablement?**
-   The e2e framework does not currently support enabling and disabling feature
-   gates. However, unit tests in each component dealing with managing data created
-   with and without the feature are necessary. At the very least, think about
-   conversion tests if API types are being modified.
-
- ### Rollout, Upgrade and Rollback Planning
-
- _This section must be completed when targeting beta graduation to a release._
-
- * **How can a rollout fail? Can it impact already running workloads?**
-   Try to be as paranoid as possible - e.g. what if some components will restart
-   in the middle of rollout?
-
- * **What specific metrics should inform a rollback?**
-
- * **Were upgrade and rollback tested? Was upgrade->downgrade->upgrade path tested?**
-   Describe manual testing that was done and the outcomes.
-   Longer term, we may want to require automated upgrade/rollback tests, but we
-   are missing a bunch of machinery and tooling and cannot do that now.
-
- * **Is the rollout accompanied by any deprecations and/or removals of features,
-   APIs, fields of API types, flags, etc.?**
-   Even if applying deprecation policies, they may still surprise some users.
-
- ### Monitoring requirements
-
- _This section must be completed when targeting beta graduation to a release._
-
- * **How can an operator determine if the feature is in use by workloads?**
-   Ideally, this should be a metric. Operations against the Kubernetes API (e.g.
-   checking if there are objects with field X set) may be a last resort. Avoid
-   logs or events for this purpose.
-
- * **What are the SLIs (Service Level Indicators) an operator can use to
-   determine the health of the service?**
-   - [ ] Metrics
-     - Metric name:
-     - [Optional] Aggregation method:
-     - Components exposing the metric:
-   - [ ] Other (treat as last resort)
-     - Details:
-
- * **What are the reasonable SLOs (Service Level Objectives) for the above SLIs?**
-   At a high level this will usually be in the form of "high percentile of SLI
-   per day <= X". It's impossible to provide comprehensive guidance, but at a very
-   high level (these need more precise definitions) those may be things like:
-   - per-day percentage of API calls finishing with 5XX errors <= 1%
-   - 99th percentile over a day of the absolute value of (job creation time minus expected
-     job creation time) for cron job <= 10%
-   - 99.9% of /health requests per day finish with a 200 code
-
- * **Are there any missing metrics that would be useful to have to improve
-   observability of this feature?**
-   Describe the metrics themselves and the reason they weren't added (e.g. cost,
-   implementation difficulties, etc.).
-
- ### Dependencies
-
- _This section must be completed when targeting beta graduation to a release._
-
- * **Does this feature depend on any specific services running in the cluster?**
-   Think about both cluster-level services (e.g. metrics-server) as well
-   as node-level agents (e.g. specific version of CRI). Focus on external or
-   optional services that are needed. For example, if this feature depends on
-   a cloud provider API, or upon an external software-defined storage or network
-   control plane.
-
-   For each of these, fill in the following, thinking both about running user workloads
-   and creating new ones, as well as about cluster-level services (e.g. DNS):
-   - [Dependency name]
-     - Usage description:
-     - Impact of its outage on the feature:
-     - Impact of its degraded performance or high error rates on the feature:
-
-
- ### Scalability
-
- _For alpha, this section is encouraged: reviewers should consider these questions
- and attempt to answer them._
-
- _For beta, this section is required: reviewers must answer these questions._
-
- _For GA, this section is required: approvers should be able to confirm the
- previous answers based on experience in the field._
-
- * **Will enabling / using this feature result in any new API calls?**
-   Describe them, providing:
-   - API call type (e.g. PATCH pods)
-   - estimated throughput
-   - originating component(s) (e.g. Kubelet, Feature-X-controller)
-   focusing mostly on:
-   - components listing and/or watching resources they didn't before
-   - API calls that may be triggered by changes of some Kubernetes resources
-     (e.g. update of object X triggers new updates of object Y)
-   - periodic API calls to reconcile state (e.g. periodic fetching state,
-     heartbeats, leader election, etc.)
-
- * **Will enabling / using this feature result in introducing new API types?**
-   Describe them, providing:
-   - API type
-   - Supported number of objects per cluster
-   - Supported number of objects per namespace (for namespace-scoped objects)
-
- * **Will enabling / using this feature result in any new calls to the cloud
-   provider?**
-
- * **Will enabling / using this feature result in increasing size or count
-   of the existing API objects?**
-   Describe them, providing:
-   - API type(s):
-   - Estimated increase in size: (e.g. new annotation of size 32B)
-   - Estimated amount of new objects: (e.g. new Object X for every existing Pod)
-
- * **Will enabling / using this feature result in increasing time taken by any
-   operations covered by [existing SLIs/SLOs][]?**
-   Think about adding additional work or introducing new steps in between
-   (e.g. need to do X to start a container), etc. Please describe the details.
-
- * **Will enabling / using this feature result in a non-negligible increase of
-   resource usage (CPU, RAM, disk, IO, ...) in any components?**
-   Things to keep in mind include: additional in-memory state, additional
-   non-trivial computations, excessive access to disks (including increased log
-   volume), significant amount of data sent and/or received over the network, etc.
-   Think through this both in small and large cases, again with respect to the
-   [supported limits][].
-
- ### Troubleshooting
-
- The Troubleshooting section serves the `Playbook` role as of now. We may consider
- splitting it into a dedicated `Playbook` document (potentially with some monitoring
- details). For now we leave it here though.
-
- _This section must be completed when targeting beta graduation to a release._
-
- * **How does this feature react if the API server and/or etcd is unavailable?**
-
- * **What are other known failure modes?**
-   For each of them fill in the following information by copying the below template:
-   - [Failure mode brief description]
-     - Detection: How can it be detected via metrics? Stated another way:
-       how can an operator troubleshoot without logging into a master or worker node?
-     - Mitigations: What can be done to stop the bleeding, especially for already
-       running user workloads?
-     - Diagnostics: What are the useful log messages and their required logging
-       levels that could help debugging the issue?
-       Not required until feature graduated to Beta.
-     - Testing: Are there any tests for failure mode? If not, describe why.
-
- * **What steps should be taken if SLOs are not being met to determine the problem?**
-
- [supported limits]: https://git.k8s.io/community/sig-scalability/configs-and-limits/thresholds.md
- [existing SLIs/SLOs]: https://git.k8s.io/community/sig-scalability/slos/slos.md#kubernetes-slisslos

## Implementation History
@@ -658,23 +503,15 @@ Major milestones might include
-->

## Drawbacks
-
+ An update to the StatefulSet API (the new `VolumeReclaimPolicy` field) is required.
<!--
Why should this KEP _not_ be implemented?
-->

## Alternatives
-
+ Users can continue to delete the PVCs manually; the burden of doing so is the motivation for this KEP.
<!--
What other approaches did you consider and why did you rule them out? These do
not need to be as detailed as the proposal, but should include enough
information to express the idea and why it was not acceptable.
-->
-
- ## Infrastructure Needed (optional)
-
- <!--
- Use this section if you need things from the project/SIG. Examples include a
- new subproject, repos requested, github details. Listing these here allows a
- SIG to get the process for these resources started right away.
- -->