@@ -401,8 +401,14 @@ For alpha:
  and further development efforts.
  - Focus should be on supported user stories as listed above.

- Once this data is available, additional test plans should be added for the next
- phase of graduation.
+ For beta:
+
+ - Add e2e tests that exercise all available swap configurations via the CRI.
+ - Add e2e tests that verify pod-level control of swap utilization.
+ - Add e2e tests that verify swap performance with pods using a tmpfs.
+ - Verify new system-reserved settings for swap memory.
+ - Verify MemoryPressure behaviour with swap enabled and document any changes
+ for configuring eviction.

  ### Graduation Criteria

@@ -416,8 +422,6 @@ phase of graduation.

  #### Beta

- _(Tentative.)_
-
  - Add support for controlling swap consumption at the pod level [via cgroups].
  - Handle usage of swap during container restart boundaries for writes to tmpfs
  (which may require pod cgroup change beyond what container runtime will do at
@@ -426,6 +430,7 @@ _(Tentative.)_
  detects on the host.
  - Consider introducing new configuration modes for swap, such as a node-wide
  swap limit for workloads.
+ - Add swap memory to the Kubelet stats API.
  - Determine a set of metrics for node QoS in order to evaluate the performance
  of nodes with and without swap enabled.
  - Better understand relationship of swap with memory QoS in cgroup v2
@@ -437,6 +442,8 @@ _(Tentative.)_

  #### GA

+ _(Tentative.)_
+
  - Test a wide variety of scenarios that may be affected by swap support.
  - Remove feature flag.

@@ -587,13 +594,30 @@ Try to be as paranoid as possible - e.g., what if some components will restart
  mid-rollout?
  -->

+ If a new node with swap memory fails to come online, it will not impact any
+ running components.
+
+ It is possible that if a cluster administrator adds swap memory to an already
+ running node, and then performs an in-place upgrade, the new kubelet could fail
+ to start unless the configuration was modified to tolerate swap. However, we
+ would expect that if a cluster admin is adding swap to the node, they will also
+ update the kubelet's configuration to not fail with swap present.
+
+ Generally, it is considered best practice to add a swap memory partition at
+ node image/boot time and not provision it dynamically after a kubelet is
+ already running and reporting Ready on a node.
+
  ###### What specific metrics should inform a rollback?

  <!--
  What signals should users be paying attention to when the feature is young
  that might indicate a serious problem?
  -->

+ Workload churn or performance degradations on nodes. The metrics will be
+ application/use-case specific, but we can provide some suggestions, based on
+ the stability metrics identified earlier.
+
  ###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

  <!--
@@ -602,12 +626,17 @@ Longer term, we may want to require automated upgrade/rollback tests, but we
  are missing a bunch of machinery and tooling and can't do that now.
  -->

+ N/A because swap support lacks a runtime upgrade/downgrade path; kubelet must
+ be restarted with or without swap support.
+
  ###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?

  <!--
  Even if applying deprecation policies, they may still surprise some users.
  -->

+ No.
+
  ### Monitoring Requirements

  <!--
@@ -622,12 +651,26 @@ checking if there are objects with field X set) may be a last resort. Avoid
  logs or events for this purpose.
  -->

+ KubeletConfiguration has set `failSwapOn: false`.
+
+ The Prometheus `node_exporter` will also export stats on swap memory
+ utilization.
+
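Reviewer note: the swap statistics mentioned above ultimately come from the kernel's `/proc/meminfo` counters. The sketch below is only an illustration of how a utilization figure could be derived from the `SwapTotal` and `SwapFree` fields; the function name `swap_utilization` and the sample text are assumptions made for this example and are not part of the KEP or of `node_exporter`.

```python
def swap_utilization(meminfo_text):
    """Return the fraction of swap in use, or 0.0 when no swap is configured.

    `meminfo_text` is expected to look like the contents of /proc/meminfo,
    where each line has the form "Name:   <number> kB".
    """
    fields = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "key: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])  # value in kB

    total = fields.get("SwapTotal", 0)
    free = fields.get("SwapFree", 0)
    if total == 0:
        return 0.0  # swap is absent or disabled on this node
    return (total - free) / total


# Hypothetical sample, shaped like /proc/meminfo output.
SAMPLE = """\
MemTotal:       16384000 kB
SwapTotal:       2097148 kB
SwapFree:        1048574 kB
"""

print(f"swap in use: {swap_utilization(SAMPLE):.1%}")
```

On a real node the same function could be fed `open("/proc/meminfo").read()`; a result of `0.0` with `SwapTotal` at zero suggests the node has no swap provisioned at all.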
  ###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?

  <!--
  Pick one more of these and delete the rest.
  -->

+ TBD. We will determine a set of metrics as a requirement for beta graduation.
+ We will need more production data; there is not a single metric or set of
+ metrics that can be used to generally quantify node performance.
+
+ This section will be updated before the feature can be marked as graduated, and
+ will be worked on during 1.23 development.
+
+ We will also add swap memory utilization to the Kubelet stats API, to provide a means of monitoring this beyond cadvisor Prometheus stats.
+
  - [ ] Metrics
  - Metric name:
  - [Optional] Aggregation method:
@@ -647,13 +690,17 @@ high level (needs more precise definitions) those may be things like:
  - 99,9% of /health requests per day finish with 200 code
  -->

+ N/A
+
  ###### Are there any missing metrics that would be useful to have to improve observability of this feature?

  <!--
  Describe the metrics themselves and the reasons why they weren't added (e.g., cost,
  implementation difficulties, etc.).
  -->

+ N/A
+
  ### Dependencies

  <!--
@@ -784,6 +831,8 @@ details). For now, we leave it here.

  ###### How does this feature react if the API server and/or etcd is unavailable?

+ No change. Feature is specific to individual nodes.
+
  ###### What are other known failure modes?

  <!--
@@ -799,8 +848,23 @@ For each of them, fill in the following information by copying the below templat
  - Testing: Are there any tests for failure mode? If not, describe why.
  -->

+
+ Individual nodes with swap memory enabled may experience performance
+ degradations under load. This could potentially cause a cascading failure on
+ nodes without swap: if nodes with swap fail Ready checks, workloads may be
+ rescheduled en masse.
+
+ Thus, cluster administrators should be careful while enabling swap. To minimize
+ disruption, you may want to taint nodes with swap available to protect against
+ this problem. Taints will ensure that workloads which tolerate swap will not
+ spill onto nodes without swap under load.
+
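Reviewer note: to make the tainting suggestion above concrete, here is a simplified sketch of how a taint on swap-enabled nodes gates which pods may be scheduled there. The taint key `example.com/swap` and all class and function names are hypothetical, chosen for this example only; the matching logic is a reduced model of Kubernetes toleration semantics, not the scheduler's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str  # "NoSchedule", "PreferNoSchedule", or "NoExecute"


@dataclass(frozen=True)
class Toleration:
    key: str = ""
    operator: str = "Equal"  # "Equal" or "Exists"
    value: str = ""
    effect: str = ""  # an empty effect matches any taint effect


def tolerates(toleration, taint):
    """Return True if this single toleration matches this single taint."""
    if toleration.effect and toleration.effect != taint.effect:
        return False
    if toleration.operator == "Exists":
        # An empty key with "Exists" tolerates every taint.
        return toleration.key == "" or toleration.key == taint.key
    return toleration.key == taint.key and toleration.value == taint.value


def schedulable(tolerations, taints):
    """Return True if every hard taint on the node is tolerated by the pod."""
    hard = [t for t in taints if t.effect in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, t) for tol in tolerations) for t in hard)


# Hypothetical taint an administrator might place on swap-enabled nodes.
SWAP_TAINT = Taint(key="example.com/swap", value="enabled", effect="NoSchedule")
SWAP_TOLERATION = Toleration(key="example.com/swap", operator="Exists",
                             effect="NoSchedule")

print(schedulable([], [SWAP_TAINT]))                 # pod without toleration
print(schedulable([SWAP_TOLERATION], [SWAP_TAINT]))  # pod that tolerates swap
```

In this model only pods that explicitly tolerate the swap taint can land on swap-enabled nodes, so workloads that were never vetted against swap stay on the untainted part of the cluster.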
  ###### What steps should be taken if SLOs are not being met to determine the problem?

+ It is suggested that if nodes with swap memory enabled cause performance or
+ stability degradations, those nodes are cordoned, drained, and replaced with
+ nodes that do not use swap memory.
+
  ## Implementation History

  - **2015-04-24:** Discussed in [#7294](https://github.com/kubernetes/kubernetes/issues/7294).