Commit 100b05a

ehashman authored and ravisantoshgudimetla committed
KEP-2400: Update swap KEP for 1.23 beta (kubernetes#2858)
* Update swap KEP for 1.23 beta

  Fill out remaining beta PRR questions, add test plans
* Address PRR feedback
* Add test plan note for eviction manager/MemoryPressure
* Add swap memory to Kubelet stats API
1 parent 5a3fdec commit 100b05a

3 files changed: +72 -6 lines changed

+2
@@ -1,3 +1,5 @@
 kep-number: 2400
 alpha:
   approver: "@deads2k"
+beta:
+  approver: "@deads2k"

keps/sig-node/2400-node-swap/README.md

+68-4
@@ -401,8 +401,14 @@ For alpha:
 and further development efforts.
 - Focus should be on supported user stories as listed above.
 
-Once this data is available, additional test plans should be added for the next
-phase of graduation.
+For beta:
+
+- Add e2e tests that exercise all available swap configurations via the CRI.
+- Add e2e tests that verify pod-level control of swap utilization.
+- Add e2e tests that verify swap performance with pods using a tmpfs.
+- Verify new system-reserved settings for swap memory.
+- Verify MemoryPressure behaviour with swap enabled and document any changes
+for configuring eviction.
 
 ### Graduation Criteria
 
@@ -416,8 +422,6 @@ phase of graduation.
 
 #### Beta
 
-_(Tentative.)_
-
 - Add support for controlling swap consumption at the pod level [via cgroups].
 - Handle usage of swap during container restart boundaries for writes to tmpfs
 (which may require pod cgroup change beyond what container runtime will do at
@@ -426,6 +430,7 @@ _(Tentative.)_
 detects on the host.
 - Consider introducing new configuration modes for swap, such as a node-wide
 swap limit for workloads.
+- Add swap memory to the Kubelet stats api.
 - Determine a set of metrics for node QoS in order to evaluate the performance
 of nodes with and without swap enabled.
 - Better understand relationship of swap with memory QoS in cgroup v2
@@ -437,6 +442,8 @@ _(Tentative.)_
 
 #### GA
 
+_(Tentative.)_
+
 - Test a wide variety of scenarios that may be affected by swap support.
 - Remove feature flag.
 
@@ -587,13 +594,30 @@ Try to be as paranoid as possible - e.g., what if some components will restart
 mid-rollout?
 -->
 
+If a new node with swap memory fails to come online, it will not impact any
+running components.
+
+It is possible that if a cluster administrator adds swap memory to an already
+running node, and then performs an in-place upgrade, the new kubelet could fail
+to start unless the configuration was modified to tolerate swap. However, we
+would expect that if a cluster admin is adding swap to the node, they will also
+update the kubelet's configuration to not fail with swap present.
+
+Generally, it is considered best practice to add a swap memory partition at
+node image/boot time and not provision it dynamically after a kubelet is
+already running and reporting Ready on a node.
+
 ###### What specific metrics should inform a rollback?
 
 <!--
 What signals should users be paying attention to when the feature is young
 that might indicate a serious problem?
 -->
 
+Workload churn or performance degradations on nodes. The metrics will be
+application/use-case specific, but we can provide some suggestions, based on
+the stability metrics identified earlier.
+
 ###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
 
 <!--
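
Note on the hunk above: the in-place upgrade hazard it describes can be illustrated with a small preflight check. This is only a sketch under stated assumptions: it assumes the kubelet detects active swap from `/proc/swaps` and that the governing KubeletConfiguration field is `failSwapOn` (default `true`). The function names and the dict-based config are hypothetical and are not part of the KEP.

```python
from typing import Dict, List


def active_swap_areas(proc_swaps_text: str) -> List[str]:
    """Return the names of active swap areas listed in /proc/swaps content.

    The first line of /proc/swaps is a column header; every following
    non-empty line describes one active swap device or file.
    """
    lines = proc_swaps_text.strip().splitlines()
    return [line.split()[0] for line in lines[1:] if line.strip()]


def kubelet_would_refuse_to_start(proc_swaps_text: str, kubelet_config: Dict) -> bool:
    """Hypothetical preflight check: True if swap is on and not tolerated.

    Assumption: an absent `failSwapOn` key behaves like the default (True).
    """
    fail_swap_on = kubelet_config.get("failSwapOn", True)
    return bool(fail_swap_on) and bool(active_swap_areas(proc_swaps_text))


if __name__ == "__main__":
    sample = (
        "Filename\t\t\t\tType\t\tSize\t\tUsed\t\tPriority\n"
        "/dev/sda2                               partition\t2097148\t\t0\t\t-2\n"
    )
    print(kubelet_would_refuse_to_start(sample, {}))
    print(kubelet_would_refuse_to_start(sample, {"failSwapOn": False}))
```

Running such a check before an in-place upgrade would flag nodes where swap was added after the kubelet was first configured.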
@@ -602,12 +626,17 @@ Longer term, we may want to require automated upgrade/rollback tests, but we
 are missing a bunch of machinery and tooling and can't do that now.
 -->
 
+N/A because swap support lacks a runtime upgrade/downgrade path; kubelet must
+be restarted with or without swap support.
+
 ###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
 
 <!--
 Even if applying deprecation policies, they may still surprise some users.
 -->
 
+No.
+
 ### Monitoring Requirements
 
 <!--
@@ -622,12 +651,26 @@ checking if there are objects with field X set) may be a last resort. Avoid
 logs or events for this purpose.
 -->
 
+KubeletConfiguration has set `failOnSwap: false`.
+
+The prometheus `node_exporter` will also export stats on swap memory
+utilization.
+
 ###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
 
 <!--
 Pick one more of these and delete the rest.
 -->
 
+TBD. We will determine a set of metrics as a requirement for beta graduation.
+We will need more production data; there is not a single metric or set of
+metrics that can be used to generally quantify node performance.
+
+This section to be updated before the feature can be marked as graduated, and
+to be worked on during 1.23 development.
+
+We will also add swap memory utilization to the Kubelet stats API, to provide a means of monitoring this beyond cadvisor Prometheus stats.
+
 - [ ] Metrics
 - Metric name:
 - [Optional] Aggregation method:
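
Note on the hunk above: on Linux, swap statistics of the kind it mentions come from the kernel's `/proc/meminfo`. The following is a minimal sketch of deriving a utilization ratio, assuming the standard `SwapTotal` and `SwapFree` fields (reported in kB). The function name is hypothetical, and this is not a description of how `node_exporter` or the Kubelet stats API is implemented.

```python
from typing import Optional


def swap_utilization(meminfo_text: str) -> Optional[float]:
    """Return used swap as a fraction of total swap, or None if no swap exists.

    Parses /proc/meminfo-style lines such as "SwapTotal:  2097148 kB".
    """
    fields = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            # The first token after the colon is the numeric value.
            fields[key.strip()] = int(parts[0])
    total = fields.get("SwapTotal", 0)
    free = fields.get("SwapFree", 0)
    if total == 0:
        return None  # swap is not provisioned on this node
    return (total - free) / total


if __name__ == "__main__":
    sample = "MemTotal:  8000000 kB\nSwapTotal:  4000 kB\nSwapFree:  1000 kB\n"
    print(swap_utilization(sample))
```

A ratio that stays high under steady load is one possible signal to investigate, though as the KEP text notes, useful thresholds are workload-specific.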
@@ -647,13 +690,17 @@ high level (needs more precise definitions) those may be things like:
 - 99,9% of /health requests per day finish with 200 code
 -->
 
+N/A
+
 ###### Are there any missing metrics that would be useful to have to improve observability of this feature?
 
 <!--
 Describe the metrics themselves and the reasons why they weren't added (e.g., cost,
 implementation difficulties, etc.).
 -->
 
+N/A
+
 ### Dependencies
 
 <!--
@@ -784,6 +831,8 @@ details). For now, we leave it here.
 
 ###### How does this feature react if the API server and/or etcd is unavailable?
 
+No change. Feature is specific to individual nodes.
+
 ###### What are other known failure modes?
 
 <!--
@@ -799,8 +848,23 @@ For each of them, fill in the following information by copying the below templat
 - Testing: Are there any tests for failure mode? If not, describe why.
 -->
 
+
+Individual nodes with swap memory enabled may experience performance
+degradations under load. This could potentially cause a cascading failure on
+nodes without swap: if nodes with swap fail Ready checks, workloads may be
+rescheduled en masse.
+
+Thus, cluster administrators should be careful while enabling swap. To minimize
+disruption, you may want to taint nodes with swap available to protect against
+this problem. Taints will ensure that workloads which tolerate swap will not
+spill onto nodes without swap under load.
+
 ###### What steps should be taken if SLOs are not being met to determine the problem?
 
+It is suggested that if nodes with swap memory enabled cause performance or
+stability degradations, those nodes are cordoned, drained, and replaced with
+nodes that do not use swap memory.
+
 ## Implementation History
 
 - **2015-04-24:** Discussed in [#7294](https://github.com/kubernetes/kubernetes/issues/7294).
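
Note on the tainting suggestion in the hunk above: a taint on swap-enabled nodes keeps pods without a matching toleration off those nodes. The sketch below models the taint/toleration matching rules in plain Python to show the effect. The taint key `example.com/swap` and both function names are hypothetical; the KEP does not define a taint, and this is an illustrative model rather than scheduler code.

```python
from typing import Dict, List


def tolerates(toleration: Dict, taint: Dict) -> bool:
    """Return True if a single toleration matches a single taint."""
    # An empty effect in the toleration matches any taint effect.
    if toleration.get("effect") and toleration["effect"] != taint.get("effect"):
        return False
    # An empty key in the toleration matches any taint key.
    if toleration.get("key") and toleration["key"] != taint.get("key"):
        return False
    operator = toleration.get("operator") or "Equal"
    if operator == "Exists":
        return True
    if operator == "Equal":
        return toleration.get("value", "") == taint.get("value", "")
    return False


def pod_fits(tolerations: List[Dict], taints: List[Dict]) -> bool:
    """True if every NoSchedule/NoExecute taint is tolerated by the pod."""
    blocking = [t for t in taints if t.get("effect") in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, t) for tol in tolerations) for t in blocking)


if __name__ == "__main__":
    # Hypothetical taint an administrator might place on swap-enabled nodes.
    swap_taint = {"key": "example.com/swap", "value": "enabled", "effect": "NoSchedule"}
    swap_aware = [{"key": "example.com/swap", "operator": "Exists", "effect": "NoSchedule"}]
    print(pod_fits(swap_aware, [swap_taint]))
    print(pod_fits([], [swap_taint]))
```

Under this model, only pods that explicitly opt in land on swap nodes, which limits how many workloads are affected if those nodes degrade.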

keps/sig-node/2400-node-swap/kep.yaml

+2-2
@@ -20,12 +20,12 @@ prr-approvers:
 - "@deads2k"
 
 # The target maturity stage in the current dev cycle for this KEP.
-stage: alpha
+stage: beta
 
 # The most recent milestone for which work toward delivery of this KEP has been
 # done. This can be the current (upcoming) milestone, if it is being actively
 # worked on.
-latest-milestone: "v1.22"
+latest-milestone: "v1.23"
 
 # The milestone at which this feature was, or is targeted to be, at each stage.
 milestone:
