
Support for exposing PSI metrics #3083

Open: dqminh wants to merge 3 commits into master from the psi branch
Conversation

@dqminh (Contributor) commented Mar 23, 2022

Fix #3052

This depends on opencontainers/runc#3358, so it should not be merged as-is, but we can review the structure and how metrics are exposed.

@k8s-ci-robot (Collaborator) commented:
Hi @dqminh. Thanks for your PR.

I'm waiting for a google member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@dqminh force-pushed the psi branch 2 times, most recently from a870125 to 0a22793 on May 17, 2022 12:33
dqminh added 2 commits May 17, 2022 13:42
Signed-off-by: Daniel Dao <dqminh89@gmail.com>
This adds 2 new sets of metrics:
- `psi_total`: the total number of seconds a resource has been under pressure
- `psi_avg`: the ratio of time a resource was under pressure over a
  sliding time window.

For more details about these definitions, see:
- https://www.kernel.org/doc/html/latest/accounting/psi.html
- https://facebookmicrosites.github.io/psi/docs/overview

Signed-off-by: Daniel Dao <dqminh89@gmail.com>
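For reference (illustration only, not code from this PR): a minimal Go sketch of parsing the cgroup v2 pressure-file format these metrics are read from, assuming the standard `some`/`full` line layout shown later in this thread. The file path and sample values are hypothetical.

```
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// psiLine holds one parsed line ("some" or "full") of a cgroup v2
// pressure file, e.g. /sys/fs/cgroup/<path>/cpu.pressure.
type psiLine struct {
	Kind                 string  // "some" or "full"
	Avg10, Avg60, Avg300 float64 // pressure ratios over sliding windows (percent)
	TotalMicroseconds    uint64  // cumulative stalled time in microseconds
}

// parsePressure parses content such as:
//   some avg10=0.00 avg60=0.05 avg300=0.20 total=68139509
//   full avg10=0.00 avg60=0.00 avg300=0.00 total=40148380
func parsePressure(content string) ([]psiLine, error) {
	var out []psiLine
	for _, line := range strings.Split(strings.TrimSpace(content), "\n") {
		fields := strings.Fields(line)
		if len(fields) != 5 {
			return nil, fmt.Errorf("unexpected line: %q", line)
		}
		p := psiLine{Kind: fields[0]}
		for _, kv := range fields[1:] {
			parts := strings.SplitN(kv, "=", 2)
			if len(parts) != 2 {
				continue
			}
			switch parts[0] {
			case "avg10":
				p.Avg10, _ = strconv.ParseFloat(parts[1], 64)
			case "avg60":
				p.Avg60, _ = strconv.ParseFloat(parts[1], 64)
			case "avg300":
				p.Avg300, _ = strconv.ParseFloat(parts[1], 64)
			case "total":
				p.TotalMicroseconds, _ = strconv.ParseUint(parts[1], 10, 64)
			}
		}
		out = append(out, p)
	}
	return out, nil
}

func main() {
	sample := "some avg10=0.00 avg60=0.05 avg300=0.20 total=68139509\n" +
		"full avg10=0.00 avg60=0.00 avg300=0.00 total=40148380"
	lines, _ := parsePressure(sample)
	for _, l := range lines {
		// psi_total is exposed in seconds, so convert from microseconds.
		fmt.Printf("%s: %.3fs under pressure in total\n", l.Kind, float64(l.TotalMicroseconds)/1e6)
	}
}
```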
@dqminh force-pushed the psi branch 3 times, most recently from 4c5c036 to 57ac945 on May 20, 2022 11:08
This adds support for reading PSI metrics via Prometheus. We expose the
following for `psi_total`:

```
container_cpu_psi_total_seconds
container_memory_psi_total_seconds
container_io_psi_total_seconds
```

And for `psi_avg`:

```
container_cpu_psi_avg10_ratio
container_cpu_psi_avg60_ratio
container_cpu_psi_avg300_ratio

container_memory_psi_avg10_ratio
container_memory_psi_avg60_ratio
container_memory_psi_avg300_ratio

container_io_psi_avg10_ratio
container_io_psi_avg60_ratio
container_io_psi_avg300_ratio
```

Signed-off-by: Daniel Dao <dqminh89@gmail.com>
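For illustration only (not part of this PR): a minimal Go sketch of what a scrape of the `psi_total` counters could look like, assuming the `kind` label with values `some`/`full` used elsewhere in this change. The numeric values are made up.

```
package main

import "fmt"

func main() {
	// Hypothetical PSI "total" values in seconds, already converted from
	// the kernel's microsecond counters.
	totals := []struct {
		name string
		kind string
		secs float64
	}{
		{"container_cpu_psi_total_seconds", "some", 68.139},
		{"container_cpu_psi_total_seconds", "full", 40.148},
		{"container_memory_psi_total_seconds", "some", 1.204},
		{"container_memory_psi_total_seconds", "full", 0.512},
		{"container_io_psi_total_seconds", "some", 3.377},
		{"container_io_psi_total_seconds", "full", 2.001},
	}
	// Print Prometheus text-exposition style lines with the "kind" label.
	for _, t := range totals {
		fmt.Printf("%s{kind=%q} %.3f\n", t.name, t.kind, t.secs)
	}
}
```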
@szuecs commented Nov 4, 2022

@bobbypage does this need to wait longer? Is anything missing?
I ask because there is an internal request to support this feature. :)

@dqminh it likely needs a rebase; maybe you can do it :)

@bobbypage (Collaborator) commented:
We are still waiting for opencontainers/runc#3358 to be merged in runc...

@szuecs commented Nov 14, 2022

@bobbypage not sure; should I take over the PR?
@dqminh thoughts?

john-liuqiming pushed a commit to cloud-native-observability/cadvisor that referenced this pull request Apr 4, 2023
Signed-off-by: liuqiming.lqm <liuqiming.lqm@alibaba-inc.com>
@SuperQ (Contributor) commented Apr 17, 2023

FYI, it's unnecessary, and not a best practice, to expose the pre-computed averages in Prometheus.

Prometheus can compute arbitrary averages from the Total data.
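To illustrate that point (a sketch, not code from this PR): the ratio the kernel reports as avg10/avg60/avg300 can be recovered from two samples of the total counter, which is what `rate()` over the `*_psi_total_seconds` series does in Prometheus.

```
package main

import (
	"fmt"
	"time"
)

// pressureRatio returns the fraction of wall-clock time spent under
// pressure between two samples of a PSI "total" counter (in seconds).
// This mirrors what rate() computes from the exposed counters.
func pressureRatio(prevTotal, curTotal float64, prev, cur time.Time) float64 {
	window := cur.Sub(prev).Seconds()
	if window <= 0 {
		return 0
	}
	return (curTotal - prevTotal) / window
}

func main() {
	t0 := time.Date(2022, 5, 17, 12, 0, 0, 0, time.UTC)
	t1 := t0.Add(60 * time.Second)
	// 1.5 stalled seconds accumulated over a 60s window -> 0.025 (2.5%).
	fmt.Printf("%.3f\n", pressureRatio(10.0, 11.5, t0, t1))
}
```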

@nmlc commented Aug 23, 2023

Any news here? It seems like the runc PSI PR got merged.

if includedMetrics.Has(container.PSITotalMetrics) {
c.containerMetrics = append(c.containerMetrics, []containerMetric{
{
name: "container_cpu_psi_total_seconds",

Review comment (Contributor). Suggested change:
- name: "container_cpu_psi_total_seconds",
+ name: "container_pressure_cpu_seconds_total",

return getPSIValues(s, &s.Cpu.PSI, "total")
},
}, {
name: "container_memory_psi_total_seconds",

Review comment (Contributor). Suggested change:
- name: "container_memory_psi_total_seconds",
+ name: "container_pressure_memory_seconds_total",

return getPSIValues(s, &s.Memory.PSI, "total")
},
}, {
name: "container_io_psi_total_seconds",

Review comment (Contributor). Suggested change:
- name: "container_io_psi_total_seconds",
+ name: "container_pressure_io_seconds_total",

Comment on lines +1801 to +1827
if includedMetrics.Has(container.PSIAvgMetrics) {
	makePSIAvgMetric := func(controller, window string) containerMetric {
		return containerMetric{
			name:        fmt.Sprintf("container_%s_psi_avg%s_ratio", controller, window),
			help:        fmt.Sprintf("Ratio of time spent under %s pressure over time window of %s seconds", controller, window),
			valueType:   prometheus.GaugeValue,
			extraLabels: []string{"kind"},
			getValues: func(s *info.ContainerStats) metricValues {
				switch controller {
				case "cpu":
					return getPSIValues(s, &s.Cpu.PSI, "avg"+window)
				case "memory":
					return getPSIValues(s, &s.Memory.PSI, "avg"+window)
				case "io":
					return getPSIValues(s, &s.DiskIo.PSI, "avg"+window)
				default:
					return nil
				}
			},
		}
	}
	for _, controller := range []string{"cpu", "memory", "io"} {
		for _, window := range []string{"10", "60", "300"} {
			c.containerMetrics = append(c.containerMetrics, makePSIAvgMetric(controller, window))
		}
	}
}
Review comment (Contributor):

These metrics are unnecessary in Prometheus as we can compute averages from the counters. Please remove them to avoid excess metric cardinality.

Suggested change: remove the entire PSIAvgMetrics block above.

Review comment:
I believe this may be resolved since the 10/60/300 averages are inherent to the exposed PSI data?

case "total":
// total is reported in microseconds; convert it to seconds for the metric value
v = append(v, metricValue{value: float64(time.Duration(psi.Some.Total)*time.Microsecond) / float64(time.Second), timestamp: s.Timestamp, labels: []string{"some"}})
v = append(v, metricValue{value: float64(time.Duration(psi.Full.Total)*time.Microsecond) / float64(time.Second), timestamp: s.Timestamp, labels: []string{"full"}})
Review comment (Contributor):

Note that for CPU, we don't need to expose "full". In practice, I've only found the "some" metrics to be useful. The "some" value is a superset of "full". IMO we should just include "some" to reduce the cardinality.

See the PSI docs:

CPU full is undefined at the system level, but has been reported since 5.13, so it is set to zero for backward compatibility.

@ein-stein-chen commented Oct 31, 2023

I would like to make the case for exposing "CPU full" metrics at the container (cgroup) level.

While "CPU full" is undefined at the system level, values are reported for individual cgroups:

$ cat /sys/fs/cgroup/user.slice/cpu.pressure
some avg10=0.00 avg60=0.05 avg300=0.20 total=68139509
full avg10=0.00 avg60=0.00 avg300=0.00 total=40148380

Exposing both might be useful to differentiate between cgroups that try to execute more work than the compute time available to them (indicated by "CPU some") and cgroups that are fully blocked/stalled (perhaps because of other cgroups and/or process priority, …; indicated by "CPU full").

Review comment (Contributor):

Interesting, thanks for this info. Are there kernel docs that document the per-cgroup behavior?

IMO this should be a separate metric name, rather than a label. The reason is that since some is inclusive of full, doing something like sum(rate(container_pressure_cpu_seconds_total[1m])) would be confusing.

I would suggest these two metric names:

- container_pressure_cpu_seconds_total
- container_pressure_cpu_full_seconds_total
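A small Go sketch (illustration only, with made-up numbers) of why keeping "some" and "full" under one metric name with a label can mislead aggregations, since "some" already includes the fully-stalled time:

```
package main

import "fmt"

func main() {
	// Hypothetical stalled seconds over a 1m window for one container.
	some := 1.5 // at least one task stalled
	full := 0.4 // all tasks stalled (already counted within "some")

	// Summing across a "kind" label, as sum(rate(...)) would, double-counts
	// the fully-stalled time because it is included in "some".
	fmt.Printf("sum over kinds: %.1fs (misleading)\n", some+full)
	fmt.Printf("separate series: some=%.1fs, full=%.1fs\n", some, full)
}
```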

@szuecs commented Oct 31, 2023

@bobbypage can you set ok-to-test and do a review?
It would be great to see this in cAdvisor.

@bobbypage (Collaborator) commented:
Waiting for a new runc release to include opencontainers/runc@1aa7ca8

@MathieuCesbron commented:
Can we have an update on this? I would like to access the PSI interface. Thanks, guys.

@dims (Collaborator) commented Feb 23, 2024

@MathieuCesbron did you see the previous comment (the one before yours)?

google.golang.org/grpc v1.33.2
k8s.io/klog/v2 v2.4.0
k8s.io/utils v0.0.0-20211116205334-6203023598ed
)

replace github.com/opencontainers/runc => github.com/dqminh/runc v0.0.0-20220513155811-6414629ada8a
Review comment (Collaborator):

Please remove this and use the upstream runc as-is.

@zouyee (Contributor) commented Apr 7, 2024

runc has released 1.2.0-rc.1

@zouyee (Contributor) commented Apr 17, 2024

@bobbypage not sure should I take over the PR? @dqminh thoughts?

@szuecs take over?

@SuperQ (Contributor) commented Apr 17, 2024

I would be happy to take over this change; I would like to make sure it aligns well with best practices. As it is, it does not.

@szuecs commented Apr 17, 2024

There is only a release candidate, so I would wait until there's a proper release.
I don't mind if someone else does it.

@akgoel18 commented Jun 6, 2024

@szuecs Any update on this? I'm asking because this is a heavily requested feature internally. :)

@szuecs commented Jun 6, 2024

@akgoel18 please check the upstream project's release cycle, thanks.

@pacoxu (Contributor) commented Jul 18, 2024

opencontainers/runc#3900 was merged and will be released with runc 1.2.0

@pacoxu (Contributor) commented Oct 22, 2024

runc v1.2.0 was released today: https://github.com/opencontainers/runc/releases/tag/v1.2.0

Do you have time to bump runc?

@dims (Collaborator) commented Oct 22, 2024

@pacoxu when containerd 2.0 gets out, the runc used with it will be 1.2.0, so as long as we only have containerd 1.6/1.7 with k8s, we should stick to the older runc(s), both as a binary and for vendoring.

@haircommander (Contributor) commented:
@dims why do cadvisor/k8s and containerd need to have in-sync runc versions?

@dims (Collaborator) commented Oct 22, 2024

@dims why do cadvisor/k8s and containerd need to have in-sync runc versions?

Bad things have happened before, @haircommander; for example, see opencontainers/runc#3849

@alexandremahdhaoui commented:
@dims @haircommander to address and prevent the issue of "out of sync" runc versions, we could add a CI check.

NB: I wrote a tool (usage) in kubernetes-sigs/container-runtime to verify and ensure that specified Go modules stay in sync with some upstream modules. I'd be glad to open a PR here or in k/k to add such a check.

@rexagod commented Nov 12, 2024

Since the backports are in now, I think we can go ahead with this (incorporating @SuperQ's suggestions above)?

@SuperQ (Contributor) commented Nov 12, 2024

I might go ahead and make a fork/alternative to this specific PR.

@enp0s3 commented Jan 18, 2025

@SuperQ Hi, do you plan to take this over?

@SuperQ (Contributor) commented Jan 18, 2025

I was hoping I would have time over the holidays, but I ended up working on other projects. I could take a look soon. But as-is, I don't think this PR should be merged.

@xinau (Contributor) commented Jan 20, 2025

@SuperQ I would be happy to help with the development of this feature. Is it basically a rebase of this PR plus resolving the review comments? Or is there more to be done?

@rexagod commented Jan 21, 2025

@xinau That sounds good for now. I'd suggest opening a PR with those suggestions; any additional changes can be made incrementally upon review.

xinau added a commit to xinau/cadvisor that referenced this pull request Jan 26, 2025
issues: google#3052, google#3083, kubernetes/enhancements#4205

This change adds metrics for pressure stall information, which indicate
why some or all tasks of a cgroup v2 have waited due to resource
congestion (cpu, memory, io). The change exposes this information by
including the _PSIStats_ of each controller in its stats, i.e.
_CPUStats.PSI_, _MemoryStats.PSI_ and _DiskStats.PSI_.

The information is additionally exposed as Prometheus metrics. The
metrics follow the naming outlined by the prometheus/node-exporter,
where stalled corresponds to full and waiting to some.

```
container_pressure_cpu_stalled_seconds_total
container_pressure_cpu_waiting_seconds_total
container_pressure_memory_stalled_seconds_total
container_pressure_memory_waiting_seconds_total
container_pressure_io_stalled_seconds_total
container_pressure_io_waiting_seconds_total
```

Signed-off-by: Felix Ehrenpfort <felix@ehrenpfort.de>
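A tiny Go sketch (illustration only) of the naming scheme described above, mapping each controller and PSI kind to the node-exporter style metric name:

```
package main

import "fmt"

// pressureMetricName maps a PSI controller and kind to the
// node-exporter style metric name, where "full" is exposed as
// "stalled" and "some" as "waiting".
func pressureMetricName(controller, kind string) string {
	suffix := map[string]string{"full": "stalled", "some": "waiting"}[kind]
	return fmt.Sprintf("container_pressure_%s_%s_seconds_total", controller, suffix)
}

func main() {
	for _, controller := range []string{"cpu", "memory", "io"} {
		for _, kind := range []string{"some", "full"} {
			fmt.Println(pressureMetricName(controller, kind))
		}
	}
}
```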
@xinau (Contributor) commented Jan 26, 2025

I've opened #3649 with the requested rebase and addressed the review comments.


Successfully merging this pull request may close these issues.

Expose Pressure Stall Information as metrics