dcgm-exporter missing metrics for A100 GPU #166
Comments
Hi Anaconda,
Hi guys - yes, we are working on adding MIG support into
Any update on this? I am also not seeing I am however seeing the other three metrics mentioned here. This is running with version
Same issue here with latest versions and all types of GPUs.
Is this maybe related to #143?
GPU Machine: A100-PCIE-40GB.
[gpu-monitoring-tools-2.3.1]
I am using the latest release of dcgm-exporter (2.1.4-2.3.1-ubuntu18.04).
When running queries in Prometheus, I found that a few metrics are missing: DCGM_FI_DEV_GPU_UTIL, DCGM_FI_DEV_MEM_COPY_UTIL, DCGM_FI_DEV_ENC_UTIL, and DCGM_FI_DEV_DEC_UTIL.
I do see them enabled in default-counters.csv inside my running pod, though. Is this a bug, or are these metrics not supported on the A100 GPU?
I have checked other GPU machines (4 Tesla, V100), and everything looks good there: I am able to get all metrics.
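For anyone who wants to reproduce the check, here is a minimal sketch of how the gap can be confirmed from the exporter's raw output rather than from Prometheus. It assumes you have saved the exporter's Prometheus text-format output to a string; the helper names (`exposed_metric_names`, `missing_metrics`) and the sample scrape with its label and values are hypothetical, made up for illustration, and are not taken from dcgm-exporter itself.

```python
# Sketch only: report which expected DCGM metrics are absent from a
# Prometheus text-format scrape. Helper names and SAMPLE are hypothetical.

EXPECTED = [
    "DCGM_FI_DEV_GPU_UTIL",
    "DCGM_FI_DEV_MEM_COPY_UTIL",
    "DCGM_FI_DEV_ENC_UTIL",
    "DCGM_FI_DEV_DEC_UTIL",
]


def exposed_metric_names(scrape_text):
    """Return the set of metric names that have at least one sample line."""
    names = set()
    for line in scrape_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample looks like `name{labels} value` or `name value`.
        name = line.split("{", 1)[0].split(None, 1)[0]
        names.add(name)
    return names


def missing_metrics(scrape_text, expected=EXPECTED):
    """Return the expected metric names that never appear in the scrape."""
    present = exposed_metric_names(scrape_text)
    return [name for name in expected if name not in present]


# Hypothetical scrape: the label and values are invented for illustration.
SAMPLE = """\
# TYPE DCGM_FI_DEV_SM_CLOCK gauge
DCGM_FI_DEV_SM_CLOCK{gpu="0"} 1410
DCGM_FI_DEV_GPU_TEMP{gpu="0"} 35
DCGM_FI_DEV_GPU_UTIL{gpu="0"} 0
"""

if __name__ == "__main__":
    for name in missing_metrics(SAMPLE):
        print("missing:", name)
```

Running the same check against output saved from an A100 node and from a V100 node would show whether the metrics are dropped by the exporter itself or lost somewhere between the exporter and Prometheus.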
Thank you in advance.