In the Grafana dashboard, everything displays except GPU utilization. It seems the latest dcgm-exporter broke GPU utilization export: utilization does not display properly on a DGX A100 (DGX OS 5, i.e. Ubuntu 20.04 LTS) or on a DGX-1 (P100) running RHEL 8.3. I suspect the problem affects all systems.
GPU utilization shows up again if I downgrade the dcgm-exporter container, i.e. use version "2.1.4-2.2.0" or older instead of the latest nvidia/dcgm-exporter image.
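One plausible reason downgrading helps is that the newer image no longer collects DCGM_FI_DEV_GPU_UTIL by default. Assuming the exporter takes its field list from a counters CSV (one field per line as "FIELD, prometheus_type, help text", with "#" marking a disabled line), the sketch below compares two such files and lists fields that were dropped. The helper name enabled_fields and both CSV excerpts are hypothetical, for illustration only; they are not copied from any released image.

```python
from __future__ import annotations


def enabled_fields(csv_text: str) -> set[str]:
    """Return the DCGM field names on active (non-comment) lines.

    Assumes the layout `FIELD_NAME, prometheus_type, help text`,
    where lines starting with '#' are comments, i.e. disabled fields.
    """
    fields = set()
    for raw in csv_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or disabled/commented-out field
        name = line.split(",", 1)[0].strip()
        if name:
            fields.add(name)
    return fields


# Hypothetical excerpts of an older and a newer counters file.
OLD_CSV = """\
# Format: DCGM field, Prometheus metric type, help message
DCGM_FI_DEV_GPU_UTIL, gauge, GPU utilization (in %).
DCGM_FI_DEV_FB_USED, gauge, Framebuffer memory used (in MiB).
"""

NEW_CSV = """\
# Format: DCGM field, Prometheus metric type, help message
# DCGM_FI_DEV_GPU_UTIL, gauge, GPU utilization (in %).
DCGM_FI_DEV_FB_USED, gauge, Framebuffer memory used (in MiB).
DCGM_FI_PROF_GR_ENGINE_ACTIVE, gauge, Ratio of time the graphics engine is active.
"""

if __name__ == "__main__":
    # Fields active in the old file but not in the new one.
    dropped = enabled_fields(OLD_CSV) - enabled_fields(NEW_CSV)
    print(sorted(dropped))  # ['DCGM_FI_DEV_GPU_UTIL']
```

Running the same comparison against the counters files actually shipped in two image tags would show whether the utilization field was disabled between them.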
"Some metrics, previously enabled by default, are deprecated and should be replaced with new ones. For example, DCGM_FI_DEV_GPU_UTIL should be replaced with DCGM_FI_PROF_GR_ENGINE_ACTIVE, or DCGM_FI_PROF_SM_ACTIVE or DCGM_FI_PROF_SM_OCCUPANCY, based on your needs; "
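Since a dashboard built on DCGM_FI_DEV_GPU_UTIL shows nothing once that field stops being exported, a consumer can prefer the legacy field and fall back to DCGM_FI_PROF_GR_ENGINE_ACTIVE. The sketch below does this on scraped Prometheus text. It assumes the legacy field is a 0-100 percentage, the profiling field is a 0-1 ratio (hence the scaling by 100), and every sample carries a gpu label; the function name gpu_utilization and the sample payload are hypothetical.

```python
from __future__ import annotations

import re

# Simple Prometheus text-format sample: name{labels} value.
# Timestamps and '}' or escaped quotes inside label values are not handled.
SAMPLE_RE = re.compile(r'^(\w+)\{([^}]*)\}\s+(\S+)')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def gpu_utilization(metrics_text: str) -> dict[str, float]:
    """Map GPU index -> utilization in percent.

    Prefers DCGM_FI_DEV_GPU_UTIL (assumed 0-100). For GPUs without it,
    uses DCGM_FI_PROF_GR_ENGINE_ACTIVE (assumed 0-1) scaled by 100.
    """
    dev_util: dict[str, float] = {}
    gr_active: dict[str, float] = {}
    for line in metrics_text.splitlines():
        m = SAMPLE_RE.match(line.strip())
        if not m:
            continue  # HELP/TYPE comments and unlabelled samples
        name, labels, value = m.groups()
        gpu = dict(LABEL_RE.findall(labels)).get("gpu")
        if gpu is None:
            continue
        if name == "DCGM_FI_DEV_GPU_UTIL":
            dev_util[gpu] = float(value)
        elif name == "DCGM_FI_PROF_GR_ENGINE_ACTIVE":
            gr_active[gpu] = float(value) * 100.0
    # Legacy values override; the profiling metric fills in the gaps.
    return {**gr_active, **dev_util}


# Hypothetical scrape: GPU 0 only has the profiling field, GPU 1 has both.
SAMPLE = """\
# HELP DCGM_FI_PROF_GR_ENGINE_ACTIVE Ratio of time the graphics engine is active.
# TYPE DCGM_FI_PROF_GR_ENGINE_ACTIVE gauge
DCGM_FI_PROF_GR_ENGINE_ACTIVE{gpu="0",UUID="GPU-aaaa"} 0.5
DCGM_FI_PROF_GR_ENGINE_ACTIVE{gpu="1",UUID="GPU-bbbb"} 0.25
DCGM_FI_DEV_GPU_UTIL{gpu="1",UUID="GPU-bbbb"} 30
"""

if __name__ == "__main__":
    print(gpu_utilization(SAMPLE))  # {'0': 50.0, '1': 30.0}
```

Note that graphics-engine activity and the legacy utilization counter are different measurements, so the fallback keeps a panel populated but its values may not line up exactly with the old ones. A GPU reporting neither field is simply omitted from the result.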
Instead of:
deepops/roles/nvidia-dcgm-exporter/defaults/main.yml, line 1 (at commit f08cab4)
Example to fix: