GPU Utilization not being displayed by dcgm-exporter.service #906

Closed
avolkov1 opened this issue Mar 12, 2021 · 2 comments
@avolkov1
Contributor

In the Grafana dashboard, everything displays except GPU utilization. It seems the latest dcgm-exporter broke the GPU utilization export. Utilization is not displayed properly on a DGX A100 (DGX OS 5, i.e. Ubuntu 20.04 LTS) or on a DGX-1 (P100) running RHEL 8.3. I suspect it affects every platform.

I got GPU utilization to show up by downgrading the dcgm-exporter container, i.e. instead of using the latest nvidia/dcgm-exporter container, I use version "2.1.4-2.2.0" or older.

Instead of:

nvidia_dcgm_container: "nvidia/dcgm-exporter"

Example fix (pin the container tag):

nvidia_dcgm_container: "nvidia/dcgm-exporter:2.1.4-2.2.0-ubuntu18.04"
nvidia_dcgm_prom_dir: "/run/prometheus"
nvidia_dcgm_svc_name: "docker.dcgm-exporter.service"
nvidia_dcgm_state: started
nvidia_dcgm_enabled: yes

prometheus_config_dir: /etc/prometheus
prometheus_cfg_endpoint_dir: "{{ prometheus_config_dir }}/endpoints"
nvidia_dcgm_exporter_conf_template: "dcgm-exporter.yml.j2"

has_gpus: false

nvidia_dcgm_max_cpu: "0.5"
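
The workaround above depends on the image reference carrying an explicit tag; without one, Docker pulls `latest`. A minimal sketch of a check for that follows. The helper name `is_pinned` is hypothetical and is not part of the playbooks; it only inspects the string and does not contact a registry.

```python
def is_pinned(image_ref):
    """Return True if the image reference names a fixed tag or a digest.

    Hypothetical helper for illustration. Only the last path component is
    inspected, so a registry port such as "localhost:5000/..." is not
    mistaken for a tag.
    """
    last = image_ref.rsplit("/", 1)[-1]
    if "@" in last:
        # Digest reference, e.g. name@sha256:...
        return True
    _name, sep, tag = last.partition(":")
    # A missing tag defaults to "latest", which floats with each release.
    return bool(sep) and tag not in ("", "latest")
```

With the two values from this comment, `is_pinned("nvidia/dcgm-exporter")` is false and `is_pinned("nvidia/dcgm-exporter:2.1.4-2.2.0-ubuntu18.04")` is true.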
@mathrock74

See the discussion in NVIDIA/gpu-monitoring-tools#143.
There seem to be some changes in the default metrics configuration:

"Some metrics, previously enabled by default, are deprecated and should be replaced with new ones. For example, DCGM_FI_DEV_GPU_UTIL should be replaced with DCGM_FI_PROF_GR_ENGINE_ACTIVE, or DCGM_FI_PROF_SM_ACTIVE or DCGM_FI_PROF_SM_OCCUPANCY, based on your needs; "
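
To see which of the metrics named in that note a given exporter actually emits, one could scan its Prometheus text output. The following is a rough sketch, not part of dcgm-exporter: the function names are hypothetical, and the URL `http://localhost:9400/metrics` is an assumption about where the exporter is listening, which may not match a setup that writes to a Prometheus directory instead.

```python
import re
import urllib.request

# Metric names taken from the quoted note above.
DEPRECATED = "DCGM_FI_DEV_GPU_UTIL"
REPLACEMENTS = (
    "DCGM_FI_PROF_GR_ENGINE_ACTIVE",
    "DCGM_FI_PROF_SM_ACTIVE",
    "DCGM_FI_PROF_SM_OCCUPANCY",
)

# A sample line in the Prometheus text format starts with the metric name.
_METRIC_NAME = re.compile(r"^([A-Za-z_:][A-Za-z0-9_:]*)")


def exported_metric_names(exposition_text):
    """Return the set of metric names that have at least one sample line."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blanks and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = _METRIC_NAME.match(line)
        if match:
            names.add(match.group(1))
    return names


def utilization_report(exposition_text):
    """Map each utilization-related metric name to whether it is exported."""
    names = exported_metric_names(exposition_text)
    return {name: name in names for name in (DEPRECATED, *REPLACEMENTS)}


def fetch_metrics(url="http://localhost:9400/metrics", timeout=5.0):
    """Fetch the exporter output. The default URL is an assumption."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Calling `utilization_report(fetch_metrics())` on the GPU node returns a dict keyed by metric name. If `DCGM_FI_DEV_GPU_UTIL` maps to `False` while one of the `DCGM_FI_PROF_*` names maps to `True`, the dashboard query probably needs to switch to the newer metric.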

@avolkov1
Contributor Author

Resolved by PR #941.
