
[BUG] CUDA error: invalid configuration argument when using stride=2 in sparse convolutions (RTX 40-series GPUs / Ada Lovelace architecture) #347

Open
shr19976 opened this issue Mar 5, 2025 · 1 comment


shr19976 commented Mar 5, 2025

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

Problem Description

When performing sparse convolution operations (especially with stride=2), a RuntimeError: CUDA error: invalid configuration argument is raised.
The error occurs during kernel map generation in torchsparse, specifically at torchsparse/nn/functional/conv/hash/query.py:48, in the call to torch.full().
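
For reference, the core of my minimal_repro.py looks roughly like the sketch below (illustrative sizes and coordinate layout; I am assuming the torchsparse 2.x API with SparseTensor and spnn.Conv3d, so details may need adjusting for other versions):

import torch
from torchsparse import SparseTensor
import torchsparse.nn as spnn

# Illustrative random voxel grid: unique integer coordinates plus a batch column.
coords = torch.randint(0, 64, (10000, 4), dtype=torch.int32, device="cuda")
coords[:, 0] = 0                       # single batch (batch-column position may differ by version)
coords = torch.unique(coords, dim=0)   # sparse tensors expect unique voxels
feats = torch.randn(coords.shape[0], 16, device="cuda")
x = SparseTensor(feats, coords)

conv = spnn.Conv3d(16, 32, kernel_size=3, stride=2).cuda()
y = conv(x)                            # raises: CUDA error: invalid configuration argument
torch.cuda.synchronize()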

Key Observations

  1. stride=1 works, stride=2 fails (see the toggle sketch after this list).
  2. GPU architecture specificity: reproduced only on Ada Lovelace (Compute Capability 8.9) GPUs; untested on other architectures (e.g., Ampere).
  3. Asynchronous error reporting: the error message mentions possible asynchronous reporting, but setting CUDA_LAUNCH_BLOCKING=1 does not resolve the issue.
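
To make observation 1 concrete, toggling only the stride in the sketch above behaves as follows (same illustrative x and spnn.Conv3d as before):

for stride in (1, 2):
    conv = spnn.Conv3d(16, 32, kernel_size=3, stride=stride).cuda()
    try:
        y = conv(x)
        torch.cuda.synchronize()        # force asynchronous launch errors to surface here
        print(f"stride={stride}: ok, output feats {tuple(y.feats.shape)}")
    except RuntimeError as e:
        print(f"stride={stride}: {e}")  # stride=2 hits 'invalid configuration argument'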

Error Log
Full stack trace:
RuntimeError: CUDA error: invalid configuration argument
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

Traceback (most recent call last):
File "minimal_repro.py", line 16, in
y = conv(x)
...
File ".../torchsparse/nn/functional/conv/hash/query.py", line 48, in convert_transposed_out_in_map
out_in_map_t = torch.full(
^^^^^^^^^^^
RuntimeError: CUDA error: invalid configuration argument

Additional Information

Attempted fixes:

  1. Recompiled torchsparse with an explicit GPU architecture:

     export TORCH_CUDA_ARCH_LIST="8.9"
     pip install --force-reinstall git+https://github.com/mit-han-lab/torchsparse.git

  2. Set CUDA_LAUNCH_BLOCKING=1 and TORCH_USE_CUDA_DSA=1, but neither resolved the issue.
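
To double-check that the rebuild actually targeted my GPU, I also inspected the toolchain from Python (standard PyTorch introspection calls; I am assuming torchsparse exposes __version__ as usual):

import torch
import torchsparse

print(torch.__version__, torch.version.cuda)  # expect 2.2.0 and 11.8
print(torch.cuda.get_device_capability(0))    # expect (8, 9) on Ada Lovelace
print(torch.cuda.get_arch_list())             # architectures PyTorch itself was built for
print(torchsparse.__version__)                # expect 2.1.0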

Expected Behavior

Questions for Developers

  1. Architecture compatibility: is TorchSparse officially supported on Ada Lovelace (Compute Capability 8.9)?

  2. Stride configuration limitations: are there known issues with stride=2 in sparse convolutions? Are there any special parameter requirements?

  3. Debugging suggestions: how can I further diagnose the torch.full() CUDA configuration error? (See the sketch below.)
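
In particular, since the error may be reported asynchronously at a later API call, I have been trying to localize the failing launch with explicit synchronization. A self-contained sketch (stand-in tensors, not torchsparse internals; the torch.full shape here is illustrative, not the actual one from query.py):

import torch

def sync_check(label: str) -> None:
    # Flush pending CUDA work so an asynchronous launch error surfaces
    # here, at a known point, instead of at some later API call.
    torch.cuda.synchronize()
    print(f"ok: {label}")

feats = torch.randn(10000, 16, device="cuda")
sync_check("feature allocation")

# torch.full in isolation, following the device/dtype pattern of query.py:48.
# If this passes, the 'invalid configuration argument' most likely comes from
# an earlier kernel launch, and torch.full is merely where the asynchronous
# error happens to be reported.
out = torch.full((10000, 8), -1, dtype=torch.int32, device="cuda")
sync_check("torch.full")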

Environment

- GCC: 11.4.0
- NVCC: 11.8.89
- PyTorch: 2.2.0
- PyTorch CUDA: 11.8
- TorchSparse: 2.1.0
- GPU: NVIDIA GeForce RTX 4070 Ti SUPER (Compute Capability 8.9)
- Python: 3.11.11
- OS: WSL-Ubuntu 22.04

Anything else?

Possible Causes

  1. Mismatch Between TorchSparse and PyTorch/CUDA Versions

  2. GPU Compute Capability Not Properly Supported

  3. Lack of Recompilation from Source with Proper Architecture Flags


shr19976 commented Mar 5, 2025

I have installed TorchSparse by running the following command, to try to match TorchSparse to my CUDA version:

MAX_JOBS=2 TORCH_CUDA_ARCH_LIST="8.9" pip install --upgrade git+https://github.com/mit-han-lab/torchsparse.git
