
[BUG] CUDA error: invalid configuration argument when using stride=2 in sparse convolutions (RTX 40-series GPUs / Ada Lovelace architecture) #347

Open
shr19976 opened this issue Mar 5, 2025 · 1 comment


shr19976 commented Mar 5, 2025

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

Problem Description

When performing sparse convolution operations (especially with stride=2), a RuntimeError: CUDA error: invalid configuration argument is raised.
The error occurs during kernel map generation in torchsparse, specifically at torchsparse/nn/functional/conv/hash/query.py:48, in the call to torch.full().
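
For reference, the core of my minimal_repro.py looks roughly like the sketch below (illustrative sizes and coordinate layout; I am assuming the torchsparse 2.x API with SparseTensor and spnn.Conv3d, so details may need adjusting for other versions):

import torch
from torchsparse import SparseTensor
import torchsparse.nn as spnn

# Illustrative random voxel grid: unique integer coordinates plus a batch column.
coords = torch.randint(0, 64, (10000, 4), dtype=torch.int32, device="cuda")
coords[:, 0] = 0                       # single batch (batch-column position may differ by version)
coords = torch.unique(coords, dim=0)   # sparse tensors expect unique voxels
feats = torch.randn(coords.shape[0], 16, device="cuda")
x = SparseTensor(feats, coords)

conv = spnn.Conv3d(16, 32, kernel_size=3, stride=2).cuda()
y = conv(x)                            # raises: CUDA error: invalid configuration argument
torch.cuda.synchronize()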

Key Observations

  1. stride=1 works, stride=2 fails (see the toggle sketch after this list).
  2. GPU architecture specificity: reproduced only on Ada Lovelace (Compute Capability 8.9) GPUs; untested on other architectures (e.g., Ampere).
  3. Asynchronous error reporting: the error message mentions possible asynchronous reporting, but setting CUDA_LAUNCH_BLOCKING=1 does not resolve the issue.
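
To make observation 1 concrete, toggling only the stride in the sketch above behaves as follows (same illustrative x and spnn.Conv3d as before):

for stride in (1, 2):
    conv = spnn.Conv3d(16, 32, kernel_size=3, stride=stride).cuda()
    try:
        y = conv(x)
        torch.cuda.synchronize()        # force asynchronous launch errors to surface here
        print(f"stride={stride}: ok, output feats {tuple(y.feats.shape)}")
    except RuntimeError as e:
        print(f"stride={stride}: {e}")  # stride=2 hits 'invalid configuration argument'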

Error Log
Full stack trace:
RuntimeError: CUDA error: invalid configuration argument
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

Traceback (most recent call last):
File "minimal_repro.py", line 16, in
y = conv(x)
...
File ".../torchsparse/nn/functional/conv/hash/query.py", line 48, in convert_transposed_out_in_map
out_in_map_t = torch.full(
^^^^^^^^^^^
RuntimeError: CUDA error: invalid configuration argument

Additional Information

Attempted fixes:

  1. Recompiled torchsparse with an explicit GPU architecture:

     export TORCH_CUDA_ARCH_LIST="8.9"
     pip install --force-reinstall git+https://github.com/mit-han-lab/torchsparse.git

  2. Set CUDA_LAUNCH_BLOCKING=1 and TORCH_USE_CUDA_DSA=1, but neither resolved the issue.
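
To double-check that the rebuild actually targeted my GPU, I also inspected the toolchain from Python (standard PyTorch introspection calls; I am assuming torchsparse exposes __version__ as usual):

import torch
import torchsparse

print(torch.__version__, torch.version.cuda)  # expect 2.2.0 and 11.8
print(torch.cuda.get_device_capability(0))    # expect (8, 9) on Ada Lovelace
print(torch.cuda.get_arch_list())             # architectures PyTorch itself was built for
print(torchsparse.__version__)                # expect 2.1.0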

Expected Behavior

Questions for Developers

  1. Architecture compatibility: is TorchSparse officially supported on Ada Lovelace (Compute Capability 8.9)?

  2. Stride configuration limitations: are there known issues with stride=2 in sparse convolutions? Are there any special parameter requirements?

  3. Debugging suggestions: how can I further diagnose the torch.full() CUDA configuration error? (See the sketch below.)
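
In particular, since the error may be reported asynchronously at a later API call, I have been trying to localize the failing launch with explicit synchronization. A self-contained sketch (stand-in tensors, not torchsparse internals; the torch.full shape here is illustrative, not the actual one from query.py):

import torch

def sync_check(label: str) -> None:
    # Flush pending CUDA work so an asynchronous launch error surfaces
    # here, at a known point, instead of at some later API call.
    torch.cuda.synchronize()
    print(f"ok: {label}")

feats = torch.randn(10000, 16, device="cuda")
sync_check("feature allocation")

# torch.full in isolation, following the device/dtype pattern of query.py:48.
# If this passes, the 'invalid configuration argument' most likely comes from
# an earlier kernel launch, and torch.full is merely where the asynchronous
# error happens to be reported.
out = torch.full((10000, 8), -1, dtype=torch.int32, device="cuda")
sync_check("torch.full")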

Environment

- GCC: 11.4.0
- NVCC: 11.8.89
- PyTorch: 2.2.0
- PyTorch CUDA: 11.8
- TorchSparse: 2.1.0
- GPU: NVIDIA GeForce RTX 4070 Ti SUPER (Compute Capability 8.9)
- Python: 3.11.11
- OS: WSL-Ubuntu 22.04

Anything else?

Possible Causes

  1. Mismatch Between TorchSparse and PyTorch/CUDA Versions

  2. GPU Compute Capability Not Properly Supported

  3. Lack of Recompilation from Source with Proper Architecture Flags


shr19976 commented Mar 5, 2025

I have installed TorchSparse by running the following command, to try to match TorchSparse to my CUDA version:

MAX_JOBS=2 TORCH_CUDA_ARCH_LIST="8.9" pip install --upgrade git+https://github.com/mit-han-lab/torchsparse.git
