[BUG] CUDA error: invalid configuration argument when using stride=2 in sparse convolutions (RTX 40-series GPUs / Ada Lovelace architecture)
#347 · Open
shr19976 opened this issue on Mar 5, 2025 · 1 comment
To rule out an incompatibility between the local CUDA toolkit and TorchSparse, TorchSparse was installed from source with:
MAX_JOBS=2 TORCH_CUDA_ARCH_LIST="8.9" pip install --upgrade git+https://github.com/mit-han-lab/torchsparse.git
Is there an existing issue for this?
Current Behavior
Problem Description:
When performing sparse convolution operations (especially with stride=2), a RuntimeError: CUDA error: invalid configuration argument occurs.
The error happens during kernel map generation in torchsparse, specifically at torchsparse/nn/functional/conv/hash/query.py:48, in the call to torch.full().
Key Observations:
1. Stride Dependence: stride=1 works, stride=2 fails.
2. GPU Architecture Specificity: reproduced only on Ada Lovelace (Compute Capability 8.9) GPUs; untested on other architectures (e.g., Ampere).
3. Asynchronous Error Reporting: the error message mentions possible asynchronous reporting, but CUDA_LAUNCH_BLOCKING=1 does not resolve the issue.
Error Log
Full Stack Trace:
RuntimeError: CUDA error: invalid configuration argument
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

Traceback (most recent call last):
  File "minimal_repro.py", line 16, in <module>
    y = conv(x)
  ...
  File ".../torchsparse/nn/functional/conv/hash/query.py", line 48, in convert_transposed_out_in_map
    out_in_map_t = torch.full(
                   ^^^^^^^^^^^
RuntimeError: CUDA error: invalid configuration argument
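For context, the traceback references a minimal_repro.py that is not included above. The sketch below is a hypothetical stand-in, not the reporter's script: the SparseTensor argument order, coordinate layout, and channel sizes are assumptions that may differ across torchsparse versions, but the failing pattern is the same one described here (spnn.Conv3d with stride=2 applied to a SparseTensor on the GPU).

# Hypothetical minimal_repro.py sketch (not the reporter's original script).
# Assumes a torchsparse 2.x-style API: SparseTensor(feats, coords) with integer
# coordinates of shape [N, 4]; adjust the coordinate layout to your version.
import torch
import torchsparse.nn as spnn
from torchsparse import SparseTensor

device = "cuda"
num_points, in_channels, out_channels = 1000, 16, 32

# Random voxel coordinates (batch index assumed in the first column) and features.
coords = torch.randint(0, 64, (num_points, 4), dtype=torch.int32, device=device)
coords[:, 0] = 0  # single sample in the batch
feats = torch.randn(num_points, in_channels, device=device)

x = SparseTensor(feats, coords)
conv = spnn.Conv3d(in_channels, out_channels, kernel_size=3, stride=2).to(device)

y = conv(x)                # stride=2 is where the CUDA error is reported
torch.cuda.synchronize()   # force any pending kernel errors to surface here
print(y.feats.shape)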
Additional Information
Attempted Fixes:
1. Recompiled torchsparse with the GPU architecture set explicitly:
   export TORCH_CUDA_ARCH_LIST="8.9"
   pip install --force-reinstall git+https://github.com/mit-han-lab/torchsparse.git
2. Set CUDA_LAUNCH_BLOCKING=1 and TORCH_USE_CUDA_DSA=1; neither resolved the issue.
Expected Behavior
Questions for Developers
1. Architecture Compatibility
Is TorchSparse officially supported on Ada Lovelace (Compute Capability 8.9)?
Stride Configuration Limitations
Are there known issues with stride=2 in sparse convolutions? Any special parameter requirements?
Debugging Suggestions
How to further diagnose the torch.full() CUDA configuration error?
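Regarding question 3, one thing worth trying (a sketch under assumptions, not an official torchsparse debugging recipe) is to force a device synchronization immediately before and after the failing layer. Because CUDA errors can be reported at a later API call, this helps establish whether torch.full() is really the faulting operation or merely the first call to observe an error left behind by an earlier kernel. Here conv and x are assumed to be the layer and SparseTensor from the reproduction script.

# Hypothetical diagnostic wrapper: narrow down which call actually faults.
import torch

def run_with_sync(conv, x):
    torch.cuda.synchronize()      # flush errors from kernels launched earlier
    try:
        y = conv(x)
        torch.cuda.synchronize()  # surface errors from this layer's own kernels
        return y
    except RuntimeError as err:
        # If the error only appears here, the faulty launch happened inside
        # this layer; if it already appears at the first synchronize, the
        # problem predates the torch.full() call in query.py.
        print("CUDA error surfaced around this layer:", err)
        raise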
Environment
Anything else?
Possible Causes
1. Mismatch between the TorchSparse build and the installed PyTorch/CUDA versions.
2. GPU compute capability not properly supported.
3. Missing or incorrect recompilation from source with the proper architecture flags.
A quick environment check for the first two hypotheses is sketched below.
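The following minimal sketch prints the information needed to check the version-mismatch and compute-capability hypotheses; it assumes the installed build exposes torchsparse.__version__ and falls back to "unknown" otherwise.

# Hypothetical environment check: confirm the CUDA toolkit PyTorch was built
# against, the GPU's compute capability, and the installed torchsparse build.
import torch
import torchsparse

print("torch:", torch.__version__)
print("torch built with CUDA:", torch.version.cuda)
print("GPU:", torch.cuda.get_device_name(0))
print("compute capability:", torch.cuda.get_device_capability(0))
print("torchsparse:", getattr(torchsparse, "__version__", "unknown"))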