Windows R560 PTX Jit errors - `invalid argument` in cub radix sort #1253
Comments
Compiling for SM 89 allowed the pedestrian navigation example to run on Ada correctly.
FLAMEGPU/FLAMEGPU2-pedestrian_navigation-example#7 (comment)
Unable to reproduce with CUDA 12.6 compiling for SM 70 under Linux. The PTX JITer is part of the CUDA driver rather than the CUDA toolkit, which may be relevant and could make this OS-specific.
Compute sanitizer found no issues under Linux.
Edit: Valgrind found no issues on the host either (in case of heap/stack corruption), but I didn't expect it to, given it's behaving and this appears to be PTX JIT related.
Could try to extract the JIT-generated SASS to compare to the nvcc-generated SASS? Though a much smaller test case / reproducer would make that more feasible (if it reproduces, which is uncertain). Could also diff the SASS generated from PTX on Linux and Windows.
Otherwise, rolling back the CUDA driver might be a (time-consuming) option to narrow this down further, to then report it upstream (if it is a driver problem and not a problem on our side).
After downgrading and then incrementally upgrading the NVIDIA driver on my machine, this does appear to be caused by the CUDA 12.6 compatible drivers (560.76 & 560.94 at least).
## CUDA 12.4
After downgrading to CUDA 12.4's driver:
```console
$ nvidia-smi.exe
Thu Nov 21 20:53:35 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 551.61 Driver Version: 551.61 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+

$ ./build-70-cu124/bin/Debug/tests.exe --gtest_filter="TestMessage_Array.arrayMessageReorderMemoryLarge"
Running main() from C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\tests\helpers\main.cu
Note: Google Test filter = TestMessage_Array.arrayMessageReorderMemoryLarge
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from TestMessage_Array
[ RUN ] TestMessage_Array.arrayMessageReorderMemoryLarge
[ OK ] TestMessage_Array.arrayMessageReorderMemoryLarge (220280 ms)
[----------] 1 test from TestMessage_Array (220281 ms total)
[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (220282 ms total)
[ PASSED ] 1 test.
```
## CUDA 12.5
Using CUDA 12.5's driver, the tests continue to pass.
```console
$ nvidia-smi.exe
Thu Nov 21 21:29:28 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.85 Driver Version: 555.85 CUDA Version: 12.5 |
|-----------------------------------------+------------------------+----------------------+

$ ./build-70-cu124/bin/Debug/tests.exe --gtest_filter="TestMessage_Array.arrayMessageReorderMemoryLarge"
...
[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (230094 ms total)
[ PASSED ] 1 test.
```
## CUDA 12.6.0
With the CUDA 12.6.0 driver, `560.76`, the tests once again fail when embedding PTX for SM 70, but executing on an SM86 device.
```console
$ nvidia-smi.exe
Thu Nov 21 21:52:11 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.76 Driver Version: 560.76 CUDA Version: 12.6 |
|-----------------------------------------+------------------------+----------------------+

$ ./build-70-cu125/bin/Debug/tests.exe --gtest_filter="TestMessage_Array.arrayMessageReorderMemoryLarge"
Running main() from C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\tests\helpers\main.cu
Note: Google Test filter = TestMessage_Array.arrayMessageReorderMemoryLarge
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from TestMessage_Array
[ RUN ] TestMessage_Array.arrayMessageReorderMemoryLarge
CUDA error 1 [C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub/device/dispatch/dispatch_radix_sort.cuh, 1970]: invalid argument
CUDA error 1 [C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub/device/dispatch/dispatch_radix_sort.cuh, 2410]: invalid argument
C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\include\flamegpu/simulation/detail/CUDAErrorChecking.cuh(28): CUDA Error: C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\src\flamegpu\simulation\detail\CUDAAgent.cu(214): cudaErrorInvalidValue invalid argument
unknown file: error: C++ exception with description "C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\include\flamegpu/simulation/detail/CUDAErrorChecking.cuh(28): CUDA Error: C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\src\flamegpu\simulation\detail\CUDAAgent.cu(214): cudaErrorInvalidValue invalid argument" thrown in the test body.
[ FAILED ] TestMessage_Array.arrayMessageReorderMemoryLarge (224887 ms)
[----------] 1 test from TestMessage_Array (224888 ms total)
[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (224889 ms total)
[ PASSED ] 0 tests.
[ FAILED ] 1 test, listed below:
[ FAILED ] TestMessage_Array.arrayMessageReorderMemoryLarge
1 FAILED TEST
```
This appears to have been resolved in the driver which ships with CUDA 12.6 Update 3.
```console
$ nvidia-smi.exe
Thu Nov 21 22:27:15 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 561.17 Driver Version: 561.17 CUDA Version: 12.6 |
|-----------------------------------------+------------------------+----------------------+

$ ./build-70-cu125/bin/Debug/tests.exe --gtest_filter="TestMessage_Array.arrayMessageReorderMemoryLarge"
Running main() from C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\tests\helpers\main.cu
Note: Google Test filter = TestMessage_Array.arrayMessageReorderMemoryLarge
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from TestMessage_Array
[ RUN ] TestMessage_Array.arrayMessageReorderMemoryLarge

$ ./build-70-cu124/bin/Release/tests.exe
...
[----------] Global test environment tear-down
[==========] 1126 tests from 87 test suites ran. (200626 ms total)
[ PASSED ] 1126 tests.
YOU HAVE 40 DISABLED TESTS
```
@gubbsjuk - I've narrowed down the errors I could trigger to the CUDA driver's PTX JIT compiler, but the problem appears to have been fixed in the 561 driver which ships with CUDA 12.6 Update 3, released in the last few days.
If/when you next update your CUDA installation / NVIDIA driver on your RTX 3500 Ada machine, it would be very helpful if you could build the Pedestrian Navigation example with the incorrect CUDA architecture specified and confirm whether or not you encounter the runtime errors, i.e. the following, or similar via GUIs:
```console
cd path\to\FLAMEGPU2-pedestrian_navigation-example
cmake -S . -B build-sm70 -DCMAKE_CUDA_ARCHITECTURES=70 -DFLAMEGPU_VISUALISATION=ON
cmake --build build-sm70 --config Release
.\build-sm70\bin\Release\pedestrian_navigation.exe -i map.xml -s 0
```
@ptheywood - Gotcha. Will get on it and report back once done. For reference. Current Nvidia-driver:
@ptheywood No runtime errors.
@gubbsjuk Thanks for confirming this for us. I'll close this issue now, and edit the original post so we can redirect anyone else who encounters these issues here.
## tldr
On Windows, with 560.xx drivers (shipping with CUDA 12.6, 12.6 Update 1 and 12.6 Update 2), `invalid argument` errors within CUB's `dispatch_radix_sort.cuh` would occur at runtime when the binary did not contain an SM 80 binary and the code was executed on an SM 86 or 89 device. I.e. something was going wrong when JITing the SM 70 (or lower) PTX into executable code for an SM 8x device.
This appears to have been resolved in the 561.17 driver which ships with CUDA 12.6 Update 3, confirmed on multiple systems.
Alternatively, the errors do not seem to occur if the correct compute capabilities are targeted.
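For anyone scripting around this, the driver versions reported in this thread can be captured in a small check. The sketch below is only an illustration under stated assumptions, not an official compatibility test: it takes a driver version string as printed by `nvidia-smi` (for example `560.76`) and treats the whole 560 branch as affected, which only reflects the Windows results reported here. The function names `driverBranch` and `isAffectedWindowsDriver` are hypothetical.

```cpp
// Hypothetical helper: extract the driver branch (the digits before the
// first '.') from a version string such as "560.76".
// Returns -1 if the string does not start with a digit.
constexpr int driverBranch(const char* version) {
    int branch = 0;
    int digits = 0;
    for (const char* p = version; *p >= '0' && *p <= '9'; ++p) {
        branch = branch * 10 + (*p - '0');
        ++digits;
    }
    return digits > 0 ? branch : -1;
}

// Assumption taken from this thread only: on Windows, 560.xx drivers showed
// the PTX JIT failure, while 551.61, 555.85 and 561.17 did not.
constexpr bool isAffectedWindowsDriver(const char* version) {
    return driverBranch(version) == 560;
}

// Driver versions reported in this thread.
static_assert(isAffectedWindowsDriver("560.76"), "560.76 failed");
static_assert(isAffectedWindowsDriver("560.94"), "560.94 failed");
static_assert(!isAffectedWindowsDriver("561.17"), "561.17 passed");
```

A check like this says nothing about Linux or about driver branches not tested in this thread.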
## Original content
`invalid argument` errors are being encountered in `dispatch_radix_sort.cuh` in some cases.
This was first highlighted by @gubbsjuk in FLAMEGPU/FLAMEGPU2-pedestrian_navigation-example#7 (comment), using the Pedestrian Navigation Example, under Windows using CUDA 12.6 and the included Thrust/CUB 2.5.0, executing on an RTX 3500 Ada GPU (SM_89).
I attempted to reproduce this under Windows with CUDA 12.6 on my 3060 Ti (SM_86), compiled for SM_86, and was unsuccessful.
I was however able to encounter errors within `dispatch_radix_sort.cuh` when compiling for SM_50 or SM_70, with PTX embedded, resulting in PTX JITing to SM80/86, using the FLAME GPU 2 test suite.
Attaching a debugger to `TestMessage_Array.arrayMessageReorderMemoryLarge` showed that the kernel launch arguments from CUB all looked fine: pointers all appeared to be in valid ranges etc., and it launched 72k threads for a 64k element sort.
Compute sanitizer with memcheck did not highlight any memory errors.
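The failing configuration (SM_50 or SM_70 code with PTX embedded, run on an SM 86 device) ends up on the driver's PTX JIT path. As a rough illustration of why, here is a minimal C++ sketch of the general CUDA compatibility rules: SASS built for compute capability X.y runs only on devices of capability X.z with z >= y, while PTX can be JIT compiled for any device of equal or greater capability. The names `ComputeCapability`, `LoadPath` and `classify` are hypothetical and not part of FLAME GPU or CUDA, and the model deliberately ignores special cases such as architecture-specific targets.

```cpp
#include <initializer_list>

// Compute capability, e.g. {8, 6} for SM 86.
struct ComputeCapability {
    int ccMajor;
    int ccMinor;
};

enum class LoadPath { NativeSass, PtxJit, NoCompatibleCode };

// SASS for X.y runs on devices with the same major version and minor >= y.
constexpr bool sassRunsOn(ComputeCapability sass, ComputeCapability device) {
    return sass.ccMajor == device.ccMajor && sass.ccMinor <= device.ccMinor;
}

// PTX for X.y can be JIT compiled for any device of capability >= X.y.
constexpr bool ptxJitsFor(ComputeCapability ptx, ComputeCapability device) {
    return ptx.ccMajor < device.ccMajor ||
           (ptx.ccMajor == device.ccMajor && ptx.ccMinor <= device.ccMinor);
}

// Simplified model: use compatible SASS if present, otherwise fall back to JIT.
constexpr LoadPath classify(std::initializer_list<ComputeCapability> sass,
                            std::initializer_list<ComputeCapability> ptx,
                            ComputeCapability device) {
    for (const ComputeCapability& cc : sass) {
        if (sassRunsOn(cc, device)) return LoadPath::NativeSass;
    }
    for (const ComputeCapability& cc : ptx) {
        if (ptxJitsFor(cc, device)) return LoadPath::PtxJit;
    }
    return LoadPath::NoCompatibleCode;
}

// Built for SM 70 with PTX embedded, run on an SM 86 device: PTX JIT.
static_assert(classify({{7, 0}}, {{7, 0}}, {8, 6}) == LoadPath::PtxJit, "");
// Built for SM 86, run on an SM 86 device: native SASS, no JIT involved.
static_assert(classify({{8, 6}}, {{8, 6}}, {8, 6}) == LoadPath::NativeSass, "");
```

Under this model, only the first configuration depends on the driver's JIT compiler, which matches the observation that targeting the device's own compute capability avoided the errors.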