Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Windows R560 PTX Jit errors - invalid argument in cub radix sort #1253

Closed
ptheywood opened this issue Nov 21, 2024 · 8 comments
Closed

Windows R560 PTX Jit errors - invalid argument in cub radix sort #1253

ptheywood opened this issue Nov 21, 2024 · 8 comments

Comments

@ptheywood
Copy link
Member

ptheywood commented Nov 21, 2024

tldr

On Windows, with 560.xx drivers, (shipping with CUDA 12.6, 12.6 Update 1 and 12.6 Update 2), invalid argument errors within CUB's dispatch_radix_sort.cuh would occur at runtime when the binary did not contain SM 80 binary, and the code was executed on an SM 86 or 89 device.

I.e. something was going wrong when JITing the SM 70 (or lower) PTX into executable code for an SM 8x device.

This appears to have been resolved in the 561.17 driver which ships with CUDA 12.6 Update 3, confirmed on multiple systems.

Alternatively, the errors do not seem to occur if the correct compute capabilities are targetted.


Original content

invalid argument errors are being encountered in dispatch_radix_sort.cuh in some cases.

This was first highlighted by @gubbsjuk in FLAMEGPU/FLAMEGPU2-pedestrian_navigation-example#7 (comment), using the Pedestrina Navigation Example, under Windows using CUDA 12.6 and the included Thrust/Cub 2.5.0, executing on a RTX 3500 Ada GPU (SM_89).

I attempted to reproduce this on under windows with CUDA 12.6 on my 3060ti (SM_86), compiled for SM_86 and was unsuccessfull.

I was however able to encounter errors within dispatch_radix_sort.cuh when compiling for SM_50 or SM_70, with PTX embedded, resulting in PTX JITing to SM80/86, using the FLAME GPU 2 test suite

CUDA error 1 [C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\include\cub/device/dispatch/dispatch_radix_sort.cuh, 2323]: invalid argument
CUDA error 1 [C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\include\cub/device/dispatch/dispatch_radix_sort.cuh, 2742]: invalid argument
C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\include\flamegpu/simulation/detail/CUDAErrorChecking.cuh(28): CUDA Error: C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\src\flamegpu\simulation\detail\CUDAAgent.cu(214): cudaErrorInvalidValue invalid argument
unknown file: error: C++ exception with description "C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\include\flamegpu/simulation/detail/CUDAErrorChecking.cuh(28): CUDA Error: C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\src\flamegpu\simulation\detail\CUDAAgent.cu(214): cudaErrorInvalidValue invalid argument" thrown in the test body.

[  FAILED  ] 18 tests, listed below:
[  FAILED  ] MultiThreadDeviceTest.SameModelSeperateThread_Agent
[  FAILED  ] MultiThreadDeviceTest.SameModelSeperateThread_Message
[  FAILED  ] MultiThreadDeviceTest.SameModelSeperateThread_Environment
[  FAILED  ] MultiThreadDeviceTest.SameModelSeperateThread_AgentFunctionCondition
[  FAILED  ] MultiThreadDeviceTest.SameModelMultiDevice_Agent
[  FAILED  ] MultiThreadDeviceTest.SameModelMultiDevice_Message
[  FAILED  ] MultiThreadDeviceTest.SameModelMultiDevice_Environment
[  FAILED  ] MultiThreadDeviceTest.SameModelMultiDevice_AgentFunctionCondition
[  FAILED  ] RTCMultiThreadDeviceTest.SameModelMultiDevice_Message
[  FAILED  ] RTCMultiThreadDeviceTest.SameModelMultiDevice_Environment
[  FAILED  ] RTCMultiThreadDeviceTest.SameModelMultiDevice_AgentFunctionCondition
[  FAILED  ] Spatial3DMessageTest.Wrapped
[  FAILED  ] Spatial3DMessageTest.Wrapped2
[  FAILED  ] Spatial3DMessageTest.Wrapped3
[  FAILED  ] Spatial3DMessageTest.Wrapped_OutOfBounds
[  FAILED  ] TestMessage_Array.arrayMessageReorderMemoryLarge
[  FAILED  ] TestMessage_Array2D.arrayMessageReorderMemoryLarge
[  FAILED  ] TestMessage_Array3D.arrayMessageReorderMemoryLarge

Attaching a debugger to TestMessage_Array.arrayMessageReorderMemoryLarge showed the kernel launch arguments from cub all looked fine, pointers all appeared to be valid ranges etc, launched 72k threads for a 64k element sort.

Compute sanitizer with memcheck did not highlight any memory errors

$ /c/Program\ Files/NVIDIA\ GPU\ Computing\ Toolkit/CUDA/v12.4/compute-sanitizer/compute-sanitizer.exe --tool memcheck ./build-70-cu124/bin/Debug/tests.exe --gtest_filter="TestMessage_Array.arrayMessageReorderMemoryLarge"
========= COMPUTE-SANITIZER
Running main() from C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\tests\helpers\main.cu
Note: Google Test filter = TestMessage_Array.arrayMessageReorderMemoryLarge
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from TestMessage_Array
[ RUN      ] TestMessage_Array.arrayMessageReorderMemoryLarge
========= Program hit cudaErrorInvalidValue (error 1) due to "invalid argument" on CUDA API call to cudaLaunchKernel.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:cuProfilerStop [0x7ffe39e9c1e5]
=========                in C:\WINDOWS\system32\DriverStore\FileRepository\nv_dispi.inf_amd64_78cd02ab022cd554\nvcuda64.dll
=========     Host Frame:cudaLaunchKernel [0x7ffea5f8d22f]
=========                in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\bin\cudart64_12.dll
=========     Host Frame:cudaLaunchKernel<char> in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cuda_runtime.h:217 [0xdc5c0b]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:__device_stub__ZN3cub17CUB_200301_700_NS29DeviceRadixSortOnesweepKernelINS0_21DeviceRadixSortPolicyIjNS0_8NullTypeEjE9Policy900ELb0EjS3_jiiNS0_6detail21identity_decomposer_tEEEvPT5_S9_PT3_PKSA_PT1_PKSE_PT2_PKSI_T4_iiT6_ in C:\Users\ptheywood\AppData\Local\Temp\tmpxft_0000bf8c_00000000-7_CUDAAgent.cudafe1.stub.c:202 [0xdc47d2]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:cub::CUB_200301_700_NS::__wrapper__device_stub_DeviceRadixSortOnesweepKernel<cub::CUB_200301_700_NS::DeviceRadixSortPolicy<unsigned int,cub::CUB_200301_700_NS::NullType,unsigned int>::Policy900,0,unsigned int,cub::CUB_200301_700_NS::NullType,unsigned int,int,int,cub::CUB_200301_700_NS::detail::identity_decomposer_t> in C:\Users\ptheywood\AppData\Local\Temp\tmpxft_0000bf8c_00000000-7_CUDAAgent.cudafe1.stub.c:206 [0xdc609b]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:cub::CUB_200301_700_NS::DeviceRadixSortOnesweepKernel<cub::CUB_200301_700_NS::DeviceRadixSortPolicy<unsigned int,cub::CUB_200301_700_NS::NullType,unsigned int>::Policy900,0,unsigned int,cub::CUB_200301_700_NS::NullType,unsigned int,int,int,cub::CUB_200301_700_NS::detail::identity_decomposer_t> in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\FLAMEGPU\__nv_cuda_kernel_impl:27 [0xdcabf0] 
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:thrust::THRUST_200301_700_NS::cuda_cub::launcher::triple_chevron::doit_host<void (__cdecl*)(int *,int *,unsigned int *,unsigned int const *,unsigned int *,unsigned int const *,cub::CUB_200301_700_NS::NullType *,cub::CUB_200301_700_NS::NullType const *,int,int,int,cub::CUB_200301_700_NS::detail::identity_decomposer_t),int *,int *,unsigned int *,unsigned int *,unsigned int *,unsigned int *,cub::CUB_200301_700_NS::NullType *,cub::CUB_200301_700_NS::NullType *,int,int,int,cub::CUB_200301_700_NS::detail::identity_decomposer_t> in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\thrust\system\cuda\detail\core\triple_chevron_launch.h:70 [0xddcf8b]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:thrust::THRUST_200301_700_NS::cuda_cub::launcher::triple_chevron::doit<void (__cdecl*)(int *,int *,unsigned int *,unsigned int const *,unsigned int *,unsigned int const *,cub::CUB_200301_700_NS::NullType *,cub::CUB_200301_700_NS::NullType const *,int,int,int,cub::CUB_200301_700_NS::detail::identity_decomposer_t),int *,int *,unsigned int *,unsigned int *,unsigned int *,unsigned int *,cub::CUB_200301_700_NS::NullType *,cub::CUB_200301_700_NS::NullType *,int,int,int,cub::CUB_200301_700_NS::detail::identity_decomposer_t> in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\thrust\system\cuda\detail\core\triple_chevron_launch.h:153 [0xddcc90]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:cub::CUB_200301_700_NS::DispatchRadixSort<0,unsigned int,cub::CUB_200301_700_NS::NullType,unsigned int,cub::CUB_200301_700_NS::DeviceRadixSortPolicy<unsigned int,cub::CUB_200301_700_NS::NullType,unsigned int>,cub::CUB_200301_700_NS::detail::identity_decomposer_t>::InvokeOnesweep<cub::CUB_200301_700_NS::DeviceRadixSortPolicy<unsigned int,cub::CUB_200301_700_NS::NullType,unsigned int>::Policy800> in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cub\device\dispatch\dispatch_radix_sort.cuh:1959 [0xdd292e]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:cub::CUB_200301_700_NS::DispatchRadixSort<0,unsigned int,cub::CUB_200301_700_NS::NullType,unsigned int,cub::CUB_200301_700_NS::DeviceRadixSortPolicy<unsigned int,cub::CUB_200301_700_NS::NullType,unsigned int>,cub::CUB_200301_700_NS::detail::identity_decomposer_t>::InvokeManyTiles<cub::CUB_200301_700_NS::DeviceRadixSortPolicy<unsigned int,cub::CUB_200301_700_NS::NullType,unsigned int>::Policy800> in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cub\device\dispatch\dispatch_radix_sort.cuh:2227 [0xdcc748]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:cub::CUB_200301_700_NS::DispatchRadixSort<0,unsigned int,cub::CUB_200301_700_NS::NullType,unsigned int,cub::CUB_200301_700_NS::DeviceRadixSortPolicy<unsigned int,cub::CUB_200301_700_NS::NullType,unsigned int>,cub::CUB_200301_700_NS::detail::identity_decomposer_t>::Invoke<cub::CUB_200301_700_NS::DeviceRadixSortPolicy<unsigned int,cub::CUB_200301_700_NS::NullType,unsigned int>::Policy800> in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cub\device\dispatch\dispatch_radix_sort.cuh:2334 [0xdcc412]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:cub::CUB_200301_700_NS::ChainedPolicy<800,cub::CUB_200301_700_NS::DeviceRadixSortPolicy<unsigned int,cub::CUB_200301_700_NS::NullType,unsigned int>::Policy800,cub::CUB_200301_700_NS::DeviceRadixSortPolicy<unsigned int,cub::CUB_200301_700_NS::NullType,unsigned int>::Policy700>::Invoke<cub::CUB_200301_700_NS::DispatchRadixSort<0,unsigned int,cub::CUB_200301_700_NS::NullType,unsigned int,cub::CUB_200301_700_NS::DeviceRadixSortPolicy<unsigned int,cub::CUB_200301_700_NS::NullType,unsigned int>,cub::CUB_200301_700_NS::detail::identity_decomposer_t> > in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cub\util_device.cuh:677 [0xdcbb42]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:cub::CUB_200301_700_NS::ChainedPolicy<900,cub::CUB_200301_700_NS::DeviceRadixSortPolicy<unsigned int,cub::CUB_200301_700_NS::NullType,unsigned int>::Policy900,cub::CUB_200301_700_NS::DeviceRadixSortPolicy<unsigned int,cub::CUB_200301_700_NS::NullType,unsigned int>::Policy800>::Invoke<cub::CUB_200301_700_NS::DispatchRadixSort<0,unsigned int,cub::CUB_200301_700_NS::NullType,unsigned int,cub::CUB_200301_700_NS::DeviceRadixSortPolicy<unsigned int,cub::CUB_200301_700_NS::NullType,unsigned int>,cub::CUB_200301_700_NS::detail::identity_decomposer_t> > in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cub\util_device.cuh:674 [0xdcbb86]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:cub::CUB_200301_700_NS::DispatchRadixSort<0,unsigned int,cub::CUB_200301_700_NS::NullType,unsigned int,cub::CUB_200301_700_NS::DeviceRadixSortPolicy<unsigned int,cub::CUB_200301_700_NS::NullType,unsigned int>,cub::CUB_200301_700_NS::detail::identity_decomposer_t>::Dispatch in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cub\device\dispatch\dispatch_radix_sort.cuh:2412 [0xde607c]

=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:cub::CUB_200301_700_NS::DeviceRadixSort::SortKeys<unsigned int,unsigned int> in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cub\device\device_radix_sort.cuh:2148 [0xdd84e7]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:flamegpu::detail::CUDAAgent::validateIDCollisions in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\src\flamegpu\simulation\detail\CUDAAgent.cu:214 [0xdc24a9]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:flamegpu::detail::CUDAAgent::setPopulationData in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\src\flamegpu\simulation\detail\CUDAAgent.cu:154 [0xdbcc92]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:flamegpu::CUDASimulation::setPopulationData in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\src\flamegpu\simulation\CUDASimulation.cu:1400 [0xc0d802]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:flamegpu::test_message_array::test_arrayMessageReorderError in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\tests\test_cases\runtime\messaging\test_array.cu:1128 [0xb05827]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:flamegpu::test_message_array::TestMessage_Array_arrayMessageReorderMemoryLarge_Test::TestBody in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\tests\test_cases\runtime\messaging\test_array.cu:1147 [0xb05ac4]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test,void> in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\_deps\googletest-src\googletest\src\gtest.cc:2605 [0xf5620d]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:testing::internal::HandleExceptionsInMethodIfSupported<testing::Test,void> in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\_deps\googletest-src\googletest\src\gtest.cc:2648 [0xf55e43]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:testing::Test::Run in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\_deps\googletest-src\googletest\src\gtest.cc:2687 [0xf277bc]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:testing::TestInfo::Run in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\_deps\googletest-src\googletest\src\gtest.cc:2836 [0xf283db]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:testing::TestSuite::Run in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\_deps\googletest-src\googletest\src\gtest.cc:3015 [0xf28d51]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:testing::internal::UnitTestImpl::RunAllTests in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\_deps\googletest-src\googletest\src\gtest.cc:5920 [0xf2f55b]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl,bool> in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\_deps\googletest-src\googletest\src\gtest.cc:2605 [0xf562cd]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl,bool> in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\_deps\googletest-src\googletest\src\gtest.cc:2648 [0xf56153]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:testing::UnitTest::Run in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\_deps\googletest-src\googletest\src\gtest.cc:5484 [0xf29565]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:RUN_ALL_TESTS in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\_deps\googletest-src\googletest\include\gtest\gtest.h:2317 [0xbebd23]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:main in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\tests\helpers\main.cu:26 [0xbeb640]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:invoke_main in D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:79 [0xfe5c69]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:__scrt_common_main_seh in D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:288 [0xfe5b52]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:__scrt_common_main in D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:331 [0xfe5a0e]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:mainCRTStartup in D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_main.cpp:17 [0xfe5cfe]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:BaseThreadInitThunk [0x7ffecbf1257d]
=========                in C:\WINDOWS\System32\KERNEL32.DLL
=========     Host Frame:RtlUserThreadStart [0x7ffecc6caf08]
=========                in C:\WINDOWS\SYSTEM32\ntdll.dll
=========
========= Program hit cudaErrorInvalidValue (error 1) due to "invalid argument" on CUDA API call to cudaPeekAtLastError.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:cuProfilerStop [0x7ffe39e9c1e5]
=========                in C:\WINDOWS\system32\DriverStore\FileRepository\nv_dispi.inf_amd64_78cd02ab022cd554\nvcuda64.dll
=========     Host Frame:cudaPeekAtLastError [0x7ffea5f97d59]
=========                in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\bin\cudart64_12.dll
=========     Host Frame:thrust::THRUST_200301_700_NS::cuda_cub::launcher::triple_chevron::doit_host<void (__cdecl*)(int *,int *,unsigned int *,unsigned int const *,unsigned int *,unsigned int const *,cub::CUB_200301_700_NS::NullType *,cub::CUB_200301_700_NS::NullType const *,int,int,int,cub::CUB_200301_700_NS::detail::identity_decomposer_t),int *,int *,unsigned int *,unsigned int *,unsigned int *,unsigned int *,cub::CUB_200301_700_NS::NullType *,cub::CUB_200301_700_NS::NullType *,int,int,int,cub::CUB_200301_700_NS::detail::identity_decomposer_t> in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\thrust\system\cuda\detail\core\triple_chevron_launch.h:72 [0xddcf91]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:thrust::THRUST_200301_700_NS::cuda_cub::launcher::triple_chevron::doit<void (__cdecl*)(int *,int *,unsigned int *,unsigned int const *,unsigned int *,unsigned int const *,cub::CUB_200301_700_NS::NullType *,cub::CUB_200301_700_NS::NullType const *,int,int,int,cub::CUB_200301_700_NS::detail::identity_decomposer_t),int *,int *,unsigned int *,unsigned int *,unsigned int *,unsigned int *,cub::CUB_200301_700_NS::NullType *,cub::CUB_200301_700_NS::NullType *,int,int,int,cub::CUB_200301_700_NS::detail::identity_decomposer_t> in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\thrust\system\cuda\detail\core\triple_chevron_launch.h:153 [0xddcc90]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:cub::CUB_200301_700_NS::DispatchRadixSort<0,unsigned int,cub::CUB_200301_700_NS::NullType,unsigned int,cub::CUB_200301_700_NS::DeviceRadixSortPolicy<unsigned int,cub::CUB_200301_700_NS::NullType,unsigned int>,cub::CUB_200301_700_NS::detail::identity_decomposer_t>::InvokeOnesweep<cub::CUB_200301_700_NS::DeviceRadixSortPolicy<unsigned int,cub::CUB_200301_700_NS::NullType,unsigned int>::Policy800> in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cub\device\dispatch\dispatch_radix_sort.cuh:1959 [0xdd292e]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:cub::CUB_200301_700_NS::DispatchRadixSort<0,unsigned int,cub::CUB_200301_700_NS::NullType,unsigned int,cub::CUB_200301_700_NS::DeviceRadixSortPolicy<unsigned int,cub::CUB_200301_700_NS::NullType,unsigned int>,cub::CUB_200301_700_NS::detail::identity_decomposer_t>::InvokeManyTiles<cub::CUB_200301_700_NS::DeviceRadixSortPolicy<unsigned int,cub::CUB_200301_700_NS::NullType,unsigned int>::Policy800> in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cub\device\dispatch\dispatch_radix_sort.cuh:2227 [0xdcc748]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:cub::CUB_200301_700_NS::DispatchRadixSort<0,unsigned int,cub::CUB_200301_700_NS::NullType,unsigned int,cub::CUB_200301_700_NS::DeviceRadixSortPolicy<unsigned int,cub::CUB_200301_700_NS::NullType,unsigned int>,cub::CUB_200301_700_NS::detail::identity_decomposer_t>::Invoke<cub::CUB_200301_700_NS::DeviceRadixSortPolicy<unsigned int,cub::CUB_200301_700_NS::NullType,unsigned int>::Policy800> in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cub\device\dispatch\dispatch_radix_sort.cuh:2334 [0xdcc412]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:cub::CUB_200301_700_NS::ChainedPolicy<800,cub::CUB_200301_700_NS::DeviceRadixSortPolicy<unsigned int,cub::CUB_200301_700_NS::NullType,unsigned int>::Policy800,cub::CUB_200301_700_NS::DeviceRadixSortPolicy<unsigned int,cub::CUB_200301_700_NS::NullType,unsigned int>::Policy700>::Invoke<cub::CUB_200301_700_NS::DispatchRadixSort<0,unsigned int,cub::CUB_200301_700_NS::NullType,unsigned int,cub::CUB_200301_700_NS::DeviceRadixSortPolicy<unsigned int,cub::CUB_200301_700_NS::NullType,unsigned int>,cub::CUB_200301_700_NS::detail::identity_decomposer_t> > in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cub\util_device.cuh:677 [0xdcbb42]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:cub::CUB_200301_700_NS::ChainedPolicy<900,cub::CUB_200301_700_NS::DeviceRadixSortPolicy<unsigned int,cub::CUB_200301_700_NS::NullType,unsigned int>::Policy900,cub::CUB_200301_700_NS::DeviceRadixSortPolicy<unsigned int,cub::CUB_200301_700_NS::NullType,unsigned int>::Policy800>::Invoke<cub::CUB_200301_700_NS::DispatchRadixSort<0,unsigned int,cub::CUB_200301_700_NS::NullType,unsigned int,cub::CUB_200301_700_NS::DeviceRadixSortPolicy<unsigned int,cub::CUB_200301_700_NS::NullType,unsigned int>,cub::CUB_200301_700_NS::detail::identity_decomposer_t> > in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cub\util_device.cuh:674 [0xdcbb86]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:cub::CUB_200301_700_NS::DispatchRadixSort<0,unsigned int,cub::CUB_200301_700_NS::NullType,unsigned int,cub::CUB_200301_700_NS::DeviceRadixSortPolicy<unsigned int,cub::CUB_200301_700_NS::NullType,unsigned int>,cub::CUB_200301_700_NS::detail::identity_decomposer_t>::Dispatch in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cub\device\dispatch\dispatch_radix_sort.cuh:2412 [0xde607c]

=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:cub::CUB_200301_700_NS::DeviceRadixSort::SortKeys<unsigned int,unsigned int> in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cub\device\device_radix_sort.cuh:2148 [0xdd84e7]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:flamegpu::detail::CUDAAgent::validateIDCollisions in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\src\flamegpu\simulation\detail\CUDAAgent.cu:214 [0xdc24a9]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:flamegpu::detail::CUDAAgent::setPopulationData in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\src\flamegpu\simulation\detail\CUDAAgent.cu:154 [0xdbcc92]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:flamegpu::CUDASimulation::setPopulationData in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\src\flamegpu\simulation\CUDASimulation.cu:1400 [0xc0d802]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:flamegpu::test_message_array::test_arrayMessageReorderError in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\tests\test_cases\runtime\messaging\test_array.cu:1128 [0xb05827]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:flamegpu::test_message_array::TestMessage_Array_arrayMessageReorderMemoryLarge_Test::TestBody in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\tests\test_cases\runtime\messaging\test_array.cu:1147 [0xb05ac4]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test,void> in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\_deps\googletest-src\googletest\src\gtest.cc:2605 [0xf5620d]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:testing::internal::HandleExceptionsInMethodIfSupported<testing::Test,void> in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\_deps\googletest-src\googletest\src\gtest.cc:2648 [0xf55e43]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:testing::Test::Run in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\_deps\googletest-src\googletest\src\gtest.cc:2687 [0xf277bc]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:testing::TestInfo::Run in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\_deps\googletest-src\googletest\src\gtest.cc:2836 [0xf283db]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:testing::TestSuite::Run in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\_deps\googletest-src\googletest\src\gtest.cc:3015 [0xf28d51]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:testing::internal::UnitTestImpl::RunAllTests in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\_deps\googletest-src\googletest\src\gtest.cc:5920 [0xf2f55b]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl,bool> in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\_deps\googletest-src\googletest\src\gtest.cc:2605 [0xf562cd]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl,bool> in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\_deps\googletest-src\googletest\src\gtest.cc:2648 [0xf56153]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:testing::UnitTest::Run in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\_deps\googletest-src\googletest\src\gtest.cc:5484 [0xf29565]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:RUN_ALL_TESTS in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\_deps\googletest-src\googletest\include\gtest\gtest.h:2317 [0xbebd23]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:main in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\tests\helpers\main.cu:26 [0xbeb640]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:invoke_main in D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:79 [0xfe5c69]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:__scrt_common_main_seh in D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:288 [0xfe5b52]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:__scrt_common_main in D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:331 [0xfe5a0e]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:mainCRTStartup in D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_main.cpp:17 [0xfe5cfe]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:BaseThreadInitThunk [0x7ffecbf1257d]
=========                in C:\WINDOWS\System32\KERNEL32.DLL
=========     Host Frame:RtlUserThreadStart [0x7ffecc6caf08]
=========                in C:\WINDOWS\SYSTEM32\ntdll.dll
=========
========= Program hit cudaErrorInvalidValue (error 1) due to "invalid argument" on CUDA API call to cudaGetLastError.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:cuProfilerStop [0x7ffe39e9c1e5]
=========                in C:\WINDOWS\system32\DriverStore\FileRepository\nv_dispi.inf_amd64_78cd02ab022cd554\nvcuda64.dll
=========     Host Frame:cudaGetLastError [0x7ffea5f819c9]
=========                in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\bin\cudart64_12.dll
=========     Host Frame:cub::CUB_200301_700_NS::Debug in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cub\util_debug.cuh:205 [0x306080]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:cub::CUB_200301_700_NS::DispatchRadixSort<0,unsigned int,cub::CUB_200301_700_NS::NullType,unsigned int,cub::CUB_200301_700_NS::DeviceRadixSortPolicy<unsigned int,cub::CUB_200301_700_NS::NullType,unsigned int>,cub::CUB_200301_700_NS::detail::identity_decomposer_t>::InvokeOnesweep<cub::CUB_200301_700_NS::DeviceRadixSortPolicy<unsigned int,cub::CUB_200301_700_NS::NullType,unsigned int>::Policy800> in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cub\device\dispatch\dispatch_radix_sort.cuh:1972 [0xdd294e]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:cub::CUB_200301_700_NS::DispatchRadixSort<0,unsigned int,cub::CUB_200301_700_NS::NullType,unsigned int,cub::CUB_200301_700_NS::DeviceRadixSortPolicy<unsigned int,cub::CUB_200301_700_NS::NullType,unsigned int>,cub::CUB_200301_700_NS::detail::identity_decomposer_t>::InvokeManyTiles<cub::CUB_200301_700_NS::DeviceRadixSortPolicy<unsigned int,cub::CUB_200301_700_NS::NullType,unsigned int>::Policy800> in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cub\device\dispatch\dispatch_radix_sort.cuh:2227 [0xdcc748]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:cub::CUB_200301_700_NS::DispatchRadixSort<0,unsigned int,cub::CUB_200301_700_NS::NullType,unsigned int,cub::CUB_200301_700_NS::DeviceRadixSortPolicy<unsigned int,cub::CUB_200301_700_NS::NullType,unsigned int>,cub::CUB_200301_700_NS::detail::identity_decomposer_t>::Invoke<cub::CUB_200301_700_NS::DeviceRadixSortPolicy<unsigned int,cub::CUB_200301_700_NS::NullType,unsigned int>::Policy800> in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cub\device\dispatch\dispatch_radix_sort.cuh:2334 [0xdcc412]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:cub::CUB_200301_700_NS::ChainedPolicy<800,cub::CUB_200301_700_NS::DeviceRadixSortPolicy<unsigned int,cub::CUB_200301_700_NS::NullType,unsigned int>::Policy800,cub::CUB_200301_700_NS::DeviceRadixSortPolicy<unsigned int,cub::CUB_200301_700_NS::NullType,unsigned int>::Policy700>::Invoke<cub::CUB_200301_700_NS::DispatchRadixSort<0,unsigned int,cub::CUB_200301_700_NS::NullType,unsigned int,cub::CUB_200301_700_NS::DeviceRadixSortPolicy<unsigned int,cub::CUB_200301_700_NS::NullType,unsigned int>,cub::CUB_200301_700_NS::detail::identity_decomposer_t> > in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cub\util_device.cuh:677 [0xdcbb42]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:cub::CUB_200301_700_NS::ChainedPolicy<900,cub::CUB_200301_700_NS::DeviceRadixSortPolicy<unsigned int,cub::CUB_200301_700_NS::NullType,unsigned int>::Policy900,cub::CUB_200301_700_NS::DeviceRadixSortPolicy<unsigned int,cub::CUB_200301_700_NS::NullType,unsigned int>::Policy800>::Invoke<cub::CUB_200301_700_NS::DispatchRadixSort<0,unsigned int,cub::CUB_200301_700_NS::NullType,unsigned int,cub::CUB_200301_700_NS::DeviceRadixSortPolicy<unsigned int,cub::CUB_200301_700_NS::NullType,unsigned int>,cub::CUB_200301_700_NS::detail::identity_decomposer_t> > in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cub\util_device.cuh:674 [0xdcbb86]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:cub::CUB_200301_700_NS::DispatchRadixSort<0,unsigned int,cub::CUB_200301_700_NS::NullType,unsigned int,cub::CUB_200301_700_NS::DeviceRadixSortPolicy<unsigned int,cub::CUB_200301_700_NS::NullType,unsigned int>,cub::CUB_200301_700_NS::detail::identity_decomposer_t>::Dispatch in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cub\device\dispatch\dispatch_radix_sort.cuh:2412 [0xde607c]

=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:cub::CUB_200301_700_NS::DeviceRadixSort::SortKeys<unsigned int,unsigned int> in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cub\device\device_radix_sort.cuh:2148 [0xdd84e7]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:flamegpu::detail::CUDAAgent::validateIDCollisions in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\src\flamegpu\simulation\detail\CUDAAgent.cu:214 [0xdc24a9]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:flamegpu::detail::CUDAAgent::setPopulationData in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\src\flamegpu\simulation\detail\CUDAAgent.cu:154 [0xdbcc92]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:flamegpu::CUDASimulation::setPopulationData in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\src\flamegpu\simulation\CUDASimulation.cu:1400 [0xc0d802]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:flamegpu::test_message_array::test_arrayMessageReorderError in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\tests\test_cases\runtime\messaging\test_array.cu:1128 [0xb05827]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:flamegpu::test_message_array::TestMessage_Array_arrayMessageReorderMemoryLarge_Test::TestBody in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\tests\test_cases\runtime\messaging\test_array.cu:1147 [0xb05ac4]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test,void> in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\_deps\googletest-src\googletest\src\gtest.cc:2605 [0xf5620d]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:testing::internal::HandleExceptionsInMethodIfSupported<testing::Test,void> in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\_deps\googletest-src\googletest\src\gtest.cc:2648 [0xf55e43]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:testing::Test::Run in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\_deps\googletest-src\googletest\src\gtest.cc:2687 [0xf277bc]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:testing::TestInfo::Run in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\_deps\googletest-src\googletest\src\gtest.cc:2836 [0xf283db]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:testing::TestSuite::Run in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\_deps\googletest-src\googletest\src\gtest.cc:3015 [0xf28d51]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:testing::internal::UnitTestImpl::RunAllTests in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\_deps\googletest-src\googletest\src\gtest.cc:5920 [0xf2f55b]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl,bool> in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\_deps\googletest-src\googletest\src\gtest.cc:2605 [0xf562cd]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl,bool> in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\_deps\googletest-src\googletest\src\gtest.cc:2648 [0xf56153]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:testing::UnitTest::Run in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\_deps\googletest-src\googletest\src\gtest.cc:5484 [0xf29565]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:RUN_ALL_TESTS in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\_deps\googletest-src\googletest\include\gtest\gtest.h:2317 [0xbebd23]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:main in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\tests\helpers\main.cu:26 [0xbeb640]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:invoke_main in D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:79 [0xfe5c69]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:__scrt_common_main_seh in D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:288 [0xfe5b52]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:__scrt_common_main in D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:331 [0xfe5a0e]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:mainCRTStartup in D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_main.cpp:17 [0xfe5cfe]
=========                in C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\build-70-cu124\bin\Debug\tests.exe
=========     Host Frame:BaseThreadInitThunk [0x7ffecbf1257d]
=========                in C:\WINDOWS\System32\KERNEL32.DLL
=========     Host Frame:RtlUserThreadStart [0x7ffecc6caf08]
=========                in C:\WINDOWS\SYSTEM32\ntdll.dll
=========
CUDA error 1 [C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cub/device/dispatch/dispatch_radix_sort.cuh, 1972]: invalid argument
CUDA error 1 [C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cub/device/dispatch/dispatch_radix_sort.cuh, 2412]: invalid argument
C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\include\flamegpu/simulation/detail/CUDAErrorChecking.cuh(28): CUDA Error: C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\src\flamegpu\simulation\detail\CUDAAgent.cu(214): cudaErrorInvalidValue invalid argument
unknown file: error: C++ exception with description "C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\include\flamegpu/simulation/detail/CUDAErrorChecking.cuh(28): CUDA Error: C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\src\flamegpu\simulation\detail\CUDAAgent.cu(214): cudaErrorInvalidValue invalid argument" thrown in the test body.

[  FAILED  ] TestMessage_Array.arrayMessageReorderMemoryLarge (239281 ms)
[----------] 1 test from TestMessage_Array (239281 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (239282 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] TestMessage_Array.arrayMessageReorderMemoryLarge

 1 FAILED TEST
========= Target application returned an error
========= ERROR SUMMARY: 3 errors
(base)
@ptheywood
Copy link
Member Author

Compiling for SM 89 allowed the pedestrian navigation example to run on Ada correctly.

-DCMAKE_CUDA_ARCHITECTURES=89 built perfectly in Visual Studio x64/Windows/Debug

FLAMEGPU/FLAMEGPU2-pedestrian_navigation-example#7 (comment)

@ptheywood
Copy link
Member Author

ptheywood commented Nov 21, 2024

Unable to reproduce with CUDA 12.6 compiling for SM 70 under linux.

The PTX JITer is part of the CUDA driver rather than cuda toolkit which may be relevant / make it os specific.

Compute sanitizer found no issues under linux for the TestMessage_Array.arrayMessageReorderMemoryLarge test case either.

Edit: Valgrind found no issues on the host either (in case of heap/stack corruption), but I didn't expect it to given it's behaving and appears to be PTX JIT related.


May be worth tryinng CUDA_FORCE_PTX_JIT=1 under windows for a build with SM_80/86 ptx embeded to see if that reproduces it - i.e. confirm if its a problem with JITing or not.

Could try and extract the JIT generated SASS to compare to the nvcc generated sass? Though a much smaller test case/ reproduce would make that more feasible (if it reproduces which is uncertain). Could also diff sass generated from ptx on linux and windows to compare

Otherwise rolling back the CUDA driver might be an (time consuming) option to try and narrow this down futher to then report it upstream (if a driver problem not an us problem)

@ptheywood ptheywood mentioned this issue Nov 21, 2024
11 tasks
@ptheywood
Copy link
Member Author

After downgrading then incrementally upgrading the NVIDIA driver on my machine, it does appear to be caused by CUDA 12.6 compatible drivers (560.76 & 560.94 at least)

12.4

After downgrading to CUDA 12.4's driver, 551.61, the tests which errored for me passed:

$ nvidia-smi.exe
Thu Nov 21 20:53:35 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 551.61                 Driver Version: 551.61         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
$ ./build-70-cu124/bin/Debug/tests.exe --gtest_filter="TestMessage_Array.arrayMessageReorderMemoryLarge"
Running main() from C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\tests\helpers\main.cu
Note: Google Test filter = TestMessage_Array.arrayMessageReorderMemoryLarge
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from TestMessage_Array
[ RUN      ] TestMessage_Array.arrayMessageReorderMemoryLarge
[       OK ] TestMessage_Array.arrayMessageReorderMemoryLarge (220280 ms)
[----------] 1 test from TestMessage_Array (220281 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (220282 ms total)
[  PASSED  ] 1 test.

12.5

Using CUDA 12.5's driver, the tests continue to pass

$ nvidia-smi.exe
Thu Nov 21 21:29:28 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.85                 Driver Version: 555.85         CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
$ ./build-70-cu124/bin/Debug/tests.exe --gtest_filter="TestMessage_Array.arrayMessageReorderMemoryLarge"
...
[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (230094 ms total)
[  PASSED  ] 1 test.

## CUDA 12.6.0

With the CUDA 12.6.0 driver, `560.76`, the tests once again fail when embedding PTX for SM 70, but executing on an SM86 device.

```console
$ nvidia-smi.exe
Thu Nov 21 21:52:11 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.76                 Driver Version: 560.76         CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
$ ./build-70-cu125/bin/Debug/tests.exe --gtest_filter="TestMessage_Array.arrayMessageReorderMemoryLarge"
Running main() from C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\tests\helpers\main.cu
Note: Google Test filter = TestMessage_Array.arrayMessageReorderMemoryLarge
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from TestMessage_Array
[ RUN      ] TestMessage_Array.arrayMessageReorderMemoryLarge
CUDA error 1 [C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub/device/dispatch/dispatch_radix_sort.cuh, 1970]: invalid argument
CUDA error 1 [C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5\include\cub/device/dispatch/dispatch_radix_sort.cuh, 2410]: invalid argument
C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\include\flamegpu/simulation/detail/CUDAErrorChecking.cuh(28): CUDA Error: C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\src\flamegpu\simulation\detail\CUDAAgent.cu(214): cudaErrorInvalidValue invalid argument
unknown file: error: C++ exception with description "C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\include\flamegpu/simulation/detail/CUDAErrorChecking.cuh(28): CUDA Error: C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\src\flamegpu\simulation\detail\CUDAAgent.cu(214): cudaErrorInvalidValue invalid argument" thrown in the test body.

[  FAILED  ] TestMessage_Array.arrayMessageReorderMemoryLarge (224887 ms)
[----------] 1 test from TestMessage_Array (224888 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (224889 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] TestMessage_Array.arrayMessageReorderMemoryLarge

 1 FAILED TEST

CUDA_FORCE_PTX_JIT=1

With a binary compiled for SM 80 executing on the SM 86 device (3060ti), with CUDA_FORCE_PTX_JIT=1, the tests pass (after JITing for a while).

This suggests that JITing from a lower compute capability is required.

I do not have access to any other Windows machines to determine if this is an issue for older architectures, or just SM 86 & 89 devices.

@ptheywood ptheywood changed the title Thrust/Cub Radix sort invalid argument errors Windows PTX Jit errors - Thrust/Cub Radix sort invalid argument errors Nov 21, 2024
@ptheywood
Copy link
Member Author

This appears to have been resolved in the Driver which ships with CUDA 12.6 Update 3, 561.17.

$ nvidia-smi.exe
Thu Nov 21 22:27:15 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 561.17                 Driver Version: 561.17         CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
$ ./build-70-cu125/bin/Debug/tests.exe --gtest_filter="TestMessage_Array.arrayMessageReorderMemoryLarge"
Running main() from C:\Users\ptheywood\code\flamegpu\FLAMEGPU2\tests\helpers\main.cu
Note: Google Test filter = TestMessage_Array.arrayMessageReorderMemoryLarge
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from TestMessage_Array
[ RUN      ] TestMessage_Array.arrayMessageReorderMemoryLarge
$ ./build-70-cu124/bin/Release/tests.exe
...
[----------] Global test environment tear-down
[==========] 1126 tests from 87 test suites ran. (200626 ms total)
[  PASSED  ] 1126 tests.

  YOU HAVE 40 DISABLED TESTS

@ptheywood ptheywood changed the title Windows PTX Jit errors - Thrust/Cub Radix sort invalid argument errors Windows R560 PTX Jit errors - invalid argument in cub radix sort Nov 21, 2024
@ptheywood
Copy link
Member Author

@gubbsjuk - I've narrowed down the errors I could trigger to be caused by the CUDA driver's PTX Jitter, but it appears to have been fixed in the 561 Driver which ships with CUDA 12.6 Update 3 released in the last few days.

If/when you next update your CUDA installation / nvidia driver on your RTX 3500 Ada machine, if you could build the Pedestrian Navigation example with the incorrect CUDA architecture specified and confirm if you encounter the runtime errors or not that would be very helpful.

i.e. the following or similar via GUIs

cd path\to\FLAMEGPU2-pedestrian_navigation-example
cmake -S . -B build-sm70 -DCMAKE_CUDA_ARCHITECTURES=70 -DFLAMEGPU_VISUALISATION=ON
cmake --build build-sm70 --config Release
.\build-sm70\bin\Release\pedestrian_navigation.exe -i map.xml -s 0

@gubbsjuk
Copy link

@ptheywood - Gotcha. Will get on it and report back once done.

For reference. Current Nvidia-driver:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.94                 Driver Version: 560.94         CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+

@gubbsjuk
Copy link

gubbsjuk commented Nov 22, 2024

@ptheywood No runtime errors.

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 561.17                 Driver Version: 561.17         CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+

@ptheywood
Copy link
Member Author

@gubbsjuk Thanks for confirming this for us.

I'll close this issue now, and edit the original post so we can redirect anyone else who encounters these issues here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants