-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update the scan implementation to follow P0571's guidance. #52
Update the scan implementation to follow P0571's guidance. #52
Conversation
I applied this patch to test building CuPy, and got a lot of errors; here's a clip, let me know if you need a full log. We're interested in this fix because we cannot make double complex work with CUB's scan (but single complex works fine, see cupy/cupy#2919 (comment)). I can open an issue in NVlabs/CUB if you prefer, but it seems abandoned to me...
|
Can you try building with C++14, please? And with the branch, not a patch?
…--------------------------------------------------------------------------------
Bryce Adelstein Lelbach aka wash
US Programming Language Standards (PL22) Chair
ISO C++ Library Evolution Chair
CppCon and C++Now Program Chair
CUDA Core C++ Libraries (Thrust, CUB, libcu++) Lead @ NVIDIA
--------------------------------------------------------------------------------
________________________________
From: Leo Fang <notifications@github.com>
Sent: Monday, August 17, 2020 9:56 PM
To: thrust/cub <cub@noreply.github.com>
Cc: Bryce Lelbach <blelbach@nvidia.com>; Assign <assign@noreply.github.com>
Subject: Re: [thrust/cub] Update the scan implementation to follow P0571's guidance. (#52)
I applied this patch to test building CuPy, and got a lot of errors; here's a clip, let me know if you need a full log.
We're interested in this fix because we cannot make double complex work with CUB's scan (but single complex works fine, see cupy/cupy#2919 (comment)<cupy/cupy#2919 (comment)>). I can open an issue in NVlabs/CUB if you prefer, but it seems abandoned to me...
building 'cupy.cuda.cub' extension
gcc -pthread -B /home/leofang/miniconda3/envs/cupy_dev/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -D_FORCE_INLINES=1 -DCUPY_CUB_VERSION_CODE=101000 -I/home/leofang/cupy/install/../cupy/core/include/cupy/cub -I/home/leofang/cupy/install/../cupy/core/include -I/usr/local/cuda/include -I/home/leofang/miniconda3/envs/cupy_dev/include/python3.7m -c cupy/cuda/cub.cpp -o build/temp.linux-x86_64-3.7/cupy/cuda/cub.o -fopenmp
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
NVCC options: ['--generate-code=arch=compute_75,code=sm_75', '-O2', '--compiler-options="-fPIC"', '--std=c++11']
/usr/local/cuda/bin/nvcc -D_FORCE_INLINES=1 -DCUPY_CUB_VERSION_CODE=101000 -I/home/leofang/cupy/install/../cupy/core/include/cupy/cub -I/home/leofang/cupy/install/../cupy/core/include -I/usr/local/cuda/include -I/home/leofang/miniconda3/envs/cupy_dev/include/python3.7m -c cupy/cuda/cupy_cub.cu -o build/temp.linux-x86_64-3.7/cupy/cuda/cupy_cub.o --generate-code=arch=compute_75,code=sm_75 -O2 --compiler-options="-fPIC" --std=c++11
In file included from /home/leofang/cupy/install/../cupy/core/include/cupy/cub/cub/device/../iterator/../util_arch.cuh:36:0,
from /home/leofang/cupy/install/../cupy/core/include/cupy/cub/cub/device/../iterator/../config.cuh:35,
from /home/leofang/cupy/install/../cupy/core/include/cupy/cub/cub/device/../iterator/arg_index_input_iterator.cuh:39,
from /home/leofang/cupy/install/../cupy/core/include/cupy/cub/cub/device/device_reduce.cuh:41,
from cupy/cuda/cupy_cub.cu:3:
/home/leofang/cupy/install/../cupy/core/include/cupy/cub/cub/device/../iterator/../util_cpp_dialect.cuh:129:13: warning: CUB requires C++14. Please pass -std=c++14 to your compiler. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.
CUB_COMPILER_DEPRECATION(C++14, pass -std=c++14 to your compiler);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/leofang/cupy/install/../cupy/core/include/cupy/cub/cub/device/dispatch/../../agent/../block/block_load.cuh(404): error: no suitable constructor exists to convert from "char" to "cub::NullType"
detected during:
instantiation of "void cub::LoadDirectWarpStriped(int, InputIteratorT, InputT (&)[ITEMS_PER_THREAD], int) [with InputT=cub::NullType, ITEMS_PER_THREAD=18, InputIteratorT=cub::CacheModifiedInputIterator<cub::LOAD_DEFAULT, char, int>]"
(912): here
instantiation of "void cub::BlockLoad<InputT, BLOCK_DIM_X, ITEMS_PER_THREAD, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH>::LoadInternal<cub::BLOCK_LOAD_WARP_TRANSPOSE, DUMMY>::Load(InputIteratorT, InputT (&)[ITEMS_PER_THREAD], int) [with InputT=cub::NullType, BLOCK_DIM_X=64, ITEMS_PER_THREAD=18, ALGORITHM=cub::BLOCK_LOAD_WARP_TRANSPOSE, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, PTX_ARCH=750, DUMMY=0, InputIteratorT=cub::CacheModifiedInputIterator<cub::LOAD_DEFAULT, char, int>]"
(1169): here
instantiation of "void cub::BlockLoad<InputT, BLOCK_DIM_X, ITEMS_PER_THREAD, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH>::Load(InputIteratorT, InputT (&)[ITEMS_PER_THREAD], int) [with InputT=cub::NullType, BLOCK_DIM_X=64, ITEMS_PER_THREAD=18, ALGORITHM=cub::BLOCK_LOAD_WARP_TRANSPOSE, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, PTX_ARCH=750, InputIteratorT=cub::CacheModifiedInputIterator<cub::LOAD_DEFAULT, char, int>]"
/home/leofang/cupy/install/../cupy/core/include/cupy/cub/cub/device/dispatch/../../agent/agent_scan.cuh(295): here
instantiation of "void cub::AgentScan<AgentScanPolicyT, InputIteratorT, OutputIteratorT, ScanOpT, InitValueT, OffsetT>::ConsumeTile<IS_LAST_TILE>(OffsetT, int, OffsetT, cub::AgentScan<AgentScanPolicyT, InputIteratorT, OutputIteratorT, ScanOpT, InitValueT, OffsetT>::ScanTileStateT &) [with AgentScanPolicyT=cub::AgentScanPolicy<64, 9, cub::NullType, cub::BLOCK_LOAD_WARP_TRANSPOSE, cub::LOAD_DEFAULT, cub::BLOCK_STORE_WARP_TRANSPOSE, cub::BLOCK_SCAN_WARP_SCANS, cub::MemBoundScaling<64, 9, cub::NullType>>, InputIteratorT=char *, OutputIteratorT=char *, ScanOpT=cub::Sum, InitValueT=cub::NullType, OffsetT=int, IS_LAST_TILE=false]"
/home/leofang/cupy/install/../cupy/core/include/cupy/cub/cub/device/dispatch/../../agent/agent_scan.cuh(343): here
instantiation of "void cub::AgentScan<AgentScanPolicyT, InputIteratorT, OutputIteratorT, ScanOpT, InitValueT, OffsetT>::ConsumeRange(int, cub::AgentScan<AgentScanPolicyT, InputIteratorT, OutputIteratorT, ScanOpT, InitValueT, OffsetT>::ScanTileStateT &, int) [with AgentScanPolicyT=cub::AgentScanPolicy<64, 9, cub::NullType, cub::BLOCK_LOAD_WARP_TRANSPOSE, cub::LOAD_DEFAULT, cub::BLOCK_STORE_WARP_TRANSPOSE, cub::BLOCK_SCAN_WARP_SCANS, cub::MemBoundScaling<64, 9, cub::NullType>>, InputIteratorT=char *, OutputIteratorT=char *, ScanOpT=cub::Sum, InitValueT=cub::NullType, OffsetT=int]"
/home/leofang/cupy/install/../cupy/core/include/cupy/cub/cub/device/dispatch/dispatch_scan.cuh(130): here
[ 8 instantiation contexts not shown ]
instantiation of "cudaError_t cub::ChainedPolicy<PTX_VERSION, PolicyT, PrevPolicyT>::Invoke(int, FunctorT &) [with PTX_VERSION=600, PolicyT=cub::DeviceScanPolicy<cub::NullType>::Policy600, PrevPolicyT=cub::DeviceScanPolicy<cub::NullType>::Policy520, FunctorT=cub::DispatchScan<char *, char *, cub::Sum, cub::NullType, int, cub::DeviceScanPolicy<cub::NullType>>]"
/home/leofang/cupy/install/../cupy/core/include/cupy/cub/cub/device/dispatch/dispatch_scan.cuh(480): here
instantiation of "cudaError_t cub::DispatchScan<InputIteratorT, OutputIteratorT, ScanOpT, InitValueT, OffsetT, SelectedPolicy>::Dispatch(void *, size_t &, InputIteratorT, OutputIteratorT, ScanOpT, InitValueT, OffsetT, cudaStream_t, __nv_bool) [with InputIteratorT=char *, OutputIteratorT=char *, ScanOpT=cub::Sum, InitValueT=cub::NullType, OffsetT=int, SelectedPolicy=cub::DeviceScanPolicy<cub::NullType>]"
/home/leofang/cupy/install/../cupy/core/include/cupy/cub/cub/device/device_scan.cuh(334): here
instantiation of "cudaError_t cub::DeviceScan::InclusiveSum(void *, size_t &, InputIteratorT, OutputIteratorT, int, cudaStream_t, __nv_bool) [with InputIteratorT=char *, OutputIteratorT=char *]"
cupy/cuda/cupy_cub.cu(564): here
instantiation of "void _cub_inclusive_sum::operator()<T>(void *, size_t &, void *, void *, int, cudaStream_t) [with T=char]"
/home/leofang/cupy/install/../cupy/core/include/cupy/type_dispatcher.cuh(38): here
instantiation of "void dtype_dispatcher(int, functor_t, Ts &&...) [with functor_t=_cub_inclusive_sum, Ts=<void *&, size_t &, void *&, void *&, int &, cudaStream_t &>]"
cupy/cuda/cupy_cub.cu(721): here
/home/leofang/cupy/install/../cupy/core/include/cupy/cub/cub/device/dispatch/../../agent/../block/block_load.cuh(367): error: no suitable constructor exists to convert from "char" to "cub::NullType"
detected during:
instantiation of "void cub::LoadDirectWarpStriped(int, InputIteratorT, InputT (&)[ITEMS_PER_THREAD]) [with InputT=cub::NullType, ITEMS_PER_THREAD=18, InputIteratorT=cub::CacheModifiedInputIterator<cub::LOAD_DEFAULT, char, int>]"
(901): here
instantiation of "void cub::BlockLoad<InputT, BLOCK_DIM_X, ITEMS_PER_THREAD, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH>::LoadInternal<cub::BLOCK_LOAD_WARP_TRANSPOSE, DUMMY>::Load(InputIteratorT, InputT (&)[ITEMS_PER_THREAD]) [with InputT=cub::NullType, BLOCK_DIM_X=64, ITEMS_PER_THREAD=18, ALGORITHM=cub::BLOCK_LOAD_WARP_TRANSPOSE, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, PTX_ARCH=750, DUMMY=0, InputIteratorT=cub::CacheModifiedInputIterator<cub::LOAD_DEFAULT, char, int>]"
(1122): here
instantiation of "void cub::BlockLoad<InputT, BLOCK_DIM_X, ITEMS_PER_THREAD, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH>::Load(InputIteratorT, InputT (&)[ITEMS_PER_THREAD]) [with InputT=cub::NullType, BLOCK_DIM_X=64, ITEMS_PER_THREAD=18, ALGORITHM=cub::BLOCK_LOAD_WARP_TRANSPOSE, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, PTX_ARCH=750, InputIteratorT=cub::CacheModifiedInputIterator<cub::LOAD_DEFAULT, char, int>]"
/home/leofang/cupy/install/../cupy/core/include/cupy/cub/cub/device/dispatch/../../agent/agent_scan.cuh(297): here
instantiation of "void cub::AgentScan<AgentScanPolicyT, InputIteratorT, OutputIteratorT, ScanOpT, InitValueT, OffsetT>::ConsumeTile<IS_LAST_TILE>(OffsetT, int, OffsetT, cub::AgentScan<AgentScanPolicyT, InputIteratorT, OutputIteratorT, ScanOpT, InitValueT, OffsetT>::ScanTileStateT &) [with AgentScanPolicyT=cub::AgentScanPolicy<64, 9, cub::NullType, cub::BLOCK_LOAD_WARP_TRANSPOSE, cub::LOAD_DEFAULT, cub::BLOCK_STORE_WARP_TRANSPOSE, cub::BLOCK_SCAN_WARP_SCANS, cub::MemBoundScaling<64, 9, cub::NullType>>, InputIteratorT=char *, OutputIteratorT=char *, ScanOpT=cub::Sum, InitValueT=cub::NullType, OffsetT=int, IS_LAST_TILE=false]"
/home/leofang/cupy/install/../cupy/core/include/cupy/cub/cub/device/dispatch/../../agent/agent_scan.cuh(343): here
instantiation of "void cub::AgentScan<AgentScanPolicyT, InputIteratorT, OutputIteratorT, ScanOpT, InitValueT, OffsetT>::ConsumeRange(int, cub::AgentScan<AgentScanPolicyT, InputIteratorT, OutputIteratorT, ScanOpT, InitValueT, OffsetT>::ScanTileStateT &, int) [with AgentScanPolicyT=cub::AgentScanPolicy<64, 9, cub::NullType, cub::BLOCK_LOAD_WARP_TRANSPOSE, cub::LOAD_DEFAULT, cub::BLOCK_STORE_WARP_TRANSPOSE, cub::BLOCK_SCAN_WARP_SCANS, cub::MemBoundScaling<64, 9, cub::NullType>>, InputIteratorT=char *, OutputIteratorT=char *, ScanOpT=cub::Sum, InitValueT=cub::NullType, OffsetT=int]"
/home/leofang/cupy/install/../cupy/core/include/cupy/cub/cub/device/dispatch/dispatch_scan.cuh(130): here
[ 8 instantiation contexts not shown ]
instantiation of "cudaError_t cub::ChainedPolicy<PTX_VERSION, PolicyT, PrevPolicyT>::Invoke(int, FunctorT &) [with PTX_VERSION=600, PolicyT=cub::DeviceScanPolicy<cub::NullType>::Policy600, PrevPolicyT=cub::DeviceScanPolicy<cub::NullType>::Policy520, FunctorT=cub::DispatchScan<char *, char *, cub::Sum, cub::NullType, int, cub::DeviceScanPolicy<cub::NullType>>]"
/home/leofang/cupy/install/../cupy/core/include/cupy/cub/cub/device/dispatch/dispatch_scan.cuh(480): here
instantiation of "cudaError_t cub::DispatchScan<InputIteratorT, OutputIteratorT, ScanOpT, InitValueT, OffsetT, SelectedPolicy>::Dispatch(void *, size_t &, InputIteratorT, OutputIteratorT, ScanOpT, InitValueT, OffsetT, cudaStream_t, __nv_bool) [with InputIteratorT=char *, OutputIteratorT=char *, ScanOpT=cub::Sum, InitValueT=cub::NullType, OffsetT=int, SelectedPolicy=cub::DeviceScanPolicy<cub::NullType>]"
/home/leofang/cupy/install/../cupy/core/include/cupy/cub/cub/device/device_scan.cuh(334): here
instantiation of "cudaError_t cub::DeviceScan::InclusiveSum(void *, size_t &, InputIteratorT, OutputIteratorT, int, cudaStream_t, __nv_bool) [with InputIteratorT=char *, OutputIteratorT=char *]"
cupy/cuda/cupy_cub.cu(564): here
instantiation of "void _cub_inclusive_sum::operator()<T>(void *, size_t &, void *, void *, int, cudaStream_t) [with T=char]"
/home/leofang/cupy/install/../cupy/core/include/cupy/type_dispatcher.cuh(38): here
instantiation of "void dtype_dispatcher(int, functor_t, Ts &&...) [with functor_t=_cub_inclusive_sum, Ts=<void *&, size_t &, void *&, void *&, int &, cudaStream_t &>]"
cupy/cuda/cupy_cub.cu(721): here
/home/leofang/cupy/install/../cupy/core/include/cupy/cub/cub/device/dispatch/../../agent/../block/specializations/../../warp/specializations/../../thread/thread_operators.cuh(114): error: no operator "+" matches these operands
operand types are: const cub::NullType + const cub::NullType
detected during:
instantiation of "T cub::Sum::operator()(const T &, const T &) const [with T=cub::NullType]"
/home/leofang/cupy/install/../cupy/core/include/cupy/cub/cub/device/dispatch/../../agent/../block/specializations/../../warp/specializations/warp_scan_shfl.cuh(408): here
instantiation of "_T cub::WarpScanShfl<T, LOGICAL_WARP_THREADS, PTX_ARCH>::InclusiveScanStep(_T, ScanOpT, int, int) [with T=cub::NullType, LOGICAL_WARP_THREADS=32, PTX_ARCH=750, _T=cub::NullType, ScanOpT=cub::Sum]"
/home/leofang/cupy/install/../cupy/core/include/cupy/cub/cub/device/dispatch/../../agent/../block/specializations/../../warp/specializations/warp_scan_shfl.cuh(438): here
instantiation of "_T cub::WarpScanShfl<T, LOGICAL_WARP_THREADS, PTX_ARCH>::InclusiveScanStep(_T, ScanOpT, int, int, cub::Int2Type<0>) [with T=cub::NullType, LOGICAL_WARP_THREADS=32, PTX_ARCH=750, _T=cub::NullType, ScanOpT=cub::Sum]"
/home/leofang/cupy/install/../cupy/core/include/cupy/cub/cub/device/dispatch/../../agent/../block/specializations/../../warp/specializations/warp_scan_shfl.cuh(484): here
instantiation of "void cub::WarpScanShfl<T, LOGICAL_WARP_THREADS, PTX_ARCH>::InclusiveScan(_T, _T &, ScanOpT) [with T=cub::NullType, LOGICAL_WARP_THREADS=32, PTX_ARCH=750, _T=cub::NullType, ScanOpT=cub::Sum]"
/home/leofang/cupy/install/../cupy/core/include/cupy/cub/cub/device/dispatch/../../agent/../block/specializations/../../warp/warp_scan.cuh(451): here
instantiation of "void cub::WarpScan<T, LOGICAL_WARP_THREADS, PTX_ARCH>::InclusiveScan(T, T &, ScanOp) [with T=cub::NullType, LOGICAL_WARP_THREADS=32, PTX_ARCH=750, ScanOp=cub::Sum]"
/home/leofang/cupy/install/../cupy/core/include/cupy/cub/cub/device/dispatch/../../agent/../block/specializations/block_scan_warp_scans.cuh(341): here
[ 14 instantiation contexts not shown ]
instantiation of "cudaError_t cub::ChainedPolicy<PTX_VERSION, PolicyT, PrevPolicyT>::Invoke(int, FunctorT &) [with PTX_VERSION=600, PolicyT=cub::DeviceScanPolicy<cub::NullType>::Policy600, PrevPolicyT=cub::DeviceScanPolicy<cub::NullType>::Policy520, FunctorT=cub::DispatchScan<char *, char *, cub::Sum, cub::NullType, int, cub::DeviceScanPolicy<cub::NullType>>]"
/home/leofang/cupy/install/../cupy/core/include/cupy/cub/cub/device/dispatch/dispatch_scan.cuh(480): here
instantiation of "cudaError_t cub::DispatchScan<InputIteratorT, OutputIteratorT, ScanOpT, InitValueT, OffsetT, SelectedPolicy>::Dispatch(void *, size_t &, InputIteratorT, OutputIteratorT, ScanOpT, InitValueT, OffsetT, cudaStream_t, __nv_bool) [with InputIteratorT=char *, OutputIteratorT=char *, ScanOpT=cub::Sum, InitValueT=cub::NullType, OffsetT=int, SelectedPolicy=cub::DeviceScanPolicy<cub::NullType>]"
/home/leofang/cupy/install/../cupy/core/include/cupy/cub/cub/device/device_scan.cuh(334): here
instantiation of "cudaError_t cub::DeviceScan::InclusiveSum(void *, size_t &, InputIteratorT, OutputIteratorT, int, cudaStream_t, __nv_bool) [with InputIteratorT=char *, OutputIteratorT=char *]"
cupy/cuda/cupy_cub.cu(564): here
instantiation of "void _cub_inclusive_sum::operator()<T>(void *, size_t &, void *, void *, int, cudaStream_t) [with T=char]"
/home/leofang/cupy/install/../cupy/core/include/cupy/type_dispatcher.cuh(38): here
instantiation of "void dtype_dispatcher(int, functor_t, Ts &&...) [with functor_t=_cub_inclusive_sum, Ts=<void *&, size_t &, void *&, void *&, int &, cudaStream_t &>]"
cupy/cuda/cupy_cub.cu(721): here
/home/leofang/cupy/install/../cupy/core/include/cupy/cub/cub/device/dispatch/../../agent/../block/block_store.cuh(334): error: no suitable conversion function from "cub::NullType" to "char" exists
detected during:
—
You are receiving this because you were assigned.
Reply to this email directly, view it on GitHub<https://github.com/thrust/cub/pull/52#issuecomment-675248710>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AADBG4TTCTEPUXIJWT5SGHLSBICXBANCNFSM4PXEAMZA>.
|
Just tried 1. C++14 + patch, 2. C++11 + branch, and 3. C++14 + branch, and all of them got most of the same errors. Just FYI, I applied the patch on to the latest master, so it should be a legit patch. One of CuPy's CI is set up with Thrust + CUB from CUDA 11 and built with C++11, and so far everything works just fine, and we prefer to not bump to C++14 for as long as possible, as from the Python side the dependencies are more convolved and the compiler is only one of the the things we need to worry about. |
Some of the errors you linked indicated C++03 was being used, though. Can you post the errors from a C++11 build?
…--------------------------------------------------------------------------------
Bryce Adelstein Lelbach aka wash
US Programming Language Standards (PL22) Chair
ISO C++ Library Evolution Chair
CppCon and C++Now Program Chair
CUDA Core C++ Libraries (Thrust, CUB, libcu++) Lead @ NVIDIA
--------------------------------------------------------------------------------
________________________________
From: Leo Fang <notifications@github.com>
Sent: Monday, August 17, 2020 10:21 PM
To: thrust/cub <cub@noreply.github.com>
Cc: Bryce Lelbach <blelbach@nvidia.com>; Assign <assign@noreply.github.com>
Subject: Re: [thrust/cub] Update the scan implementation to follow P0571's guidance. (#52)
Just tried 1. C++14 + patch, 2. C++11 + branch, and 3. C++14 + branch, and all of them got most of the same errors. Just FYI, I applied the patch on to the latest master, so it should be a legit patch.
One of CuPy's CI is set up with Thrust + CUB from CUDA 11 and built with C++11, and so far everything works just fine, and we prefer to not bump to C++14 for as long as possible, as from the Python side the dependencies are more convolved and the compiler is only one of the the things we need to worry about.
—
You are receiving this because you were assigned.
Reply to this email directly, view it on GitHub<https://github.com/thrust/cub/pull/52#issuecomment-675257080>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AADBG4SBM35UNIVKZZXFIPLSBIFXBANCNFSM4PXEAMZA>.
|
My first message was from a C++11 build (see the --std flag). Would you like me to upload a full log? |
Sure, that would help.
…--------------------------------------------------------------------------------
Bryce Adelstein Lelbach aka wash
US Programming Language Standards (PL22) Chair
ISO C++ Library Evolution Chair
CppCon and C++Now Program Chair
CUDA Core C++ Libraries (Thrust, CUB, libcu++) Lead @ NVIDIA
--------------------------------------------------------------------------------
________________________________
From: Leo Fang <notifications@github.com>
Sent: Monday, August 17, 2020 10:30 PM
To: thrust/cub <cub@noreply.github.com>
Cc: Bryce Lelbach <blelbach@nvidia.com>; Assign <assign@noreply.github.com>
Subject: Re: [thrust/cub] Update the scan implementation to follow P0571's guidance. (#52)
My first message was from a C++11 build (see the --std flag). Would you like me to upload a full log?
—
You are receiving this because you were assigned.
Reply to this email directly, view it on GitHub<https://github.com/thrust/cub/pull/52#issuecomment-675259903>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AADBG4WVMI4B4BX5B7DOJGTSBIGX3ANCNFSM4PXEAMZA>.
|
@leofang See if that last push fixes it. Relevant bits:
|
Thank you for quick reply, @allisonvacanti! 5b965c6 builds with both C++11 and C++14, but I see several errors in By the way, which version of CUB is included in CUDA 11 Update 1? |
Yes, 11.0u1 uses CUB 1.9.10-1. But be aware that This patch may also slightly change the results of scans where the input/output types do not match, so that may also be causing some tests to fail at runtime. |
I've traced from 1.9.9 and up to 1.9.10-1 now and haven't seen errors. Testing the master branch now...
The errors I got appeared for both C++11 and C++14 mode, so definitely something else. One of the errors I got was that the output array from
Thanks! I'm looking into this. |
I've tested the master branch and it works fine too, so looks like this PR caused the errors that I saw. We're actually in a GPU Hackathon now, so I'll continue looking into the root cause later today, but from a quick glance I don't spot any issue 🧐 (IIRC, in CuPy we set the output type of |
btw, another class of errors I see is that all elements are shifted by 1, but I don't understand what could lead to this from the code change: >>> import cupy as cp
>>> a = cp.arange(10, dtype=cp.float64)
>>> a.cumsum() # CuPy with this PR
array([ 0., 0., 1., 3., 6., 10., 15., 21., 28., 36.])
>>>
>>> import numpy as np
>>> a = np.arange(10, dtype=np.float64)
>>> a.cumsum() # NumPy, or CuPy before this PR
array([ 0., 1., 3., 6., 10., 15., 21., 28., 36., 45.]) |
@allisonvacanti I think I got the bug: if |
@leofang Ah, good catch. I'll be sure to fix that before this goes in. |
Submitted to DVS 29098336. @leofang This should address the issues you discovered above. Feel free to retry it, but be aware that this PR will migrate to the NVIDIA/cub repo soon. |
Moved to NVIDIA#201. |
Found this while writing tests for the thrust async algorithms.