-
Notifications
You must be signed in to change notification settings - Fork 761
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL][ROCm] Use offload-arch instead of mcpu for AMD arch #4239
Conversation
@malixian This patch should fix the issue discussed in: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe clang/test/Driver/sycl-offload-amdgcn.cpp
should be expanded, at least to check the error message and enforce valid command lines for the main test cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a test for the new diagnostic
5a790a2
to
a634bd9
Compare
This patch changes using `-mcpu` for SYCL applications targeting AMD to `-Xsycl-target-backend --offload-arch`. Before this patch the offloading arch wasn't set correctly for AMD architectures. This is fixing an issue with HIP that was talked about in intel#4133, regarding having `v4` in the hip part of the triple, without the `v4` HIP seems to be ignoring the fact that the offloading arch is missing from the triple, which is why there was a workaround orignally to force not using `v4` with SYCL. By fixing the offloading arch this patch fixes the issue properly and now the triple with `v4` works because it also contains the offloading architecture.
Co-authored-by: Artem Gindinson <artem.gindinson@intel.com>
I've updated I'm still seeing one test failing with this when running |
a634bd9
to
7331029
Compare
if (Triple.isAMDGCN() && llvm::none_of(GpuArchList, [&](auto &P) { | ||
return P.first.isAMDGCN(); | ||
})) { | ||
C.getDriver().Diag(clang::diag::err_drv_sycl_missing_amdgpu_arch); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about adding a default AMD GPU arch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I didn't go for the default AMD GPU arch is that as I understand it we need to specify the exact GPU architecture for AMD so a default would only work for a very specific type of GPUs. Which means that in a lot of cases users would still need to specify the architecture manually, so I think it is better to force the architecture to always be set manually and have a clear diagnostic, than have a default architecture that rarely works and a more confusing error message from hip.
This is different with NVidia because SM_50
covers a lot of different GPUs, so in most cases it will work out of the box and the user won't have to set the architecture.
Co-authored-by: Victor Lomuller <victor@codeplay.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@AGindinson, @hchilama, @mdtoguchi, ping. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…4463) #4175 introduced automatic addition of the generic spir64 device target when any section of the input objects had this triple assigned to it. As a result, the actual list of toolchains started exceeding the user-provided one by 1 item. After #4239, the above became a problem. The dispatch of -Xsycl-target-* arguments started happening earlier in theflow, which broke the following use-case: ``` clang++ -fsycl -fsycl-targets=spir64_gen gen-obj.o gen-and-spir64-obj.o -Xsycl-target-backend "-device *" ``` A fix for now is to ignore the autodetected spir64 target when propagating the -Xsycl-target-backend arguments. A permanent solution would involve a re-design of -Xsycl-target-backend handling so that it took place only once in the flow, or belating the addition of the autodetected generic triple into the list of device targets. Signed-off-by: Artem Gindinson <artem.gindinson@intel.com>
This patch changes using
-mcpu
for SYCL applications targeting AMD to-Xsycl-target-backend --offload-arch
.Before this patch the offloading arch wasn't set correctly for AMD
architectures.
This is fixing an issue with HIP that was talked about in #4133,
regarding having
v4
in the hip part of the triple, without thev4
HIP seems to be ignoring the fact that the offloading arch is missing
from the triple, which is why there was a workaround orignally to force
not using
v4
with SYCL. By fixing the offloading arch this patch fixesthe issue properly and now the triple with
v4
works because it alsocontains the offloading architecture.