Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL][ROCm] Use offload-arch instead of mcpu for AMD arch #4239

Merged
merged 8 commits into from
Aug 19, 2021

Conversation

npmiller
Copy link
Contributor

@npmiller npmiller commented Aug 3, 2021

This patch changes using -mcpu for SYCL applications targeting AMD to
-Xsycl-target-backend --offload-arch.

Before this patch the offloading arch wasn't set correctly for AMD
architectures.

This is fixing an issue with HIP that was talked about in #4133,
regarding having v4 in the hip part of the triple, without the v4
HIP seems to be ignoring the fact that the offloading arch is missing
from the triple, which is why there was a workaround orignally to force
not using v4 with SYCL. By fixing the offloading arch this patch fixes
the issue properly and now the triple with v4 works because it also
contains the offloading architecture.

@npmiller
Copy link
Contributor Author

npmiller commented Aug 3, 2021

@malixian This patch should fix the issue discussed in:

Copy link
Contributor

@AGindinson AGindinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe clang/test/Driver/sycl-offload-amdgcn.cpp should be expanded, at least to check the error message and enforce valid command lines for the main test cases.

@bader bader added the hip Issues related to execution on HIP backend. label Aug 4, 2021
Copy link
Contributor

@elizabethandrews elizabethandrews left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a test for the new diagnostic

@npmiller npmiller force-pushed the rocm-fix-offload-arch branch 2 times, most recently from 5a790a2 to a634bd9 Compare August 4, 2021 14:09
npmiller and others added 7 commits August 4, 2021 15:13
This patch changes using `-mcpu` for SYCL applications targeting AMD to
`-Xsycl-target-backend --offload-arch`.

Before this patch the offloading arch wasn't set correctly for AMD
architectures.

This is fixing an issue with HIP that was talked about in intel#4133,
regarding having `v4` in the hip part of the triple, without the `v4`
HIP seems to be ignoring the fact that the offloading arch is missing
from the triple, which is why there was a workaround orignally to force
not using `v4` with SYCL. By fixing the offloading arch this patch fixes
the issue properly and now the triple with `v4` works because it also
contains the offloading architecture.
Co-authored-by: Artem Gindinson <artem.gindinson@intel.com>
@npmiller
Copy link
Contributor Author

npmiller commented Aug 4, 2021

I've updated clang/test/Driver/sycl-offload-amdgcn.cpp to add the architecture parameters, added a test for the new diagnostic in there as well, and updated other amdgcn tests with architecture flags.

I'm still seeing one test failing with this when running ninja check, Driver/amdgpu-openmp-toolchain.c, however it seems that this test was already failing on AMD before this patch.

@npmiller npmiller force-pushed the rocm-fix-offload-arch branch from a634bd9 to 7331029 Compare August 4, 2021 14:16
bader
bader previously approved these changes Aug 4, 2021
if (Triple.isAMDGCN() && llvm::none_of(GpuArchList, [&](auto &P) {
return P.first.isAMDGCN();
})) {
C.getDriver().Diag(clang::diag::err_drv_sycl_missing_amdgpu_arch);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about adding a default AMD GPU arch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I didn't go for the default AMD GPU arch is that as I understand it we need to specify the exact GPU architecture for AMD so a default would only work for a very specific type of GPUs. Which means that in a lot of cases users would still need to specify the architecture manually, so I think it is better to force the architecture to always be set manually and have a clear diagnostic, than have a default architecture that rarely works and a more confusing error message from hip.

This is different with NVidia because SM_50 covers a lot of different GPUs, so in most cases it will work out of the box and the user won't have to set the architecture.

Co-authored-by: Victor Lomuller <victor@codeplay.com>
Copy link
Contributor

@Naghasan Naghasan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@elizabethandrews elizabethandrews left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@bader
Copy link
Contributor

bader commented Aug 19, 2021

@AGindinson, @hchilama, @mdtoguchi, ping.

Copy link
Contributor

@AaronBallman AaronBallman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@hchilama hchilama left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@bader bader merged commit 2e08d0e into intel:sycl Aug 19, 2021
romanovvlad pushed a commit that referenced this pull request Sep 6, 2021
…4463)

#4175 introduced automatic addition
of the generic spir64 device target when any section of the input
objects had this triple assigned to it. As a result, the actual list
of toolchains started exceeding the user-provided one by 1 item.

After #4239, the above became a
problem. The dispatch of -Xsycl-target-* arguments started happening
earlier in theflow, which broke the following use-case:
```
clang++ -fsycl -fsycl-targets=spir64_gen gen-obj.o
gen-and-spir64-obj.o -Xsycl-target-backend "-device *"
```
A fix for now is to ignore the autodetected spir64 target when
propagating the -Xsycl-target-backend arguments. A permanent solution
would involve a re-design of -Xsycl-target-backend handling so that it
took place only once in the flow, or belating the addition of the
autodetected generic triple into the list of device targets.

Signed-off-by: Artem Gindinson <artem.gindinson@intel.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hip Issues related to execution on HIP backend.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants