
LLVM and SPIRV-LLVM-Translator pulldown (WW30) #4133

Merged: 749 commits into intel:sycl on Jul 23, 2021

Conversation


@vmaksimo vmaksimo commented Jul 19, 2021

joker-eph and others added 30 commits July 16, 2021 06:54
… make it free of global initializer

We can build it with -Werror=global-constructors now. This helps
in situations where libSupport is embedded as a shared library,
potentially in dlopen/dlclose scenarios, and when command-line
parsing or other facilities may not be involved. Avoiding the
implicit construction of these cl::opt can avoid double-registration
issues and other kinds of unwanted behavior.

Reviewed By: lattner, jpienaar

Differential Revision: https://reviews.llvm.org/D105959
    This patch handles the `<<` operator defined for `std::unique_ptr` in
    the std namespace (ignores custom overloads of the operator).

    Differential Revision: https://reviews.llvm.org/D105421
…the simulation

Added information stored in PipelineOptions and the MCSubtargetInfo.

Bug: https://bugs.llvm.org/show_bug.cgi?id=51041

Reviewed By: andreadb

Differential Revision: https://reviews.llvm.org/D106077
…pport to make it free of global initializer"

This reverts commit af93217.
Some specific configurations are still broken in a way that requires
more investigation.
… make it free of global initializer

We can build it with -Werror=global-constructors now. This helps
in situations where libSupport is embedded as a shared library,
potentially in dlopen/dlclose scenarios, and when command-line
parsing or other facilities may not be involved. Avoiding the
implicit construction of these cl::opt can avoid double-registration
issues and other kinds of unwanted behavior.

Reviewed By: lattner, jpienaar

Differential Revision: https://reviews.llvm.org/D105959
…void*

This change addresses an assertion that occurs in a downstream
compiler with a custom target.

```APInt.h:1151: bool llvm::APInt::operator==(const llvm::APInt &) const: Assertion `BitWidth == RHS.BitWidth && "Comparison requires equal bit widths"'```

No covering test case is submitted with this change since this crash
cannot be reproduced using any upstream-supported target. The test case
that exposes this issue is as simple as:

```lang=c++
  void test(int * p) {
    int * q = p-1;
    if (q) {}
    if (q) {} // crash
    (void)q;
  }
```

The custom target that exposes this problem supports two address spaces,
16-bit `char`s, and a `_Bool` type that maps to 16-bits. There are no upstream
supported targets with similar attributes.

The assertion appears to be happening as a result of evaluating the
`SymIntExpr` `(reg_$0<int * p>) != 0U` in `VisitSymIntExpr` located in
`SimpleSValBuilder.cpp`. The `LHS` is evaluated to `32b` and the `RHS` is
evaluated to `16b`. This eventually leads to the assertion in `APInt.h`.

While this change addresses the crash and passes LITs, two follow-ups
are required:
  1) The remainder of `getZeroWithPtrWidth()` and `getIntWithPtrWidth()`
     should be cleaned up following this model to prevent future
     confusion.
  2) We're not sure why references are found along the modified
     code path; that should not be the case. A more principled
     fix may be found after further investigation of why this
     happens.

Acks: Thanks to @steakhal and @martong for the discussions leading to this
fix.

Reviewed By: NoQ

Differential Revision: https://reviews.llvm.org/D105974
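The width mismatch described above can be illustrated with a toy Python model (not the real `llvm::APInt`; class and constant names here are invented for illustration). The LHS is evaluated at pointer width, the RHS at `_Bool` width, and the comparison asserts; the fix amounts to producing both operands at the same width before comparing:

```python
class APIntModel:
    """Minimal stand-in for llvm::APInt: a value with an explicit bit width."""
    def __init__(self, width, value):
        self.width = width
        self.value = value & ((1 << width) - 1)

    def __eq__(self, other):
        # Mirrors the assertion in APInt.h quoted above.
        assert self.width == other.width, "Comparison requires equal bit widths"
        return self.value == other.value

    def zext(self, width):
        """Zero-extend to a wider type, as the fixed code does before comparing."""
        assert width >= self.width
        return APIntModel(width, self.value)

PTR_WIDTH = 32  # width of `int *` on the hypothetical target

lhs = APIntModel(PTR_WIDTH, 0x1234)  # (reg_$0<int * p>) evaluated to 32 bits
rhs = APIntModel(16, 0)              # 0U evaluated to 16 bits: the bug

try:
    lhs == rhs
except AssertionError as e:
    print("crash:", e)

# After the fix the RHS is produced at pointer width and the compare is fine.
print(lhs == rhs.zext(PTR_WIDTH))
```

This is only a sketch of the width invariant, not of the SValBuilder logic itself.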
Remove duplicate and stale doc comment on affineParallelize. NFC.
This patch teaches the compiler to identify a wider variety of
`BUILD_VECTOR`s which form integer arithmetic sequences, and to lower
them to `vid.v` with modifications for non-unit steps and non-zero
addends.

The sequences handled by this optimization must either be monotonically
increasing or decreasing. Consecutive elements holding the same value
indicate a fractional step which, while simple mathematically,
becomes more complex to handle both in the realm of lossy integer
division and in the presence of `undef`s.

For example, a common "interleaving" shuffle index will be lowered by
LLVM to both `<0,u,1,u,2,...>` and `<u,0,u,1,u,...>` `BUILD_VECTOR`
nodes. Either of these would ideally be lowered to `vid.v` shifted right
by 1. Detection of this sequence in presence of general `undef` values
is more complicated, however: `<0,u,u,1>` could match either
`<0,0,0,1>` or `<0,0,1,1>` depending on later values in the sequence.
Both are possible, so backtracking or multiple passes would be inevitable.

Sticking to monotonic sequences keeps the logic simpler as it can be
done in one pass. Fractional steps will likely be a separate
optimization in a future patch.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D104921
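The one-pass detection described above can be sketched with a small Python model (illustrative only; the real logic lives in the RISC-V SelectionDAG lowering, and the function name here is made up). `None` stands in for `undef` elements:

```python
def match_arithmetic_sequence(elts):
    """Return (step, addend) if the defined elements of a BUILD_VECTOR-like
    list form an integer arithmetic sequence elts[i] == step * i + addend,
    or None otherwise. None entries model undef elements."""
    defined = [(i, v) for i, v in enumerate(elts) if v is not None]
    if len(defined) < 2:
        return None
    (i0, v0), (i1, v1) = defined[0], defined[1]
    if (v1 - v0) % (i1 - i0) != 0:
        # Fractional step, e.g. <0,u,1,u,2,...>: deferred to a later patch.
        return None
    step = (v1 - v0) // (i1 - i0)
    addend = v0 - step * i0
    # A single forward pass checks every defined element against the line.
    for i, v in defined:
        if v != step * i + addend:
            return None
    return step, addend
```

For example, `[0, None, 2, 3]` matches with step 1 and addend 0, while the interleave pattern `[0, None, 1, None, 2]` is rejected because its step is fractional, matching the commit's restriction to integer steps.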
…id costs."

The original patch was:
  https://reviews.llvm.org/D105806

There were some issues with nondeterministic behaviour of the sorting
function, which led to scalable-call.ll passing and/or failing. This
patch fixes the issue by numbering all instructions in the array first
and using that number as the order, which should provide a consistent
ordering.

This reverts commit a607f64.
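The numbering trick above can be modeled in a few lines of Python (a sketch with invented names, not the actual vectorizer code; Python's sort is already stable, so this only illustrates how an explicit number makes the order of equal-cost entries independent of the sort implementation):

```python
def stable_order_by_cost(insts, cost):
    """Sort instructions by cost, using each instruction's original number
    as a tie-breaker so equal-cost entries come out in the same order,
    run after run."""
    numbered = list(enumerate(insts))  # number all instructions first
    numbered.sort(key=lambda pair: (cost(pair[1]), pair[0]))
    return [inst for _, inst in numbered]
```

With this key, two instructions with equal cost can never swap places between runs, which is the property the flaky scalable-call.ll test depended on.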
Previously GetMemoryTagManager checked many things in one:
* architecture supports memory tagging
* process supports memory tagging
* memory range isn't inverted
* memory range is all tagged

While writing follow-up patches for tag writing (in review
at the moment) it has become clear that this gets unwieldy
once we add the features needed for that.

It also implies that the memory tag manager is tied to the
range you used to request it with, but it is not; it is a
per-process object.

Instead:
* GetMemoryTagManager just checks architecture and process.
* Then the MemoryTagManager can later be asked to check a
  memory range.

This is better because:
* We don't imply that range and manager are tied together.
* A slightly different range calculation for tag writing
  doesn't add more code to Process.
* Range-checking code can now be unit tested.

Reviewed By: omjavaid

Differential Revision: https://reviews.llvm.org/D105630
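The split described above can be sketched as a toy Python model (the real code is C++ in LLDB; names and the single-region simplification here are illustrative): fetching the manager only checks capabilities, while range validation is a separate, unit-testable method on the manager.

```python
class MemoryTagManagerModel:
    """Toy model: the manager is a per-process object, and range
    validation is a separate method that can be unit tested."""

    def __init__(self, tagged_ranges):
        # (start, end) pairs of memory that is actually tagged.
        self.tagged = tagged_ranges

    def check_range(self, start, end):
        """Return an error string, or None if the range is usable."""
        if end <= start:
            return "range is inverted"
        # Simplification: require a single tagged region to cover the range.
        if not any(s <= start and end <= e for s, e in self.tagged):
            return "range is not all tagged"
        return None

def get_memory_tag_manager(arch_supports_mte, process_supports_mte, tagged):
    """Model of the slimmed-down GetMemoryTagManager: only architecture
    and process checks happen here; no range is involved."""
    if not (arch_supports_mte and process_supports_mte):
        raise RuntimeError("memory tagging not supported")
    return MemoryTagManagerModel(tagged)
```

Because `check_range` no longer depends on how the manager was obtained, a follow-up patch can use a slightly different range calculation for tag writing without touching the equivalent of Process.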
This patch adds support for following contiguous load and store
instructions:

  * LD1B, LD1H, LD1W, LD1D, LD1Q
  * ST1B, ST1H, ST1W, ST1D, ST1Q

A new register class and operand are added for the 32-bit vector select
register W12-W15. The differences in the following tests, which have been
re-generated, are caused by the introduction of this register class:

  * llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-inline-asm.ll
  * llvm/test/CodeGen/AArch64/GlobalISel/regbank-inlineasm.mir
  * llvm/test/CodeGen/AArch64/stp-opt-with-renaming-reserved-regs.mir
  * llvm/test/CodeGen/AArch64/stp-opt-with-renaming.mir

D88663 attempts to resolve the issue with the store pair test
differences in the AArch64 load/store optimizer.

The GlobalISel differences are caused by changes in the enum values of
register classes; the tests have been updated with the new values.

The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-06

Reviewed By: CarolineConcatto

Differential Revision: https://reviews.llvm.org/D105572
This patch returns an Invalid cost from getInstructionCost() for alloca
instructions if the VF is scalable, as otherwise loops which contain
these instructions will crash when attempting to scalarize the alloca.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D105824
Specifying the latencies of specific LDP variants appears to improve
performance almost universally.

Differential Revision: https://reviews.llvm.org/D105882
This may be necessary in partial multi-stage conversion when a container type
from dialect A containing types from dialect B goes through the conversion
where only dialect A is converted to the LLVM dialect. We will need to keep a
pointer-to-non-LLVM type in the IR until a further conversion can convert
dialect B types to LLVM types.

Reviewed By: wsmoses

Differential Revision: https://reviews.llvm.org/D106076
https://reviews.llvm.org/D105659 implements ByVal handling in llc, but
some cases are not compatible with the existing XL compiler on AIX.
This adds a clang warning for such cases.

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D105660
Added isCall for S_CALL_B64; added isBranch for S_SUBVECTOR_LOOP_*.

Differential Revision: https://reviews.llvm.org/D106072
If you attach __attribute__((optnone)) to a function when using
optimisations, that function will use fast-isel instead of the usual
SelectionDAG method. This is a problem for instruction referencing,
because it means DBG_VALUEs of virtual registers will be created,
triggering some safety assertions in LiveDebugVariables. Those assertions
exist to detect exactly this scenario, where an unexpected piece of code is
generating virtual register references in instruction referencing mode.

Fix this by transforming the DBG_VALUEs created by fast-isel into
half-formed DBG_INSTR_REFs, after which they get patched up in
finalizeDebugInstrRefs. The modified test adds a fast-isel mode to the
instruction referencing isel test.

Differential Revision: https://reviews.llvm.org/D105694
This avoids relying on G_EXTRACT on unusual types, and also properly
decomposes structs into multiple registers. This also preserves the
LLTs in the memory operands.
The dialect-specific cast between builtin (ex-standard) types and LLVM
dialect types was introduced long before built-in support for
unrealized_conversion_cast. It has a similar purpose, but is restricted
to compatible builtin and LLVM dialect types, which may hamper
progressive lowering and composition with types from other dialects.
Replace llvm.mlir.cast with unrealized_conversion_cast, and drop the
operation that became unnecessary.

Also make unrealized_conversion_cast legal by default in
LLVMConversionTarget, as the majority of conversions using it are partial
conversions that actually want the casts to persist in the IR. The
standard-to-llvm conversion, which is still expected to run last, cleans
up the remaining casts.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D105880
@vmaksimo
Contributor Author

/summary:run

@vmaksimo
Contributor Author

Hi again @steffenlarsen! We fixed the build on CUDA, but a lit-test failure appeared in clang/test/Driver/cuda-arch-translation.cu. It looks like a community test bug, but it could be something else. The possible culprit is vmaksimo@01d3a3d or vmaksimo@25629bb.
Could you please take a look? Thanks!

@vmaksimo vmaksimo force-pushed the llvmspirv_pulldown branch from fa4d950 to f442564 Compare July 21, 2021 13:57
@steffenlarsen
Contributor

@vmaksimo Looks to me like "v4" isn't appended to the "hip" part of the triple, which the tests expect. I suspect the problem is ec61222#diff-eefa159a6526f756b7260f310595827f33f96fd91a8d336f4d20ad8b72738d17R121

@steffenlarsen
Contributor

> @vmaksimo Looks to me like "v4" isn't appended to the "hip" part of the triple, which the tests expect. I suspect the problem is ec61222#diff-eefa159a6526f756b7260f310595827f33f96fd91a8d336f4d20ad8b72738d17R121

I have tried removing the changes at that line and the test passes. That said, I don't know what the motivation of that change was.

@bader
Contributor

bader commented Jul 21, 2021

> @vmaksimo Looks to me like "v4" isn't appended to the "hip" part of the triple, which the tests expect. I suspect the problem is ec61222#diff-eefa159a6526f756b7260f310595827f33f96fd91a8d336f4d20ad8b72738d17R121
>
> I have tried removing the changes at that line and the test passes. That said, I don't know what the motivation of that change was.

Tagging @malixian to clarify.

@vmaksimo
Contributor Author

/summary:run

@vmaksimo vmaksimo marked this pull request as ready for review July 23, 2021 08:13
@romanovvlad romanovvlad merged commit f7c5443 into intel:sycl Jul 23, 2021
npmiller added a commit to npmiller/llvm that referenced this pull request Jul 27, 2021
This file uses `cl_mem_fence_flags`, `CLK_GLOBAL_MEM_FENCE`,
`CLK_LOCAL_MEM_FENCE`, which are defined by `clc.h`.

It wasn't included because when this file was written the OpenCL headers
were implicitly included; this has changed, so including the header
is now necessary. See the discussion on intel#4133 for more details.
@malixian
Contributor

> @vmaksimo Looks to me like "v4" isn't appended to the "hip" part of the triple, which the tests expect. I suspect the problem is ec61222#diff-eefa159a6526f756b7260f310595827f33f96fd91a8d336f4d20ad8b72738d17R121
>
> I have tried removing the changes at that line and the test passes. That said, I don't know what the motivation of that change was.

The motivation for this change comes from some of our tests. Through testing, we found that if the compiler is not given the mcode_object_version option, the generated AMD binary cannot be executed correctly in ROCm, even when the version obtained by getAMDGPUCodeObjectVersion is V4. So we use haveAMDGPUCodeObjectVersionArgument to determine whether code_object_version is explicitly specified, and only when both functions return true does OffloadKind get v4 attached. In fact, we also found through testing that even if the version obtained by getAMDGPUCodeObjectVersion is V4, it is not necessary to attach v4 to OffloadKind; without it, ROCm execution works fine.

I can't fully explain this behavior yet, and I'm also confused by it. It may be related to the version of ROCm or the version of LLVM.
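The gating malixian describes could be modeled as follows (a sketch; the function names only echo the driver functions mentioned above, and `"hipv4"` reflects the "v4 appended to the hip part of the triple" discussed in this thread, not verified driver output):

```python
def should_append_v4(have_version_arg, code_object_version):
    """Attach "v4" only when the code-object-version option is explicitly
    present AND it resolves to version 4, per the explanation above."""
    return have_version_arg and code_object_version == 4

def hip_offload_kind(have_version_arg, code_object_version):
    """Build the offload-kind part of the triple under that gating."""
    if should_append_v4(have_version_arg, code_object_version):
        return "hipv4"
    return "hip"
```

Under this rule, a default-detected V4 without the explicit option leaves the offload kind as plain `"hip"`, which is what the cuda-arch-translation.cu test expects.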

bader pushed a commit that referenced this pull request Jul 28, 2021
This file uses `cl_mem_fence_flags`, `CLK_GLOBAL_MEM_FENCE`,
`CLK_LOCAL_MEM_FENCE`, which are defined by `clc.h`.

It wasn't included because when this file was written the OpenCL headers
were implicitly included; this has changed, so including the header
is now necessary. See the discussion on #4133 for more details.
zahiraam pushed a commit to zahiraam/llvm-1 that referenced this pull request Aug 2, 2021
This file uses `cl_mem_fence_flags`, `CLK_GLOBAL_MEM_FENCE`,
`CLK_LOCAL_MEM_FENCE`, which are defined by `clc.h`.

It wasn't included because when this file was written the OpenCL headers
were implicitly included; this has changed, so including the header
is now necessary. See the discussion on intel#4133 for more details.
npmiller added a commit to npmiller/llvm that referenced this pull request Aug 3, 2021
This patch changes using `-mcpu` for SYCL applications targeting AMD to
`-Xsycl-target-backend --offload-arch`.

Before this patch the offloading arch wasn't set correctly for AMD
architectures.

This fixes an issue with HIP discussed in intel#4133 regarding having
`v4` in the HIP part of the triple. Without the `v4`, HIP seems to
ignore the fact that the offloading arch is missing from the triple,
which is why there was originally a workaround forcing SYCL not to use
`v4`. By fixing the offloading arch, this patch fixes the issue
properly: the triple with `v4` now works because it also contains the
offloading architecture.
npmiller added a commit to npmiller/llvm that referenced this pull request Aug 4, 2021
This patch changes using `-mcpu` for SYCL applications targeting AMD to
`-Xsycl-target-backend --offload-arch`.

Before this patch the offloading arch wasn't set correctly for AMD
architectures.

This fixes an issue with HIP discussed in intel#4133 regarding having
`v4` in the HIP part of the triple. Without the `v4`, HIP seems to
ignore the fact that the offloading arch is missing from the triple,
which is why there was originally a workaround forcing SYCL not to use
`v4`. By fixing the offloading arch, this patch fixes the issue
properly: the triple with `v4` now works because it also contains the
offloading architecture.
bader pushed a commit that referenced this pull request Aug 19, 2021
This patch changes using `-mcpu` for SYCL applications targeting AMD to
`-Xsycl-target-backend --offload-arch`.

Before this patch the offloading arch wasn't set correctly for AMD
architectures.

This fixes an issue with HIP discussed in #4133 regarding having
`v4` in the HIP part of the triple. Without the `v4`, HIP seems to
ignore the fact that the offloading arch is missing from the triple,
which is why there was originally a workaround forcing SYCL not to use
`v4`. By fixing the offloading arch, this patch fixes the issue
properly: the triple with `v4` now works because it also contains the
offloading architecture.

Co-authored-by: Artem Gindinson <artem.gindinson@intel.com>
Co-authored-by: Victor Lomuller <victor@codeplay.com>