
LLVM and SPIRV-LLVM-Translator pulldown (WW30) #4133

Merged: 749 commits into intel:sycl on Jul 23, 2021

Conversation


@vmaksimo vmaksimo commented Jul 19, 2021

joker-eph and others added 30 commits July 16, 2021 06:54
… make it free of global initializer

We can build it with -Werror=global-constructors now. This helps
in situations where libSupport is embedded as a shared library,
potentially in dlopen/dlclose scenarios, and when command-line
parsing or other facilities may not be involved. Avoiding the
implicit construction of these cl::opt can avoid double-registration
issues and other kinds of unwanted behavior.

Reviewed By: lattner, jpienaar

Differential Revision: https://reviews.llvm.org/D105959
    This patch handles the `<<` operator defined for `std::unique_ptr` in
    the std namespace (ignores custom overloads of the operator).

    Differential Revision: https://reviews.llvm.org/D105421
…the simulation

Added information stored in PipelineOptions and the MCSubtargetInfo.

Bug: https://bugs.llvm.org/show_bug.cgi?id=51041

Reviewed By: andreadb

Differential Revision: https://reviews.llvm.org/D106077
…pport to make it free of global initializer"

This reverts commit af93217.
Some specific configurations are still broken in a way that requires
more investigation.
… make it free of global initializer

We can build it with -Werror=global-constructors now. This helps
in situations where libSupport is embedded as a shared library,
potentially in dlopen/dlclose scenarios, and when command-line
parsing or other facilities may not be involved. Avoiding the
implicit construction of these cl::opt can avoid double-registration
issues and other kinds of unwanted behavior.

Reviewed By: lattner, jpienaar

Differential Revision: https://reviews.llvm.org/D105959
…void*

This change addresses an assertion that occurs in a downstream
compiler with a custom target.

```APInt.h:1151: bool llvm::APInt::operator==(const llvm::APInt &) const: Assertion `BitWidth == RHS.BitWidth && "Comparison requires equal bit widths"'```

No covering test case is submitted with this change since this crash
cannot be reproduced using any upstream-supported target. The test case
that exposes this issue is as simple as:

```lang=c++
  void test(int * p) {
    int * q = p-1;
    if (q) {}
    if (q) {} // crash
    (void)q;
  }
```

The custom target that exposes this problem supports two address spaces,
16-bit `char`s, and a `_Bool` type that maps to 16-bits. There are no upstream
supported targets with similar attributes.

The assertion appears to be happening as a result of evaluating the
`SymIntExpr` `(reg_$0<int * p>) != 0U` in `VisitSymIntExpr` located in
`SimpleSValBuilder.cpp`. The `LHS` is evaluated to `32b` and the `RHS` is
evaluated to `16b`. This eventually leads to the assertion in `APInt.h`.

While this change addresses the crash and passes LITs, two follow-ups
are required:
  1) The remainder of `getZeroWithPtrWidth()` and `getIntWithPtrWidth()`
     should be cleaned up following this model to prevent future
     confusion.
  2) We're not sure why references are found along the modified
     code path; that should not be the case. A more principled
     fix may be found after further investigation of why this
     happens.

Acks: Thanks to @steakhal and @martong for the discussions leading to this
fix.

Reviewed By: NoQ

Differential Revision: https://reviews.llvm.org/D105974
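The width mismatch described above can be illustrated with a toy Python model (not the real `llvm::APInt`; class and constant names here are invented for illustration). The LHS is evaluated at pointer width, the RHS at `_Bool` width, and the comparison asserts; the fix amounts to producing both operands at the same width before comparing:

```python
class APIntModel:
    """Minimal stand-in for llvm::APInt: a value with an explicit bit width."""
    def __init__(self, width, value):
        self.width = width
        self.value = value & ((1 << width) - 1)

    def __eq__(self, other):
        # Mirrors the assertion in APInt.h quoted above.
        assert self.width == other.width, "Comparison requires equal bit widths"
        return self.value == other.value

    def zext(self, width):
        """Zero-extend to a wider type, as the fixed code does before comparing."""
        assert width >= self.width
        return APIntModel(width, self.value)

PTR_WIDTH = 32  # width of `int *` on the hypothetical target

lhs = APIntModel(PTR_WIDTH, 0x1234)  # (reg_$0<int * p>) evaluated to 32 bits
rhs = APIntModel(16, 0)              # 0U evaluated to 16 bits: the bug

try:
    lhs == rhs
except AssertionError as e:
    print("crash:", e)

# After the fix the RHS is produced at pointer width and the compare is fine.
print(lhs == rhs.zext(PTR_WIDTH))
```

This is only a sketch of the width invariant, not of the SValBuilder logic itself.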
Remove duplicate and stale doc comment on affineParallelize. NFC.
This patch teaches the compiler to identify a wider variety of
`BUILD_VECTOR`s which form integer arithmetic sequences, and to lower
them to `vid.v` with modifications for non-unit steps and non-zero
addends.

The sequences handled by this optimization must either be monotonically
increasing or decreasing. Consecutive elements holding the same value
indicate a fractional step which, while simple mathematically,
becomes more complex to handle both in the realm of lossy integer
division and in the presence of `undef`s.

For example, a common "interleaving" shuffle index will be lowered by
LLVM to both `<0,u,1,u,2,...>` and `<u,0,u,1,u,...>` `BUILD_VECTOR`
nodes. Either of these would ideally be lowered to `vid.v` shifted right
by 1. Detection of this sequence in presence of general `undef` values
is more complicated, however: `<0,u,u,1>` could match either
`<0,0,0,1>` or `<0,0,1,1>` depending on later values in the sequence.
Both are possible, so backtracking or multiple passes would be inevitable.

Sticking to monotonic sequences keeps the logic simpler as it can be
done in one pass. Fractional steps will likely be a separate
optimization in a future patch.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D104921
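The one-pass detection described above can be sketched with a small Python model (illustrative only; the real logic lives in the RISC-V SelectionDAG lowering, and the function name here is made up). `None` stands in for `undef` elements:

```python
def match_arithmetic_sequence(elts):
    """Return (step, addend) if the defined elements of a BUILD_VECTOR-like
    list form an integer arithmetic sequence elts[i] == step * i + addend,
    or None otherwise. None entries model undef elements."""
    defined = [(i, v) for i, v in enumerate(elts) if v is not None]
    if len(defined) < 2:
        return None
    (i0, v0), (i1, v1) = defined[0], defined[1]
    if (v1 - v0) % (i1 - i0) != 0:
        # Fractional step, e.g. <0,u,1,u,2,...>: deferred to a later patch.
        return None
    step = (v1 - v0) // (i1 - i0)
    addend = v0 - step * i0
    # A single forward pass checks every defined element against the line.
    for i, v in defined:
        if v != step * i + addend:
            return None
    return step, addend
```

For example, `[0, None, 2, 3]` matches with step 1 and addend 0, while the interleave pattern `[0, None, 1, None, 2]` is rejected because its step is fractional, matching the commit's restriction to integer steps.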
…id costs."

The original patch was:
  https://reviews.llvm.org/D105806

There were some issues with nondeterministic behaviour of the sorting
function, which led to scalable-call.ll passing and/or failing. This
patch fixes the issue by numbering all instructions in the array first
and using that number as the order, which should provide a consistent
ordering.

This reverts commit a607f64.
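The numbering trick above can be modeled in a few lines of Python (a sketch with invented names, not the actual vectorizer code; Python's sort is already stable, so this only illustrates how an explicit number makes the order of equal-cost entries independent of the sort implementation):

```python
def stable_order_by_cost(insts, cost):
    """Sort instructions by cost, using each instruction's original number
    as a tie-breaker so equal-cost entries come out in the same order,
    run after run."""
    numbered = list(enumerate(insts))  # number all instructions first
    numbered.sort(key=lambda pair: (cost(pair[1]), pair[0]))
    return [inst for _, inst in numbered]
```

With this key, two instructions with equal cost can never swap places between runs, which is the property the flaky scalable-call.ll test depended on.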
Previously GetMemoryTagManager checked many things in one:
* architecture supports memory tagging
* process supports memory tagging
* memory range isn't inverted
* memory range is all tagged

While writing follow-up patches for tag writing (in review
at the moment) it has become clear that this gets unwieldy
once we add the features needed for that.

It also implies that the memory tag manager is tied to the
range you used to request it with, but it is not; it is a
per-process object.

Instead:
* GetMemoryTagManager just checks architecture and process.
* Then the MemoryTagManager can later be asked to check a
  memory range.

This is better because:
* We don't imply that range and manager are tied together.
* A slightly different range calculation for tag writing
  doesn't add more code to Process.
* Range-checking code can now be unit tested.

Reviewed By: omjavaid

Differential Revision: https://reviews.llvm.org/D105630
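The split described above can be sketched as a toy Python model (the real code is C++ in LLDB; names and the single-region simplification here are illustrative): fetching the manager only checks capabilities, while range validation is a separate, unit-testable method on the manager.

```python
class MemoryTagManagerModel:
    """Toy model: the manager is a per-process object, and range
    validation is a separate method that can be unit tested."""

    def __init__(self, tagged_ranges):
        # (start, end) pairs of memory that is actually tagged.
        self.tagged = tagged_ranges

    def check_range(self, start, end):
        """Return an error string, or None if the range is usable."""
        if end <= start:
            return "range is inverted"
        # Simplification: require a single tagged region to cover the range.
        if not any(s <= start and end <= e for s, e in self.tagged):
            return "range is not all tagged"
        return None

def get_memory_tag_manager(arch_supports_mte, process_supports_mte, tagged):
    """Model of the slimmed-down GetMemoryTagManager: only architecture
    and process checks happen here; no range is involved."""
    if not (arch_supports_mte and process_supports_mte):
        raise RuntimeError("memory tagging not supported")
    return MemoryTagManagerModel(tagged)
```

Because `check_range` no longer depends on how the manager was obtained, a follow-up patch can use a slightly different range calculation for tag writing without touching the equivalent of Process.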
This patch adds support for following contiguous load and store
instructions:

  * LD1B, LD1H, LD1W, LD1D, LD1Q
  * ST1B, ST1H, ST1W, ST1D, ST1Q

A new register class and operand are added for the 32-bit vector select
register W12-W15. The differences in the following tests, which have been
re-generated, are caused by the introduction of this register class:

  * llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-inline-asm.ll
  * llvm/test/CodeGen/AArch64/GlobalISel/regbank-inlineasm.mir
  * llvm/test/CodeGen/AArch64/stp-opt-with-renaming-reserved-regs.mir
  * llvm/test/CodeGen/AArch64/stp-opt-with-renaming.mir

D88663 attempts to resolve the issue with the store pair test
differences in the AArch64 load/store optimizer.

The GlobalISel differences are caused by changes in the enum values of
register classes; the tests have been updated with the new values.

The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-06

Reviewed By: CarolineConcatto

Differential Revision: https://reviews.llvm.org/D105572
This patch returns an Invalid cost from getInstructionCost() for alloca
instructions if the VF is scalable, as otherwise loops which contain
these instructions will crash when attempting to scalarize the alloca.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D105824
Specifying the latencies of specific LDP variants appears to improve
performance almost universally.

Differential Revision: https://reviews.llvm.org/D105882
This may be necessary in partial multi-stage conversion when a container type
from dialect A containing types from dialect B goes through the conversion
where only dialect A is converted to the LLVM dialect. We will need to keep a
pointer-to-non-LLVM type in the IR until a further conversion can convert
dialect B types to LLVM types.

Reviewed By: wsmoses

Differential Revision: https://reviews.llvm.org/D106076
https://reviews.llvm.org/D105659 implements ByVal handling in llc, but
some cases are not compatible with the existing XL compiler on AIX.
This adds a clang warning for such cases.

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D105660
Added isCall for S_CALL_B64; added isBranch for S_SUBVECTOR_LOOP_*.

Differential Revision: https://reviews.llvm.org/D106072
If you attach __attribute__((optnone)) to a function when using
optimisations, that function will use fast-isel instead of the usual
SelectionDAG method. This is a problem for instruction referencing,
because it means DBG_VALUEs of virtual registers will be created,
triggering some safety assertions in LiveDebugVariables. Those assertions
exist to detect exactly this scenario, where an unexpected piece of code is
generating virtual register references in instruction referencing mode.

Fix this by transforming the DBG_VALUEs created by fast-isel into
half-formed DBG_INSTR_REFs, after which they get patched up in
finalizeDebugInstrRefs. The modified test adds a fast-isel mode to the
instruction referencing isel test.

Differential Revision: https://reviews.llvm.org/D105694
This avoids relying on G_EXTRACT on unusual types, and also properly
decomposes structs into multiple registers. This also preserves the
LLTs in the memory operands.
The dialect-specific cast between builtin (ex-standard) types and LLVM
dialect types was introduced long before built-in support for
unrealized_conversion_cast. It has a similar purpose, but is restricted
to compatible builtin and LLVM dialect types, which may hamper
progressive lowering and composition with types from other dialects.
Replace llvm.mlir.cast with unrealized_conversion_cast, and drop the
operation that became unnecessary.

Also make unrealized_conversion_cast legal by default in
LLVMConversionTarget, as the majority of conversions using it are partial
conversions that actually want the casts to persist in the IR. The
standard-to-llvm conversion, which is still expected to run last, cleans
up the remaining casts.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D105880
@vmaksimo
Contributor Author

/summary:run

@vmaksimo
Contributor Author

Hi again @steffenlarsen! We fixed the build on CUDA, but a lit-test failure appeared in clang/test/Driver/cuda-arch-translation.cu. It looks like a community test bug, but it could be something else. The possible culprit is vmaksimo@01d3a3d or vmaksimo@25629bb.
Could you please take a look? Thanks!

@vmaksimo vmaksimo force-pushed the llvmspirv_pulldown branch from fa4d950 to f442564 Compare July 21, 2021 13:57
@steffenlarsen
Contributor

@vmaksimo Looks to me like "v4" isn't appended to the "hip" part of the triple, which the tests expect. I suspect the problem is ec61222#diff-eefa159a6526f756b7260f310595827f33f96fd91a8d336f4d20ad8b72738d17R121

@steffenlarsen
Contributor

> @vmaksimo Looks to me like "v4" isn't appended to the "hip" part of the triple, which the tests expect. I suspect the problem is ec61222#diff-eefa159a6526f756b7260f310595827f33f96fd91a8d336f4d20ad8b72738d17R121

I have tried removing the changes at that line and the test passes. That said, I don't know what the motivation of that change was.

@bader
Contributor

bader commented Jul 21, 2021

> @vmaksimo Looks to me like "v4" isn't appended to the "hip" part of the triple, which the tests expect. I suspect the problem is ec61222#diff-eefa159a6526f756b7260f310595827f33f96fd91a8d336f4d20ad8b72738d17R121
>
> I have tried removing the changes at that line and the test passes. That said, I don't know what the motivation of that change was.

Tagging @malixian to clarify.

@vmaksimo
Contributor Author

/summary:run

@vmaksimo vmaksimo marked this pull request as ready for review July 23, 2021 08:13
@romanovvlad romanovvlad merged commit f7c5443 into intel:sycl Jul 23, 2021
npmiller added a commit to npmiller/llvm that referenced this pull request Jul 27, 2021
This file uses `cl_mem_fence_flags`, `CLK_GLOBAL_MEM_FENCE`,
`CLK_LOCAL_MEM_FENCE`, which are defined by `clc.h`.

It wasn't included because when this file was written the OpenCL headers
were implicitly included; this has changed, so including the header
is now necessary. See the discussion on intel#4133 for more details.
@malixian
Contributor

> @vmaksimo Looks to me like "v4" isn't appended to the "hip" part of the triple, which the tests expect. I suspect the problem is ec61222#diff-eefa159a6526f756b7260f310595827f33f96fd91a8d336f4d20ad8b72738d17R121
>
> I have tried removing the changes at that line and the test passes. That said, I don't know what the motivation of that change was.

The motivation for this change comes from some of our tests. Through testing, we found that if the compiler is not given the mcode_object_version option, the generated AMD binary cannot be executed correctly in ROCm, even when the version obtained by getAMDGPUCodeObjectVersion is V4. So we use haveAMDGPUCodeObjectVersionArgument to determine whether code_object_version is explicitly specified, and only when both functions return true does OffloadKind get v4 attached. In fact, we also found through testing that even if the version obtained by getAMDGPUCodeObjectVersion is V4, it is not necessary to attach v4 to OffloadKind; without it, ROCm execution works fine.

I can't fully explain this behavior yet, and I'm also confused by it. It may be related to the version of ROCm or the version of LLVM.
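The gating malixian describes could be modeled as follows (a sketch; the function names only echo the driver functions mentioned above, and `"hipv4"` reflects the "v4 appended to the hip part of the triple" discussed in this thread, not verified driver output):

```python
def should_append_v4(have_version_arg, code_object_version):
    """Attach "v4" only when the code-object-version option is explicitly
    present AND it resolves to version 4, per the explanation above."""
    return have_version_arg and code_object_version == 4

def hip_offload_kind(have_version_arg, code_object_version):
    """Build the offload-kind part of the triple under that gating."""
    if should_append_v4(have_version_arg, code_object_version):
        return "hipv4"
    return "hip"
```

Under this rule, a default-detected V4 without the explicit option leaves the offload kind as plain `"hip"`, which is what the cuda-arch-translation.cu test expects.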

bader pushed a commit that referenced this pull request Jul 28, 2021
This file uses `cl_mem_fence_flags`, `CLK_GLOBAL_MEM_FENCE`,
`CLK_LOCAL_MEM_FENCE`, which are defined by `clc.h`.

It wasn't included because when this file was written the OpenCL headers
were implicitly included; this has changed, so including the header
is now necessary. See the discussion on #4133 for more details.
zahiraam pushed a commit to zahiraam/llvm-1 that referenced this pull request Aug 2, 2021
This file uses `cl_mem_fence_flags`, `CLK_GLOBAL_MEM_FENCE`,
`CLK_LOCAL_MEM_FENCE`, which are defined by `clc.h`.

It wasn't included because when this file was written the OpenCL headers
were implicitly included; this has changed, so including the header
is now necessary. See the discussion on intel#4133 for more details.
npmiller added a commit to npmiller/llvm that referenced this pull request Aug 3, 2021
This patch changes using `-mcpu` for SYCL applications targeting AMD to
`-Xsycl-target-backend --offload-arch`.

Before this patch the offloading arch wasn't set correctly for AMD
architectures.

This fixes an issue with HIP discussed in intel#4133 regarding having
`v4` in the HIP part of the triple. Without the `v4`, HIP seems to
ignore the fact that the offloading arch is missing from the triple,
which is why there was originally a workaround forcing SYCL not to use
`v4`. By fixing the offloading arch, this patch fixes the issue
properly: the triple with `v4` now works because it also contains the
offloading architecture.
npmiller added a commit to npmiller/llvm that referenced this pull request Aug 4, 2021
This patch changes using `-mcpu` for SYCL applications targeting AMD to
`-Xsycl-target-backend --offload-arch`.

Before this patch the offloading arch wasn't set correctly for AMD
architectures.

This fixes an issue with HIP discussed in intel#4133 regarding having
`v4` in the HIP part of the triple. Without the `v4`, HIP seems to
ignore the fact that the offloading arch is missing from the triple,
which is why there was originally a workaround forcing SYCL not to use
`v4`. By fixing the offloading arch, this patch fixes the issue
properly: the triple with `v4` now works because it also contains the
offloading architecture.
bader pushed a commit that referenced this pull request Aug 19, 2021
This patch changes using `-mcpu` for SYCL applications targeting AMD to
`-Xsycl-target-backend --offload-arch`.

Before this patch the offloading arch wasn't set correctly for AMD
architectures.

This fixes an issue with HIP discussed in #4133 regarding having
`v4` in the HIP part of the triple. Without the `v4`, HIP seems to
ignore the fact that the offloading arch is missing from the triple,
which is why there was originally a workaround forcing SYCL not to use
`v4`. By fixing the offloading arch, this patch fixes the issue
properly: the triple with `v4` now works because it also contains the
offloading architecture.

Co-authored-by: Artem Gindinson <artem.gindinson@intel.com>
Co-authored-by: Victor Lomuller <victor@codeplay.com>