LLVM and SPIRV-LLVM-Translator pulldown (WW30) #4133
Conversation
… make it free of global initializer. We can build it with -Werror=global-constructors now. This helps in situations where libSupport is embedded as a shared library, potentially with dlopen/dlclose scenarios, and when command-line parsing or other facilities may not be involved. Avoiding the implicit construction of these cl::opt instances can avoid double-registration issues and other kinds of surprising behavior. Reviewed By: lattner, jpienaar Differential Revision: https://reviews.llvm.org/D105959
This patch handles the `<<` operator defined for `std::unique_ptr` in the std namespace (ignores custom overloads of the operator). Differential Revision: https://reviews.llvm.org/D105421
…the simulation Added information stored in PipelineOptions and the MCSubtargetInfo. Bug: https://bugs.llvm.org/show_bug.cgi?id=51041 Reviewed By: andreadb Differential Revision: https://reviews.llvm.org/D106077
Differential Revision: https://reviews.llvm.org/D106055
This reverts commit 7c63726.
…pport to make it free of global initializer" This reverts commit af93217. Some specific configs are still broken in a way that requires more investigation.
… make it free of global initializer. We can build it with -Werror=global-constructors now. This helps in situations where libSupport is embedded as a shared library, potentially with dlopen/dlclose scenarios, and when command-line parsing or other facilities may not be involved. Avoiding the implicit construction of these cl::opt instances can avoid double-registration issues and other kinds of surprising behavior. Reviewed By: lattner, jpienaar Differential Revision: https://reviews.llvm.org/D105959
Differential Revision: https://reviews.llvm.org/D105982
…void* This change addresses an assertion that occurs in a downstream compiler with a custom target:

```
APInt.h:1151: bool llvm::APInt::operator==(const llvm::APInt &) const: Assertion `BitWidth == RHS.BitWidth && "Comparison requires equal bit widths"'
```

No covering test case is submitted with this change since this crash cannot be reproduced using any upstream supported target. The test case that exposes this issue is as simple as:

```c++
void test(int * p) {
  int * q = p-1;
  if (q) {}
  if (q) {} // crash
  (void)q;
}
```

The custom target that exposes this problem supports two address spaces, 16-bit `char`s, and a `_Bool` type that maps to 16 bits. There are no upstream supported targets with similar attributes. The assertion appears to happen as a result of evaluating the `SymIntExpr` `(reg_$0<int * p>) != 0U` in `VisitSymIntExpr`, located in `SimpleSValBuilder.cpp`. The `LHS` is evaluated to `32b` and the `RHS` is evaluated to `16b`. This eventually leads to the assertion in `APInt.h`. While this change addresses the crash and passes LITs, two follow-ups are required: 1) The remainder of `getZeroWithPtrWidth()` and `getIntWithPtrWidth()` should be cleaned up following this model to prevent future confusion. 2) We're not sure why references are found along the modified code path; that should not be the case. A more principled fix may be found after some further comprehension of why this is the case. Acks: Thanks to @steakhal and @martong for the discussions leading to this fix. Reviewed By: NoQ Differential Revision: https://reviews.llvm.org/D105974
Differential Revision: https://reviews.llvm.org/D106055
Remove duplicate and stale doc comment on affineParallelize. NFC.
This patch teaches the compiler to identify a wider variety of `BUILD_VECTOR`s which form integer arithmetic sequences, and to lower them to `vid.v` with modifications for non-unit steps and non-zero addends. The sequences handled by this optimization must be either monotonically increasing or decreasing. Consecutive elements holding the same value indicate a fractional step which, while simple mathematically, becomes more complex to handle both in the realm of lossy integer division and in the presence of `undef`s. For example, a common "interleaving" shuffle index will be lowered by LLVM to both `<0,u,1,u,2,...>` and `<u,0,u,1,u,...>` `BUILD_VECTOR` nodes. Either of these would ideally be lowered to `vid.v` shifted right by 1. Detection of this sequence in the presence of general `undef` values is more complicated, however: `<0,u,u,1>` could match either `<0,0,0,1>` or `<0,0,1,1>` depending on later values in the sequence. Both are possible, so backtracking or multiple passes would be inevitable. Sticking to monotonic sequences keeps the logic simpler, as it can be done in one pass. Fractional steps will likely be a separate optimization in a future patch. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D104921
…id costs." The original patch was: https://reviews.llvm.org/D105806 There were some issues with nondeterministic behaviour of the sorting function, which led to scalable-call.ll passing and/or failing. This patch fixes the issue by numbering all instructions in the array first and using that number as the order, which should provide a consistent ordering. This reverts commit a607f64.
Previously GetMemoryTagManager checked many things in one:
* architecture supports memory tagging
* process supports memory tagging
* memory range isn't inverted
* memory range is all tagged

Since writing follow-up patches for tag writing (in review at the moment) it has become clear that this gets unwieldy once we add the features needed for that. It also implies that the memory tag manager is tied to the range you used to request it with, but it is not: it's a per-process object. Instead:
* GetMemoryTagManager just checks architecture and process.
* Then the MemoryTagManager can later be asked to check a memory range.

This is better because:
* We don't imply that range and manager are tied together.
* A slightly different range calculation for tag writing doesn't add more code to Process.
* Range checking code can now be unit tested.

Reviewed By: omjavaid Differential Revision: https://reviews.llvm.org/D105630
This patch adds support for the following contiguous load and store instructions:
* LD1B, LD1H, LD1W, LD1D, LD1Q
* ST1B, ST1H, ST1W, ST1D, ST1Q

A new register class and operand is added for the 32-bit vector select register W12-W15. The differences in the following tests, which have been re-generated, are caused by the introduction of this register class:
* llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-inline-asm.ll
* llvm/test/CodeGen/AArch64/GlobalISel/regbank-inlineasm.mir
* llvm/test/CodeGen/AArch64/stp-opt-with-renaming-reserved-regs.mir
* llvm/test/CodeGen/AArch64/stp-opt-with-renaming.mir

D88663 attempts to resolve the issue with the store pair test differences in the AArch64 load/store optimizer. The GlobalISel differences are caused by changes in the enum values of register classes; tests have been updated with the new values. The reference can be found here: https://developer.arm.com/documentation/ddi0602/2021-06 Reviewed By: CarolineConcatto Differential Revision: https://reviews.llvm.org/D105572
This patch returns an Invalid cost from getInstructionCost() for alloca instructions if the VF is scalable, as otherwise loops which contain these instructions will crash when attempting to scalarize the alloca. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D105824
Specifying the latencies of specific LDP variants appears to improve performance almost universally. Differential Revision: https://reviews.llvm.org/D105882
This may be necessary in partial multi-stage conversion when a container type from dialect A containing types from dialect B goes through the conversion where only dialect A is converted to the LLVM dialect. We will need to keep a pointer-to-non-LLVM type in the IR until a further conversion can convert dialect B types to LLVM types. Reviewed By: wsmoses Differential Revision: https://reviews.llvm.org/D106076
https://reviews.llvm.org/D105659 implements ByVal handling in llc, but some cases are not compatible with the existing XL compiler on AIX. This adds a clang warning for such cases. Reviewed By: aaron.ballman Differential Revision: https://reviews.llvm.org/D105660
Added isCall for S_CALL_B64; added isBranch for S_SUBVECTOR_LOOP_*. Differential Revision: https://reviews.llvm.org/D106072
…ment type. This fixes bug 36064. Differential Revision: https://reviews.llvm.org/D106093
More coverage for D105730
If you attach __attribute__((optnone)) to a function when using optimisations, that function will use fast-isel instead of the usual SelectionDAG method. This is a problem for instruction referencing, because it means DBG_VALUEs of virtual registers will be created, triggering some safety assertions in LiveDebugVariables. Those assertions exist to detect exactly this scenario, where an unexpected piece of code is generating virtual register references in instruction referencing mode. Fix this by transforming the DBG_VALUEs created by fast-isel into half-formed DBG_INSTR_REFs, after which they get patched up in finalizeDebugInstrRefs. The test modified adds a fast-isel mode to the instruction referencing isel test. Differential Revision: https://reviews.llvm.org/D105694
This avoids relying on G_EXTRACT on unusual types, and also properly decomposes structs into multiple registers. This also preserves the LLTs in the memory operands.
The dialect-specific cast between builtin (ex-standard) types and LLVM dialect types was introduced long before built-in support for unrealized_conversion_cast. It has a similar purpose, but is restricted to compatible builtin and LLVM dialect types, which may hamper progressive lowering and composition with types from other dialects. Replace llvm.mlir.cast with unrealized_conversion_cast, and drop the operation that became unnecessary. Also make unrealized_conversion_cast legal by default in LLVMConversionTarget, as the majority of conversions using it are partial conversions that actually want the casts to persist in the IR. The standard-to-llvm conversion, which is still expected to run last, cleans up the remaining casts. Reviewed By: nicolasvasilache Differential Revision: https://reviews.llvm.org/D105880
/summary:run
…tribute_parallel_for_simd_codegen failure"
Hi again @steffenlarsen! We fixed the build on CUDA, but a lit-test failure appeared in
fa4d950 to f442564
@vmaksimo Looks to me like "v4" isn't appended to the "hip" part of the triple, which the tests expect. I suspect the problem is ec61222#diff-eefa159a6526f756b7260f310595827f33f96fd91a8d336f4d20ad8b72738d17R121
I have tried removing the changes at that line and the test passes. That said, I don't know what the motivation for that change was.
Tagging @malixian to clarify.
This check causes a regression in cuda-arch-translation after community test changes intel@ec61222#diff-eefa159a6526f756b7260f310595827f33f96fd91a8d336f4d20ad8b72738d17R121
/summary:run
This file uses `cl_mem_fence_flags`, `CLK_GLOBAL_MEM_FENCE`, and `CLK_LOCAL_MEM_FENCE`, which are defined by `clc.h`. It wasn't included because, when this file was written, the OpenCL headers were implicitly included; this has since changed, so including the header is now necessary. See the discussion on intel#4133 for more details.
The motivation for this change comes from some of our tests. Through testing, we found that the behavior changes if the compiler does not have this option. I can't explain this phenomenon yet, and I'm also confused about this problem. It may be related to the version of ROCm or the version of llvm.
This patch changes using `-mcpu` for SYCL applications targeting AMD to `-Xsycl-target-backend --offload-arch`. Before this patch, the offloading arch wasn't set correctly for AMD architectures. This fixes an issue with HIP discussed in #4133, regarding having `v4` in the HIP part of the triple: without the `v4`, HIP seems to ignore the fact that the offloading arch is missing from the triple, which is why there was originally a workaround to force not using `v4` with SYCL. By fixing the offloading arch, this patch fixes the issue properly, and now the triple with `v4` works because it also contains the offloading architecture. Co-authored-by: Artem Gindinson <artem.gindinson@intel.com> Co-authored-by: Victor Lomuller <victor@codeplay.com>
LLVM: llvm/llvm-project@fb0cf01
SPIRV-LLVM-Translator: KhronosGroup/SPIRV-LLVM-Translator@9e234d9