
[RISCV][VLOPT] Enable the RISCVVLOptimizer by default #119461

Merged
merged 2 commits into llvm:main from enable-vlopt on Dec 17, 2024

Conversation

michaelmaitland
Contributor

@michaelmaitland michaelmaitland commented Dec 10, 2024

Now that we have testing of all instructions in the isSupportedInstr switch, and better coverage of getOperandInfo, I think it is a good time to enable this by default.

@llvmbot
Member

llvmbot commented Dec 10, 2024

@llvm/pr-subscribers-backend-risc-v

Author: Michael Maitland (michaelmaitland)

Changes

Now that we have testing of all instructions in the isSupportedInstr switch, and better coverage of getOperandInfo, I think it is a good time to enable this by default.

I'd like for #112231 and #119416 to land before this patch, so it'd be great for anyone reviewing this to check those out first.


Patch is 81.90 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/119461.diff

34 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVTargetMachine.cpp (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/O3-pipeline.ll (+2-1)
  • (modified) llvm/test/CodeGen/RISCV/rvv/ctlz-vp.ll (+2-4)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-abs.ll (-2)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp.ll (+4-16)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-buildvec.ll (+2-1)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-shuffles.ll (+6-3)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int.ll (+3-1)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-changes-length.ll (+2-1)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vdiv-vp.ll (-2)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vdivu-vp.ll (+1-2)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfma-vp.ll (+72-76)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vl-opt-op-info.ll (+16-30)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vl-opt.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vmax-vp.ll (-2)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vmaxu-vp.ll (+1-2)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vmin-vp.ll (-2)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vminu-vp.ll (+1-2)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vmul-vp.ll (+3-4)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vrem-vp.ll (-2)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vremu-vp.ll (+1-2)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vsadd-vp.ll (+1-2)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vsaddu-vp.ll (+1-2)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vsetvli-insert-crossbb.ll (-3)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vshl-vp.ll (+1-2)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vsitofp-vp.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vsra-sdnode.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vsra-vp.ll (-2)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vsrl-vp.ll (+1-2)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vssub-vp.ll (+2-4)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vssubu-vp.ll (+2-4)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vuitofp-vp.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vwsll-vp.ll (+30-60)
  • (modified) llvm/test/CodeGen/RISCV/srem-seteq-illegal-types.ll (+4-3)
diff --git a/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp b/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
index dcd3598f658f6a..c507ab3f4f3885 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
@@ -112,7 +112,7 @@ static cl::opt<bool> EnablePostMISchedLoadStoreClustering(
 static cl::opt<bool>
     EnableVLOptimizer("riscv-enable-vl-optimizer",
                       cl::desc("Enable the RISC-V VL Optimizer pass"),
-                      cl::init(false), cl::Hidden);
+                      cl::init(true), cl::Hidden);
 
 static cl::opt<bool> DisableVectorMaskMutation(
     "riscv-disable-vector-mask-mutation",
diff --git a/llvm/test/CodeGen/RISCV/O3-pipeline.ll b/llvm/test/CodeGen/RISCV/O3-pipeline.ll
index 8fd9ae98503665..b0c756e26985bb 100644
--- a/llvm/test/CodeGen/RISCV/O3-pipeline.ll
+++ b/llvm/test/CodeGen/RISCV/O3-pipeline.ll
@@ -119,6 +119,8 @@
 ; RV64-NEXT:        RISC-V Optimize W Instructions
 ; CHECK-NEXT:       RISC-V Pre-RA pseudo instruction expansion pass
 ; CHECK-NEXT:       RISC-V Merge Base Offset
+; CHECK-NEXT:       MachineDominator Tree Construction
+; CHECK-NEXT:       RISC-V VL Optimizer
 ; CHECK-NEXT:       RISC-V Insert Read/Write CSR Pass
 ; CHECK-NEXT:       RISC-V Insert Write VXRM Pass
 ; CHECK-NEXT:       RISC-V Landing Pad Setup
@@ -129,7 +131,6 @@
 ; CHECK-NEXT:       Live Variable Analysis
 ; CHECK-NEXT:       Eliminate PHI nodes for register allocation
 ; CHECK-NEXT:       Two-Address instruction pass
-; CHECK-NEXT:       MachineDominator Tree Construction
 ; CHECK-NEXT:       Slot index numbering
 ; CHECK-NEXT:       Live Interval Analysis
 ; CHECK-NEXT:       Register Coalescer
diff --git a/llvm/test/CodeGen/RISCV/rvv/ctlz-vp.ll b/llvm/test/CodeGen/RISCV/rvv/ctlz-vp.ll
index ce4bc48dff0426..6f515996677ee6 100644
--- a/llvm/test/CodeGen/RISCV/rvv/ctlz-vp.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/ctlz-vp.ll
@@ -2654,9 +2654,8 @@ define <vscale x 1 x i9> @vp_ctlo_zero_undef_nxv1i9(<vscale x 1 x i9> %va, <vsca
 ; CHECK-LABEL: vp_ctlo_zero_undef_nxv1i9:
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    li a1, 511
-; CHECK-NEXT:    vsetvli a2, zero, e16, mf4, ta, ma
-; CHECK-NEXT:    vxor.vx v8, v8, a1
 ; CHECK-NEXT:    vsetvli zero, a0, e16, mf4, ta, ma
+; CHECK-NEXT:    vxor.vx v8, v8, a1
 ; CHECK-NEXT:    vsll.vi v8, v8, 7, v0.t
 ; CHECK-NEXT:    vfwcvt.f.xu.v v9, v8, v0.t
 ; CHECK-NEXT:    vsetvli zero, zero, e32, mf2, ta, ma
@@ -2670,9 +2669,8 @@ define <vscale x 1 x i9> @vp_ctlo_zero_undef_nxv1i9(<vscale x 1 x i9> %va, <vsca
 ; CHECK-ZVBB-LABEL: vp_ctlo_zero_undef_nxv1i9:
 ; CHECK-ZVBB:       # %bb.0:
 ; CHECK-ZVBB-NEXT:    li a1, 511
-; CHECK-ZVBB-NEXT:    vsetvli a2, zero, e16, mf4, ta, ma
-; CHECK-ZVBB-NEXT:    vxor.vx v8, v8, a1
 ; CHECK-ZVBB-NEXT:    vsetvli zero, a0, e16, mf4, ta, ma
+; CHECK-ZVBB-NEXT:    vxor.vx v8, v8, a1
 ; CHECK-ZVBB-NEXT:    vsll.vi v8, v8, 7, v0.t
 ; CHECK-ZVBB-NEXT:    vclz.v v8, v8, v0.t
 ; CHECK-ZVBB-NEXT:    ret
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-abs.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-abs.ll
index ac7d3d9109e39c..3153b44386d7ae 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-abs.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-abs.ll
@@ -39,9 +39,7 @@ define void @abs_v6i16(ptr %x) {
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetivli zero, 6, e16, m1, ta, ma
 ; CHECK-NEXT:    vle16.v v8, (a0)
-; CHECK-NEXT:    vsetivli zero, 8, e16, m1, ta, ma
 ; CHECK-NEXT:    vrsub.vi v9, v8, 0
-; CHECK-NEXT:    vsetivli zero, 6, e16, m1, ta, ma
 ; CHECK-NEXT:    vmax.vv v8, v8, v9
 ; CHECK-NEXT:    vse16.v v8, (a0)
 ; CHECK-NEXT:    ret
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp.ll
index 36bbec12e9b06c..15793eaada0783 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp.ll
@@ -788,11 +788,9 @@ define void @copysign_v6bf16(ptr %x, ptr %y) {
 ; CHECK-NEXT:    vle16.v v8, (a1)
 ; CHECK-NEXT:    vle16.v v9, (a0)
 ; CHECK-NEXT:    lui a1, 8
-; CHECK-NEXT:    vsetivli zero, 8, e16, m1, ta, ma
 ; CHECK-NEXT:    vand.vx v8, v8, a1
 ; CHECK-NEXT:    addi a1, a1, -1
 ; CHECK-NEXT:    vand.vx v9, v9, a1
-; CHECK-NEXT:    vsetivli zero, 6, e16, m1, ta, ma
 ; CHECK-NEXT:    vor.vv v8, v9, v8
 ; CHECK-NEXT:    vse16.v v8, (a0)
 ; CHECK-NEXT:    ret
@@ -848,11 +846,9 @@ define void @copysign_v6f16(ptr %x, ptr %y) {
 ; ZVFHMIN-NEXT:    vle16.v v8, (a1)
 ; ZVFHMIN-NEXT:    vle16.v v9, (a0)
 ; ZVFHMIN-NEXT:    lui a1, 8
-; ZVFHMIN-NEXT:    vsetivli zero, 8, e16, m1, ta, ma
 ; ZVFHMIN-NEXT:    vand.vx v8, v8, a1
 ; ZVFHMIN-NEXT:    addi a1, a1, -1
 ; ZVFHMIN-NEXT:    vand.vx v9, v9, a1
-; ZVFHMIN-NEXT:    vsetivli zero, 6, e16, m1, ta, ma
 ; ZVFHMIN-NEXT:    vor.vv v8, v9, v8
 ; ZVFHMIN-NEXT:    vse16.v v8, (a0)
 ; ZVFHMIN-NEXT:    ret
@@ -924,12 +920,10 @@ define void @copysign_vf_v6bf16(ptr %x, bfloat %y) {
 ; CHECK-NEXT:    vsetivli zero, 6, e16, m1, ta, ma
 ; CHECK-NEXT:    vle16.v v8, (a0)
 ; CHECK-NEXT:    lui a2, 8
-; CHECK-NEXT:    vsetivli zero, 8, e16, m1, ta, ma
 ; CHECK-NEXT:    vmv.v.x v9, a1
 ; CHECK-NEXT:    addi a1, a2, -1
 ; CHECK-NEXT:    vand.vx v8, v8, a1
 ; CHECK-NEXT:    vand.vx v9, v9, a2
-; CHECK-NEXT:    vsetivli zero, 6, e16, m1, ta, ma
 ; CHECK-NEXT:    vor.vv v8, v8, v9
 ; CHECK-NEXT:    vse16.v v8, (a0)
 ; CHECK-NEXT:    ret
@@ -986,12 +980,10 @@ define void @copysign_vf_v6f16(ptr %x, half %y) {
 ; ZVFHMIN-NEXT:    vsetivli zero, 6, e16, m1, ta, ma
 ; ZVFHMIN-NEXT:    vle16.v v8, (a0)
 ; ZVFHMIN-NEXT:    lui a2, 8
-; ZVFHMIN-NEXT:    vsetivli zero, 8, e16, m1, ta, ma
 ; ZVFHMIN-NEXT:    vmv.v.x v9, a1
 ; ZVFHMIN-NEXT:    addi a1, a2, -1
 ; ZVFHMIN-NEXT:    vand.vx v8, v8, a1
 ; ZVFHMIN-NEXT:    vand.vx v9, v9, a2
-; ZVFHMIN-NEXT:    vsetivli zero, 6, e16, m1, ta, ma
 ; ZVFHMIN-NEXT:    vor.vv v8, v8, v9
 ; ZVFHMIN-NEXT:    vse16.v v8, (a0)
 ; ZVFHMIN-NEXT:    ret
@@ -1065,11 +1057,9 @@ define void @copysign_neg_v6bf16(ptr %x, ptr %y) {
 ; CHECK-NEXT:    vle16.v v9, (a0)
 ; CHECK-NEXT:    lui a1, 8
 ; CHECK-NEXT:    addi a2, a1, -1
-; CHECK-NEXT:    vsetivli zero, 8, e16, m1, ta, ma
 ; CHECK-NEXT:    vxor.vx v8, v8, a1
 ; CHECK-NEXT:    vand.vx v9, v9, a2
 ; CHECK-NEXT:    vand.vx v8, v8, a1
-; CHECK-NEXT:    vsetivli zero, 6, e16, m1, ta, ma
 ; CHECK-NEXT:    vor.vv v8, v9, v8
 ; CHECK-NEXT:    vse16.v v8, (a0)
 ; CHECK-NEXT:    ret
@@ -1129,11 +1119,9 @@ define void @copysign_neg_v6f16(ptr %x, ptr %y) {
 ; ZVFHMIN-NEXT:    vle16.v v9, (a0)
 ; ZVFHMIN-NEXT:    lui a1, 8
 ; ZVFHMIN-NEXT:    addi a2, a1, -1
-; ZVFHMIN-NEXT:    vsetivli zero, 8, e16, m1, ta, ma
 ; ZVFHMIN-NEXT:    vxor.vx v8, v8, a1
 ; ZVFHMIN-NEXT:    vand.vx v9, v9, a2
 ; ZVFHMIN-NEXT:    vand.vx v8, v8, a1
-; ZVFHMIN-NEXT:    vsetivli zero, 6, e16, m1, ta, ma
 ; ZVFHMIN-NEXT:    vor.vv v8, v9, v8
 ; ZVFHMIN-NEXT:    vse16.v v8, (a0)
 ; ZVFHMIN-NEXT:    ret
@@ -1211,12 +1199,12 @@ define void @copysign_neg_trunc_v3bf16_v3f32(ptr %x, ptr %y) {
 ; CHECK-NEXT:    vle32.v v9, (a1)
 ; CHECK-NEXT:    lui a1, 8
 ; CHECK-NEXT:    addi a2, a1, -1
-; CHECK-NEXT:    vsetivli zero, 4, e16, mf2, ta, ma
 ; CHECK-NEXT:    vand.vx v8, v8, a2
+; CHECK-NEXT:    vsetivli zero, 4, e16, mf2, ta, ma
 ; CHECK-NEXT:    vfncvtbf16.f.f.w v10, v9
+; CHECK-NEXT:    vsetivli zero, 3, e16, mf2, ta, ma
 ; CHECK-NEXT:    vxor.vx v9, v10, a1
 ; CHECK-NEXT:    vand.vx v9, v9, a1
-; CHECK-NEXT:    vsetivli zero, 3, e16, mf2, ta, ma
 ; CHECK-NEXT:    vor.vv v8, v8, v9
 ; CHECK-NEXT:    vse16.v v8, (a0)
 ; CHECK-NEXT:    ret
@@ -1283,12 +1271,12 @@ define void @copysign_neg_trunc_v3f16_v3f32(ptr %x, ptr %y) {
 ; ZVFHMIN-NEXT:    vle32.v v9, (a1)
 ; ZVFHMIN-NEXT:    lui a1, 8
 ; ZVFHMIN-NEXT:    addi a2, a1, -1
-; ZVFHMIN-NEXT:    vsetivli zero, 4, e16, mf2, ta, ma
 ; ZVFHMIN-NEXT:    vand.vx v8, v8, a2
+; ZVFHMIN-NEXT:    vsetivli zero, 4, e16, mf2, ta, ma
 ; ZVFHMIN-NEXT:    vfncvt.f.f.w v10, v9
+; ZVFHMIN-NEXT:    vsetivli zero, 3, e16, mf2, ta, ma
 ; ZVFHMIN-NEXT:    vxor.vx v9, v10, a1
 ; ZVFHMIN-NEXT:    vand.vx v9, v9, a1
-; ZVFHMIN-NEXT:    vsetivli zero, 3, e16, mf2, ta, ma
 ; ZVFHMIN-NEXT:    vor.vv v8, v8, v9
 ; ZVFHMIN-NEXT:    vse16.v v8, (a0)
 ; ZVFHMIN-NEXT:    ret
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-buildvec.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-buildvec.ll
index e9fd0a19e3eb66..276b5401a902a4 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-buildvec.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-buildvec.ll
@@ -910,8 +910,9 @@ define <4 x i8> @buildvec_not_vid_v4i8_2() {
 define <16 x i8> @buildvec_not_vid_v16i8() {
 ; CHECK-LABEL: buildvec_not_vid_v16i8:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 16, e8, m1, ta, ma
+; CHECK-NEXT:    vsetivli zero, 7, e8, m1, ta, ma
 ; CHECK-NEXT:    vmv.v.i v9, 3
+; CHECK-NEXT:    vsetivli zero, 16, e8, m1, ta, ma
 ; CHECK-NEXT:    vmv.v.i v8, 0
 ; CHECK-NEXT:    vsetivli zero, 7, e8, m1, tu, ma
 ; CHECK-NEXT:    vslideup.vi v8, v9, 6
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-shuffles.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-shuffles.ll
index 1c6e1a37fa8af5..a8e12dfaa82e9c 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-shuffles.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-shuffles.ll
@@ -348,8 +348,9 @@ define <8 x i8> @splat_ve4_ins_i0ve2(<8 x i8> %v) {
 define <8 x i8> @splat_ve4_ins_i1ve3(<8 x i8> %v) {
 ; CHECK-LABEL: splat_ve4_ins_i1ve3:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 8, e8, mf2, ta, ma
+; CHECK-NEXT:    vsetivli zero, 2, e8, mf2, ta, ma
 ; CHECK-NEXT:    vmv.v.i v9, 3
+; CHECK-NEXT:    vsetivli zero, 8, e8, mf2, ta, ma
 ; CHECK-NEXT:    vmv.v.i v10, 4
 ; CHECK-NEXT:    vsetivli zero, 2, e8, mf2, tu, ma
 ; CHECK-NEXT:    vslideup.vi v10, v9, 1
@@ -432,8 +433,9 @@ define <8 x i8> @splat_ve2_we0_ins_i2ve4(<8 x i8> %v, <8 x i8> %w) {
 define <8 x i8> @splat_ve2_we0_ins_i2we4(<8 x i8> %v, <8 x i8> %w) {
 ; CHECK-LABEL: splat_ve2_we0_ins_i2we4:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 8, e8, mf2, ta, ma
+; CHECK-NEXT:    vsetivli zero, 3, e8, mf2, ta, ma
 ; CHECK-NEXT:    vmv.v.i v10, 4
+; CHECK-NEXT:    vsetivli zero, 8, e8, mf2, ta, ma
 ; CHECK-NEXT:    vmv.v.i v11, 0
 ; CHECK-NEXT:    li a0, 70
 ; CHECK-NEXT:    vsetivli zero, 3, e8, mf2, tu, ma
@@ -451,8 +453,9 @@ define <8 x i8> @splat_ve2_we0_ins_i2we4(<8 x i8> %v, <8 x i8> %w) {
 define <8 x i8> @splat_ve2_we0_ins_i2ve4_i5we6(<8 x i8> %v, <8 x i8> %w) {
 ; CHECK-LABEL: splat_ve2_we0_ins_i2ve4_i5we6:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 8, e8, mf2, ta, ma
+; CHECK-NEXT:    vsetivli zero, 6, e8, mf2, ta, ma
 ; CHECK-NEXT:    vmv.v.i v10, 6
+; CHECK-NEXT:    vsetivli zero, 8, e8, mf2, ta, ma
 ; CHECK-NEXT:    vmv.v.i v11, 0
 ; CHECK-NEXT:    lui a0, 8256
 ; CHECK-NEXT:    addi a0, a0, 2
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int.ll
index cba8de82ec41b9..59c7feb53ce94e 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int.ll
@@ -1100,15 +1100,17 @@ define void @mulhu_v8i16(ptr %x) {
 ; CHECK-NEXT:    vsetivli zero, 8, e16, m1, ta, ma
 ; CHECK-NEXT:    vle16.v v8, (a0)
 ; CHECK-NEXT:    vmv.v.i v9, 0
+; CHECK-NEXT:    vsetivli zero, 7, e16, m1, ta, ma
 ; CHECK-NEXT:    vmv.v.i v10, 1
 ; CHECK-NEXT:    li a1, 33
 ; CHECK-NEXT:    vmv.s.x v0, a1
 ; CHECK-NEXT:    lui a1, %hi(.LCPI66_0)
 ; CHECK-NEXT:    addi a1, a1, %lo(.LCPI66_0)
+; CHECK-NEXT:    vsetivli zero, 8, e16, m1, ta, ma
 ; CHECK-NEXT:    vmv.v.i v11, 3
 ; CHECK-NEXT:    vle16.v v12, (a1)
 ; CHECK-NEXT:    vmerge.vim v11, v11, 2, v0
-; CHECK-NEXT:    vmv.v.i v13, 0
+; CHECK-NEXT:    vmv1r.v v13, v9
 ; CHECK-NEXT:    vsetivli zero, 7, e16, m1, tu, ma
 ; CHECK-NEXT:    vslideup.vi v9, v10, 6
 ; CHECK-NEXT:    vsetivli zero, 8, e16, m1, ta, ma
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-changes-length.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-changes-length.ll
index 66f95b70776720..abbbfe8f252fb2 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-changes-length.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-changes-length.ll
@@ -97,8 +97,9 @@ define <4 x i32> @v4i32_v8i32(<8 x i32>) {
 define <4 x i32> @v4i32_v16i32(<16 x i32>) {
 ; RV32-LABEL: v4i32_v16i32:
 ; RV32:       # %bb.0:
-; RV32-NEXT:    vsetivli zero, 8, e16, m1, ta, ma
+; RV32-NEXT:    vsetivli zero, 2, e16, m1, ta, ma
 ; RV32-NEXT:    vmv.v.i v12, 1
+; RV32-NEXT:    vsetivli zero, 8, e16, m1, ta, ma
 ; RV32-NEXT:    vmv.v.i v14, 6
 ; RV32-NEXT:    li a0, 32
 ; RV32-NEXT:    vmv.v.i v0, 10
diff --git a/llvm/test/CodeGen/RISCV/rvv/vdiv-vp.ll b/llvm/test/CodeGen/RISCV/rvv/vdiv-vp.ll
index c7b5200979370e..2814be2792de9a 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vdiv-vp.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/vdiv-vp.ll
@@ -11,9 +11,7 @@ define <vscale x 8 x i7> @vdiv_vx_nxv8i7(<vscale x 8 x i7> %a, i7 signext %b, <v
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetvli zero, a1, e8, m1, ta, ma
 ; CHECK-NEXT:    vsll.vi v8, v8, 1, v0.t
-; CHECK-NEXT:    vsetvli a2, zero, e8, m1, ta, ma
 ; CHECK-NEXT:    vmv.v.x v9, a0
-; CHECK-NEXT:    vsetvli zero, a1, e8, m1, ta, ma
 ; CHECK-NEXT:    vsra.vi v8, v8, 1, v0.t
 ; CHECK-NEXT:    vsll.vi v9, v9, 1, v0.t
 ; CHECK-NEXT:    vsra.vi v9, v9, 1, v0.t
diff --git a/llvm/test/CodeGen/RISCV/rvv/vdivu-vp.ll b/llvm/test/CodeGen/RISCV/rvv/vdivu-vp.ll
index 850ad863dd384e..3e913d4f682ed4 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vdivu-vp.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/vdivu-vp.ll
@@ -10,9 +10,8 @@ define <vscale x 8 x i7> @vdivu_vx_nxv8i7(<vscale x 8 x i7> %a, i7 signext %b, <
 ; CHECK-LABEL: vdivu_vx_nxv8i7:
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    li a2, 127
-; CHECK-NEXT:    vsetvli a3, zero, e8, m1, ta, ma
-; CHECK-NEXT:    vmv.v.x v9, a0
 ; CHECK-NEXT:    vsetvli zero, a1, e8, m1, ta, ma
+; CHECK-NEXT:    vmv.v.x v9, a0
 ; CHECK-NEXT:    vand.vx v8, v8, a2, v0.t
 ; CHECK-NEXT:    vand.vx v9, v9, a2, v0.t
 ; CHECK-NEXT:    vdivu.vv v8, v8, v9, v0.t
diff --git a/llvm/test/CodeGen/RISCV/rvv/vfma-vp.ll b/llvm/test/CodeGen/RISCV/rvv/vfma-vp.ll
index 7ca1983e8b32c0..ab67e9833c78aa 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vfma-vp.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/vfma-vp.ll
@@ -4301,10 +4301,9 @@ define <vscale x 1 x half> @vfnmadd_vf_nxv1f16_neg_splat(<vscale x 1 x half> %va
 ; ZVFHMIN-LABEL: vfnmadd_vf_nxv1f16_neg_splat:
 ; ZVFHMIN:       # %bb.0:
 ; ZVFHMIN-NEXT:    fmv.x.h a1, fa0
-; ZVFHMIN-NEXT:    vsetvli a2, zero, e16, mf4, ta, ma
+; ZVFHMIN-NEXT:    vsetvli zero, a0, e16, mf4, ta, ma
 ; ZVFHMIN-NEXT:    vmv.v.x v10, a1
 ; ZVFHMIN-NEXT:    lui a1, 8
-; ZVFHMIN-NEXT:    vsetvli zero, a0, e16, mf4, ta, ma
 ; ZVFHMIN-NEXT:    vxor.vx v10, v10, a1, v0.t
 ; ZVFHMIN-NEXT:    vxor.vx v9, v9, a1, v0.t
 ; ZVFHMIN-NEXT:    vsetvli a1, zero, e16, mf4, ta, ma
@@ -4334,10 +4333,9 @@ define <vscale x 1 x half> @vfnmadd_vf_nxv1f16_neg_splat_commute(<vscale x 1 x h
 ; ZVFHMIN-LABEL: vfnmadd_vf_nxv1f16_neg_splat_commute:
 ; ZVFHMIN:       # %bb.0:
 ; ZVFHMIN-NEXT:    fmv.x.h a1, fa0
-; ZVFHMIN-NEXT:    vsetvli a2, zero, e16, mf4, ta, ma
+; ZVFHMIN-NEXT:    vsetvli zero, a0, e16, mf4, ta, ma
 ; ZVFHMIN-NEXT:    vmv.v.x v10, a1
 ; ZVFHMIN-NEXT:    lui a1, 8
-; ZVFHMIN-NEXT:    vsetvli zero, a0, e16, mf4, ta, ma
 ; ZVFHMIN-NEXT:    vxor.vx v10, v10, a1, v0.t
 ; ZVFHMIN-NEXT:    vxor.vx v9, v9, a1, v0.t
 ; ZVFHMIN-NEXT:    vsetvli a1, zero, e16, mf4, ta, ma
@@ -4367,10 +4365,9 @@ define <vscale x 1 x half> @vfnmadd_vf_nxv1f16_neg_splat_unmasked(<vscale x 1 x
 ; ZVFHMIN-LABEL: vfnmadd_vf_nxv1f16_neg_splat_unmasked:
 ; ZVFHMIN:       # %bb.0:
 ; ZVFHMIN-NEXT:    fmv.x.h a1, fa0
-; ZVFHMIN-NEXT:    vsetvli a2, zero, e16, mf4, ta, ma
+; ZVFHMIN-NEXT:    vsetvli zero, a0, e16, mf4, ta, ma
 ; ZVFHMIN-NEXT:    vmv.v.x v10, a1
 ; ZVFHMIN-NEXT:    lui a1, 8
-; ZVFHMIN-NEXT:    vsetvli zero, a0, e16, mf4, ta, ma
 ; ZVFHMIN-NEXT:    vxor.vx v9, v9, a1
 ; ZVFHMIN-NEXT:    vxor.vx v10, v10, a1
 ; ZVFHMIN-NEXT:    vsetvli a1, zero, e16, mf4, ta, ma
@@ -4400,10 +4397,9 @@ define <vscale x 1 x half> @vfnmadd_vf_nxv1f16_neg_splat_unmasked_commute(<vscal
 ; ZVFHMIN-LABEL: vfnmadd_vf_nxv1f16_neg_splat_unmasked_commute:
 ; ZVFHMIN:       # %bb.0:
 ; ZVFHMIN-NEXT:    fmv.x.h a1, fa0
-; ZVFHMIN-NEXT:    vsetvli a2, zero, e16, mf4, ta, ma
+; ZVFHMIN-NEXT:    vsetvli zero, a0, e16, mf4, ta, ma
 ; ZVFHMIN-NEXT:    vmv.v.x v10, a1
 ; ZVFHMIN-NEXT:    lui a1, 8
-; ZVFHMIN-NEXT:    vsetvli zero, a0, e16, mf4, ta, ma
 ; ZVFHMIN-NEXT:    vxor.vx v9, v9, a1
 ; ZVFHMIN-NEXT:    vxor.vx v10, v10, a1
 ; ZVFHMIN-NEXT:    vsetvli a1, zero, e16, mf4, ta, ma
@@ -4670,9 +4666,10 @@ define <vscale x 1 x half> @vfnmsub_vf_nxv1f16_neg_splat(<vscale x 1 x half> %va
 ; ZVFHMIN-LABEL: vfnmsub_vf_nxv1f16_neg_splat:
 ; ZVFHMIN:       # %bb.0:
 ; ZVFHMIN-NEXT:    fmv.x.h a1, fa0
-; ZVFHMIN-NEXT:    vsetvli a2, zero, e16, mf4, ta, ma
+; ZVFHMIN-NEXT:    vsetvli zero, a0, e16, mf4, ta, ma
 ; ZVFHMIN-NEXT:    vmv.v.x v10, a1
 ; ZVFHMIN-NEXT:    lui a1, 8
+; ZVFHMIN-NEXT:    vsetvli a2, zero, e16, mf4, ta, ma
 ; ZVFHMIN-NEXT:    vfwcvt.f.f.v v11, v9
 ; ZVFHMIN-NEXT:    vsetvli zero, a0, e16, mf4, ta, ma
 ; ZVFHMIN-NEXT:    vxor.vx v9, v10, a1, v0.t
@@ -4701,9 +4698,10 @@ define <vscale x 1 x half> @vfnmsub_vf_nxv1f16_neg_splat_commute(<vscale x 1 x h
 ; ZVFHMIN-LABEL: vfnmsub_vf_nxv1f16_neg_splat_commute:
 ; ZVFHMIN:       # %bb.0:
 ; ZVFHMIN-NEXT:    fmv.x.h a1, fa0
-; ZVFHMIN-NEXT:    vsetvli a2, zero, e16, mf4, ta, ma
+; ZVFHMIN-NEXT:    vsetvli zero, a0, e16, mf4, ta, ma
 ; ZVFHMIN-NEXT:    vmv.v.x v10, a1
 ; ZVFHMIN-NEXT:    lui a1, 8
+; ZVFHMIN-NEXT:    vsetvli a2, zero, e16, mf4, ta, ma
 ; ZVFHMIN-NEXT:    vfwcvt.f.f.v v11, v9
 ; ZVFHMIN-NEXT:    vsetvli zero, a0, e16, mf4, ta, ma
 ; ZVFHMIN-NEXT:    vxor.vx v9, v10, a1, v0.t
@@ -4732,9 +4730,10 @@ define <vscale x 1 x half> @vfnmsub_vf_nxv1f16_neg_splat_unmasked(<vscale x 1 x
 ; ZVFHMIN-LABEL: vfnmsub_vf_nxv1f16_neg_splat_unmasked:
 ; ZVFHMIN:       # %bb.0:
 ; ZVFHMIN-NEXT:    fmv.x.h a1, fa0
-; ZVFHMIN-NEXT:    vsetvli a2, zero, e16, mf4, ta, ma
+; ZVFHMIN-NEXT:    vsetvli zero, a0, e16, mf4, ta, ma
 ; ZVFHMIN-NEXT:    vmv.v.x v10, a1
 ; ZVFHMIN-NEXT:    lui a1, 8
+; ZVFHMIN-NEXT:    vsetvli a2, zero, e16, mf4, ta, ma
 ; ZVFHMIN-NEXT:    vfwcvt.f.f.v v11, v9
 ; ZVFHMIN-NEXT:    vsetvli zero, a0, e16, mf4, ta, ma
 ; ZVFHMIN-NEXT:    vxor.vx v9, v10, a1
@@ -4763,9 +4762,10 @@ define <vscale x 1 x half> @vfnmsub_vf_nxv1f16_neg_splat_unmasked_commute(<vscal
 ; ZVFHMIN-LABEL: vfnmsub_vf_nxv1f16_neg_splat_unmasked_commute:
 ; ZVFHMIN:       # %bb.0:
 ; ZVFHMIN-NEXT:    fmv.x.h a1, fa0
-; ZVFHMIN-NEXT:    vsetvli a2, zero, e16, mf4, ta, ma
+; ZVFHMIN-NEXT:    vsetvli zero, a0, e16, mf4, ta, ma
 ; ZVFHMIN-NEXT:    vmv.v.x v10, a1
 ; ZVFHMIN-NEXT:    lui a1, 8
+; ZVFHMIN-NEXT:    vsetvli a2, zero, e16, mf4, ta, ma
 ; ZVFHMIN-NEXT:    vfwcvt.f.f.v v11, v9
 ; ZVFHMIN-NEXT:    vsetvli zero, a0, e16, mf4, ta, ma
 ; ZVFHMIN-NEXT:    vxor.vx v9, v10, a1
@@ -5220,10 +5220,9 @@ define <vscale x 2 x half> @vfnmadd_vf_nxv2f16_neg_splat(<vscale x 2 x half> %va
 ; ZVFHMIN-LABEL: vfnmadd_vf_nxv2f16_neg_splat:
 ; ZVFHMIN:       # %bb.0:
 ; ZVFHMIN-NEXT:    fmv.x.h a1, fa0
-; ZVFHMIN-NEXT:    vsetvli a2, zero, e16, mf2, ta, ma
+; ZVFHMIN-NEXT:    vsetvli zero, a0, e16, mf2, ta, ma
 ; ZVFHMIN-NEXT:    vmv.v.x v10, a1
 ; ZVFHMIN-NEXT:    lui a1, 8
-; ZVFHMIN-NEXT:    vsetvli zero, a0, e16, mf2, ta, ma
 ; ZVFHMIN-NEXT:    vxor.vx v10, v10, a1, v0.t
 ; ZVFHMIN-NEXT:    vxor.vx v9, v9, a1, v0.t
 ; ZVFHMIN-NEXT:    vsetvli a1, zero, e16, mf2, ta, ma
@@ -5253,10 +5252,9 @@ define <vscale x 2 x half> @vfnmadd_vf_nxv2f16_neg_splat_commute(<vscale x 2 x h
 ; ZVFHMIN-LABEL: vfnmadd_vf_nxv2f16_neg_splat_commute:
 ; ZVFHMIN:       # %bb.0:
 ; ZVFHMIN-NEXT:    fmv.x.h a1, fa0
-; ZVFHMIN-NEXT:    vsetvli a2, zero, e16, mf2, ta, ma
+; ZVFHMIN-NEXT:    vsetvli zero, a0, e16, mf2, ta, ma
 ; ZVFHMIN-NEXT:    vmv.v.x v10, a1
 ; ZVFHMIN-NEXT:    lui a1, 8
-; ZVFHMIN-NEXT:    vsetvli zero, a0, e16, mf2, ta, ma
 ; ZVFHMIN-NEXT:    vxor.vx v10, v10, a1, v0.t
 ; ZVFHMIN-NEXT:    vxor.vx v9, v9, a1, v0.t
 ; ZVFHMIN-NEXT:    vsetvli a1, zero, e16, mf2, ta, ma
@@ -5286,10 +5284,9 @@ define <vscale x 2 x half> @vfnmadd_vf_nxv2f16_neg_splat_unmasked(<vscale x 2 x
 ; ZVFHMIN-LABEL: vfnmadd_vf_nxv2f16_neg_splat_unmasked:
 ; ZVFHMI...
[truncated]

@michaelmaitland
Contributor Author

ping

Now that we have testing of all instructions in the isSupportedInstr switch,
and better coverage of getOperandInfo, I think it is a good time to enable this
by default.

I'd like for llvm#112231 and
llvm#119416 to land before this patch,
so it'd be great for anyone reviewing this to check those out first.
@lukel97
Contributor

lukel97 commented Dec 17, 2024

Have you had a chance to kick the tires with this on llvm-test-suite or SPEC?

@michaelmaitland
Contributor Author

Have you had a chance to kick the tires with this on llvm-test-suite or SPEC?

I had a chance to run this on spec2006/int/train and spec2017/int/train on qemu. Both build and terminate without errors.

@preames
Collaborator

preames commented Dec 17, 2024

Reading through the code, I spotted one potential correctness issue. This is a corner case, but probably still worth fixing.

Imagine you have the following:
%v = VADD_VV ...
%s = VREDSUM w/ %v as scalar source
%dead = VADD_VV %v, %v w/ VL=0

The last instruction is dead - it can be folded to its passthru. (In practice, it probably will have been folded, but it's possible something could slip through to here.) However, when scanning the users of %v, we will decide that the correct VL for %v is 0 (or a register which might be zero), and reduce it below the minimum VL=1 required by the reduction.

To fix this, I believe you need to treat the CommonVL for the scalar operand case as being VL=1. You could also track a non-zero state instead.
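For illustration, here is a minimal standalone sketch of the clamp described above. The Use model and commonVL helper are hypothetical stand-ins, not the pass's actual data structures; the point is only that a scalar-operand use must contribute at least VL=1 to the common VL.

```cpp
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <optional>
#include <vector>

// Hypothetical model of a use of %v: how many elements the user demands from
// %v, and whether it reads %v as a scalar operand (e.g. the scalar source of
// a VREDSUM, which reads exactly element 0).
struct Use {
  uint64_t DemandedVL;
  bool IsScalarOperand;
};

// The VL we may shrink the producer to is the maximum demanded over all of
// its users, with a scalar-operand use clamped to demand at least one element.
std::optional<uint64_t> commonVL(const std::vector<Use> &Uses) {
  std::optional<uint64_t> Common;
  for (const Use &U : Uses) {
    uint64_t Demanded = U.IsScalarOperand ? 1 : U.DemandedVL;
    Common = Common ? std::max(*Common, Demanded) : Demanded;
  }
  return Common;
}

int main() {
  // The example above: %s reads %v as a scalar source (one element, whatever
  // its own VL is), and the dead VADD_VV demands zero elements. Without the
  // clamp the common VL would come out as 0 and %v could be shrunk below what
  // the reduction needs; with it, the result is 1.
  std::vector<Use> Uses = {{/*DemandedVL=*/0, /*IsScalarOperand=*/true},
                           {/*DemandedVL=*/0, /*IsScalarOperand=*/false}};
  std::cout << *commonVL(Uses) << "\n"; // prints 1
}
```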

Other than that, looks good to me. Once you've fixed this issue, happy to approve.

@lukel97
Contributor

lukel97 commented Dec 17, 2024

Have you had a chance to kick the tires with this on llvm-test-suite or SPEC?

I had a chance to run this on spec2006/int/train and spec2017/int/train on qemu. Both build and terminate without errors.

Nice. This also just came to mind, but did you run it with the rvv_ta_all_1s=1 option set? I'm thinking that if there were any potential miscompiles, this would probably be needed to catch them.

@michaelmaitland
Contributor Author

Have you had a chance to kick the tires with this on llvm-test-suite or SPEC?

I had a chance to run this on spec2006/int/train and spec2017/int/train on qemu. Both build and terminate without errors.

Nice. This also just came to mind, but did you run it with the rvv_ta_all_1s=1 option set? I'm thinking that if there were any potential miscompiles, this would probably be needed to catch them.

Yes

@topperc
Collaborator

topperc commented Dec 17, 2024

Reading through the code, I spotted one potential correctness issue. This is a corner case, but probably still worth fixing.

Imagine you have the following:
%v = VADD_VV ...
%s = VREDSUM w/ %v as scalar source
%dead = VADD_VV %v, %v w/ VL=0

The last instruction is dead - it can be folded to its passthru. (In practice, it probably will have been folded, but it's possible something could slip through to here.) However, when scanning the users of %v, we will decide that the correct VL for %v is 0 (or a register which might be zero), and reduce it below the minimum VL=1 required by the reduction.

To fix this, I believe you need to treat the CommonVL for the scalar operand case as being VL=1. You could also track a non-zero state instead.

Other than that, looks good to me. Once you've fixed this issue, happy to approve.

Should we just remove this code for now? There are no directed tests for it.

    // Instructions like reductions may use a vector register as a scalar
    // register. In this case, we should treat it like a scalar register which
    // does not impact the decision on whether to optimize VL.
    if (isVectorOpUsedAsScalarOp(UserOp)) {
      [[maybe_unused]] Register R = UserOp.getReg();
      [[maybe_unused]] const TargetRegisterClass *RC = MRI->getRegClass(R);
      assert(RISCV::VRRegClass.hasSubClassEq(RC) &&
             "Expect LMUL 1 register class for vector as scalar operands!");
      LLVM_DEBUG(dbgs() << "    Use this operand as a scalar operand\n");
      continue;
    }

@preames
Collaborator

preames commented Dec 17, 2024

Should we just remove this code for now

This would be fine by me. Incrementalism is good. :)

@michaelmaitland
Contributor Author

Should we just remove this code for now

This would be fine by me. Incrementalism is good. :)

I've removed it in #120291.

FWIW, I don't think what you're concerned about can happen with or without #120291 merged, since all the instructions that isVectorOpUsedAsScalarOp deals with return OperandInfo(Unknown) and won't lead to any (incorrect) optimization.

@topperc
Collaborator

topperc commented Dec 17, 2024

Should we just remove this code for now

This would be fine by me. Incrementalism is good. :)

I've removed it in #120291.

FWIW, I don't think what you're concerned about can happen with or without #120291 merged, since all the instructions that isVectorOpUsedAsScalarOp deals with return OperandInfo(Unknown) and won't lead to any (incorrect) optimization.

The code that was there said that we could ignore the reduction and not call getOperandInfo on it. So it doesn't matter that the reduction is missing from getOperandInfo. The code effectively said that a scalar operand doesn't depend on any elements from the producer. This is incorrect; it demands exactly 1 element. With that code in place, only the VL of the other consumers was used. If they used fewer than 1 element, then the 1 element that the scalar op needs wouldn't be valid.

@michaelmaitland
Contributor Author

michaelmaitland commented Dec 17, 2024

Should we just remove this code for now

This would be fine by me. Incrementalism is good. :)

I've removed it in #120291.
FWIW, I don't think what you're concerned about can happen with or without #120291 merged, since all the instructions that isVectorOpUsedAsScalarOp deals with return OperandInfo(Unknown) and won't lead to any (incorrect) optimization.

The code that was there said that we could ignore the reduction and not call getOperandInfo on it. So it doesn't matter that the reduction is missing from getOperandInfo. The code effectively said that a scalar operand doesn't depend on any elements from the producer. This is incorrect; it demands exactly 1 element. With that code in place, only the VL of the other consumers was used. If they used fewer than 1 element, then the 1 element that the scalar op needs wouldn't be valid.

Yes, my bad. I agree we should remove the suggested code in #120291.

Collaborator

@preames preames left a comment


LGTM

; CHECK-NEXT: vmv.v.i v9, 3
; CHECK-NEXT: vsetivli zero, 8, e8, mf2, ta, ma
Collaborator


Non-blocking, but this shows a case where we probably want to teach VSETVLI insertion that it can increase VL if the instruction is tail undefined.

@michaelmaitland michaelmaitland merged commit 169c32e into llvm:main Dec 17, 2024
4 of 7 checks passed
@michaelmaitland michaelmaitland deleted the enable-vlopt branch December 17, 2024 21:19
@llvm-ci
Collaborator

llvm-ci commented Dec 17, 2024

LLVM Buildbot has detected a new failure on builder cross-project-tests-sie-ubuntu running on doug-worker-1a while building llvm at step 2 "checkout".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/181/builds/10532

Here is the relevant piece of the build log for reference:
Step 2 (checkout) failure: update (failure)

; RV32-NEXT: vmv.v.i v12, 1
; RV32-NEXT: vsetivli zero, 8, e16, m1, ta, ma
; RV32-NEXT: vmv.v.i v14, 6
Contributor

@lukel97 lukel97 Dec 18, 2024


Do we know why only one of the vmv.v.is had its VL reduced here?

Edit: just seeing Philip's comment above that explains it.

Contributor Author

@michaelmaitland michaelmaitland Dec 18, 2024


Take a look at the MIR: https://godbolt.org/z/xrvG13qx6

You can see that %4 is used as a tied operand. We don't optimize that case:

// Tied operands might pass through.

raikonenfnu added a commit to iree-org/llvm-project that referenced this pull request Dec 26, 2024
raikonenfnu added a commit to iree-org/llvm-project that referenced this pull request Dec 26, 2024
raikonenfnu added a commit to raikonenfnu/iree that referenced this pull request Dec 27, 2024
Update LLVM to llvm/llvm-project@ac8bb735. The C++ changes are related to a
change in the behavior of TypeConverter. It used to generate
UnrealizedConversionCastOp during applySignatureConversion in
GenericOpTypePropagation of TypePropagationPass.cpp, but now it does not.
This causes unrealized_conversion_cast to be generated later and hence
survive the pass. To repro the above behavior, try undoing the C++ change
in this PR and then:

```
wget https://gist.githubusercontent.com/raikonenfnu/dfb3b274007df8c4be87daf9ee67a5f4/raw/e48cc07e5fa558cd2c450b0e3ae46568136e1be6/type_propagate_repro.mlir
iree-opt --pass-pipeline='builtin.module(func.func(iree-codegen-type-propagation))' propagate_test.mlir -o /dev/null

error: failed to legalize unresolved materialization from ('i8') to ('i1') that remained live after conversion
  ^bb0(%in: i1, %in_0: f32, %in_1: f32, %out: f32):
       ^
propagate_test.mlir:5:8: note: see current operation: %10 = "builtin.unrealized_conversion_cast"(%arg0) : (i8) -> i1
propagate_test.mlir:6:11: note: see existing live user here: %10 = arith.select %9, %in_0, %in_1 : f32
```

This PR also carries the following reverts:

llvm/llvm-project#120999
llvm/llvm-project#120115
llvm/llvm-project#119461

The main issue with these PRs (120999 and 120115) is that they break matvec codegen, generating scf.if instead of scf.for(s). An issue will be opened with a repro.

The main issue with PR 119461 is that it breaks an e2e RISC-V test by making it get stuck in an infinite loop.
```
/path/to/iree-build/tools/iree-compile --output-format=vm-bytecode --mlir-print-op-on-diagnostic=false --iree-hal-target-backends=llvm-cpu --iree-input-type=stablehlo --iree-input-demote-f64-to-f32 --iree-llvmcpu-target-cpu=generic /path/to/iree/tests/e2e/stablehlo_ops/three_fry.mlir -o three_fly_exec_target.mlir --iree-llvmcpu-target-triple=riscv64 --iree-llvmcpu-target-abi=lp64d --iree-llvmcpu-target-cpu-features=+m,+a,+d,+zvl512b,+v --mlir-disable-threading
> infinite loop
```

Signed-off-by: Stanley Winata <stanley.winata@amd.com>
raikonenfnu added a commit to iree-org/llvm-project that referenced this pull request Dec 27, 2024
Groverkss pushed a commit to iree-org/llvm-project that referenced this pull request Dec 27, 2024
Groverkss pushed a commit to iree-org/llvm-project that referenced this pull request Dec 27, 2024
raikonenfnu added a commit to raikonenfnu/iree that referenced this pull request Dec 28, 2024
raikonenfnu added a commit to iree-org/iree that referenced this pull request Dec 31, 2024
Update LLVM to llvm/llvm-project@ac8bb735. The C++ changes are related to a
change in the behavior of TypeConverter made in
iree-org/llvm-project@3cc311a.
It used to generate UnrealizedConversionCastOp during
applySignatureConversion in GenericOpTypePropagation of
TypePropagationPass.cpp, but now it does not. This causes
unrealized_conversion_cast to be generated later and hence survive the
pass. To repro the above behavior, try undoing the C++ change in this PR and
then:

```
wget https://gist.githubusercontent.com/raikonenfnu/dfb3b274007df8c4be87daf9ee67a5f4/raw/e48cc07e5fa558cd2c450b0e3ae46568136e1be6/type_propagate_repro.mlir
iree-opt --pass-pipeline='builtin.module(func.func(iree-codegen-type-propagation))' propagate_test.mlir -o /dev/null

error: failed to legalize unresolved materialization from ('i8') to ('i1') that remained live after conversion
  ^bb0(%in: i1, %in_0: f32, %in_1: f32, %out: f32):
       ^
propagate_test.mlir:5:8: note: see current operation: %10 = "builtin.unrealized_conversion_cast"(%arg0) : (i8) -> i1
propagate_test.mlir:6:11: note: see existing live user here: %10 = arith.select %9, %in_0, %in_1 : f32
```

Additionally, we made API changes in
6ed8924 to resolve a deprecated-API error in the Bazel build:
1. `applyPatternsAndFoldGreedily` -> `applyPatternsGreedily`
2. `applyOpPatternsAndFold` -> `applyOpPatternsGreedily`

This PR also carries the following reverts:

llvm/llvm-project#119461

The main issue with PR 119461 is that it breaks an e2e RISC-V test by making
it get stuck in an infinite loop.
```
/path/to/iree-build/tools/iree-compile --output-format=vm-bytecode --mlir-print-op-on-diagnostic=false --iree-hal-target-backends=llvm-cpu --iree-input-type=stablehlo --iree-input-demote-f64-to-f32 --iree-llvmcpu-target-cpu=generic /path/to/iree/tests/e2e/stablehlo_ops/three_fry.mlir -o three_fly_exec_target.mlir --iree-llvmcpu-target-triple=riscv64 --iree-llvmcpu-target-abi=lp64d --iree-llvmcpu-target-cpu-features=+m,+a,+d,+zvl512b,+v --mlir-disable-threading
> infinite loop
```

---------

Signed-off-by: Stanley Winata <stanley.winata@amd.com>
IanWood1 pushed a commit to iree-org/llvm-project that referenced this pull request Jan 2, 2025
MaheshRavishankar added a commit to iree-org/llvm-project that referenced this pull request Jan 2, 2025
MaheshRavishankar added a commit to iree-org/llvm-project that referenced this pull request Jan 3, 2025
MaheshRavishankar added a commit to iree-org/llvm-project that referenced this pull request Jan 7, 2025
MaheshRavishankar added a commit to iree-org/llvm-project that referenced this pull request Jan 13, 2025
nirvedhmeshram pushed a commit to iree-org/llvm-project that referenced this pull request Jan 20, 2025
MaheshRavishankar added a commit to iree-org/llvm-project that referenced this pull request Jan 22, 2025