[RISCV][VLOPT] Enable the RISCVVLOptimizer by default #119461
@llvm/pr-subscribers-backend-risc-v
Author: Michael Maitland (michaelmaitland)
Changes
Now that we have testing of all instructions in the isSupportedInstr switch, and better coverage of getOperandInfo, I think it is a good time to enable this by default. I'd like for #112231 and #119416 to land before this patch, so it'd be great for anyone reviewing this to check those out first.
Patch is 81.90 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/119461.diff
34 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp b/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
index dcd3598f658f6a..c507ab3f4f3885 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
@@ -112,7 +112,7 @@ static cl::opt<bool> EnablePostMISchedLoadStoreClustering(
static cl::opt<bool>
EnableVLOptimizer("riscv-enable-vl-optimizer",
cl::desc("Enable the RISC-V VL Optimizer pass"),
- cl::init(false), cl::Hidden);
+ cl::init(true), cl::Hidden);
static cl::opt<bool> DisableVectorMaskMutation(
"riscv-disable-vector-mask-mutation",
diff --git a/llvm/test/CodeGen/RISCV/O3-pipeline.ll b/llvm/test/CodeGen/RISCV/O3-pipeline.ll
index 8fd9ae98503665..b0c756e26985bb 100644
--- a/llvm/test/CodeGen/RISCV/O3-pipeline.ll
+++ b/llvm/test/CodeGen/RISCV/O3-pipeline.ll
@@ -119,6 +119,8 @@
; RV64-NEXT: RISC-V Optimize W Instructions
; CHECK-NEXT: RISC-V Pre-RA pseudo instruction expansion pass
; CHECK-NEXT: RISC-V Merge Base Offset
+; CHECK-NEXT: MachineDominator Tree Construction
+; CHECK-NEXT: RISC-V VL Optimizer
; CHECK-NEXT: RISC-V Insert Read/Write CSR Pass
; CHECK-NEXT: RISC-V Insert Write VXRM Pass
; CHECK-NEXT: RISC-V Landing Pad Setup
@@ -129,7 +131,6 @@
; CHECK-NEXT: Live Variable Analysis
; CHECK-NEXT: Eliminate PHI nodes for register allocation
; CHECK-NEXT: Two-Address instruction pass
-; CHECK-NEXT: MachineDominator Tree Construction
; CHECK-NEXT: Slot index numbering
; CHECK-NEXT: Live Interval Analysis
; CHECK-NEXT: Register Coalescer
diff --git a/llvm/test/CodeGen/RISCV/rvv/ctlz-vp.ll b/llvm/test/CodeGen/RISCV/rvv/ctlz-vp.ll
index ce4bc48dff0426..6f515996677ee6 100644
--- a/llvm/test/CodeGen/RISCV/rvv/ctlz-vp.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/ctlz-vp.ll
@@ -2654,9 +2654,8 @@ define <vscale x 1 x i9> @vp_ctlo_zero_undef_nxv1i9(<vscale x 1 x i9> %va, <vsca
; CHECK-LABEL: vp_ctlo_zero_undef_nxv1i9:
; CHECK: # %bb.0:
; CHECK-NEXT: li a1, 511
-; CHECK-NEXT: vsetvli a2, zero, e16, mf4, ta, ma
-; CHECK-NEXT: vxor.vx v8, v8, a1
; CHECK-NEXT: vsetvli zero, a0, e16, mf4, ta, ma
+; CHECK-NEXT: vxor.vx v8, v8, a1
; CHECK-NEXT: vsll.vi v8, v8, 7, v0.t
; CHECK-NEXT: vfwcvt.f.xu.v v9, v8, v0.t
; CHECK-NEXT: vsetvli zero, zero, e32, mf2, ta, ma
@@ -2670,9 +2669,8 @@ define <vscale x 1 x i9> @vp_ctlo_zero_undef_nxv1i9(<vscale x 1 x i9> %va, <vsca
; CHECK-ZVBB-LABEL: vp_ctlo_zero_undef_nxv1i9:
; CHECK-ZVBB: # %bb.0:
; CHECK-ZVBB-NEXT: li a1, 511
-; CHECK-ZVBB-NEXT: vsetvli a2, zero, e16, mf4, ta, ma
-; CHECK-ZVBB-NEXT: vxor.vx v8, v8, a1
; CHECK-ZVBB-NEXT: vsetvli zero, a0, e16, mf4, ta, ma
+; CHECK-ZVBB-NEXT: vxor.vx v8, v8, a1
; CHECK-ZVBB-NEXT: vsll.vi v8, v8, 7, v0.t
; CHECK-ZVBB-NEXT: vclz.v v8, v8, v0.t
; CHECK-ZVBB-NEXT: ret
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-abs.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-abs.ll
index ac7d3d9109e39c..3153b44386d7ae 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-abs.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-abs.ll
@@ -39,9 +39,7 @@ define void @abs_v6i16(ptr %x) {
; CHECK: # %bb.0:
; CHECK-NEXT: vsetivli zero, 6, e16, m1, ta, ma
; CHECK-NEXT: vle16.v v8, (a0)
-; CHECK-NEXT: vsetivli zero, 8, e16, m1, ta, ma
; CHECK-NEXT: vrsub.vi v9, v8, 0
-; CHECK-NEXT: vsetivli zero, 6, e16, m1, ta, ma
; CHECK-NEXT: vmax.vv v8, v8, v9
; CHECK-NEXT: vse16.v v8, (a0)
; CHECK-NEXT: ret
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp.ll
index 36bbec12e9b06c..15793eaada0783 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp.ll
@@ -788,11 +788,9 @@ define void @copysign_v6bf16(ptr %x, ptr %y) {
; CHECK-NEXT: vle16.v v8, (a1)
; CHECK-NEXT: vle16.v v9, (a0)
; CHECK-NEXT: lui a1, 8
-; CHECK-NEXT: vsetivli zero, 8, e16, m1, ta, ma
; CHECK-NEXT: vand.vx v8, v8, a1
; CHECK-NEXT: addi a1, a1, -1
; CHECK-NEXT: vand.vx v9, v9, a1
-; CHECK-NEXT: vsetivli zero, 6, e16, m1, ta, ma
; CHECK-NEXT: vor.vv v8, v9, v8
; CHECK-NEXT: vse16.v v8, (a0)
; CHECK-NEXT: ret
@@ -848,11 +846,9 @@ define void @copysign_v6f16(ptr %x, ptr %y) {
; ZVFHMIN-NEXT: vle16.v v8, (a1)
; ZVFHMIN-NEXT: vle16.v v9, (a0)
; ZVFHMIN-NEXT: lui a1, 8
-; ZVFHMIN-NEXT: vsetivli zero, 8, e16, m1, ta, ma
; ZVFHMIN-NEXT: vand.vx v8, v8, a1
; ZVFHMIN-NEXT: addi a1, a1, -1
; ZVFHMIN-NEXT: vand.vx v9, v9, a1
-; ZVFHMIN-NEXT: vsetivli zero, 6, e16, m1, ta, ma
; ZVFHMIN-NEXT: vor.vv v8, v9, v8
; ZVFHMIN-NEXT: vse16.v v8, (a0)
; ZVFHMIN-NEXT: ret
@@ -924,12 +920,10 @@ define void @copysign_vf_v6bf16(ptr %x, bfloat %y) {
; CHECK-NEXT: vsetivli zero, 6, e16, m1, ta, ma
; CHECK-NEXT: vle16.v v8, (a0)
; CHECK-NEXT: lui a2, 8
-; CHECK-NEXT: vsetivli zero, 8, e16, m1, ta, ma
; CHECK-NEXT: vmv.v.x v9, a1
; CHECK-NEXT: addi a1, a2, -1
; CHECK-NEXT: vand.vx v8, v8, a1
; CHECK-NEXT: vand.vx v9, v9, a2
-; CHECK-NEXT: vsetivli zero, 6, e16, m1, ta, ma
; CHECK-NEXT: vor.vv v8, v8, v9
; CHECK-NEXT: vse16.v v8, (a0)
; CHECK-NEXT: ret
@@ -986,12 +980,10 @@ define void @copysign_vf_v6f16(ptr %x, half %y) {
; ZVFHMIN-NEXT: vsetivli zero, 6, e16, m1, ta, ma
; ZVFHMIN-NEXT: vle16.v v8, (a0)
; ZVFHMIN-NEXT: lui a2, 8
-; ZVFHMIN-NEXT: vsetivli zero, 8, e16, m1, ta, ma
; ZVFHMIN-NEXT: vmv.v.x v9, a1
; ZVFHMIN-NEXT: addi a1, a2, -1
; ZVFHMIN-NEXT: vand.vx v8, v8, a1
; ZVFHMIN-NEXT: vand.vx v9, v9, a2
-; ZVFHMIN-NEXT: vsetivli zero, 6, e16, m1, ta, ma
; ZVFHMIN-NEXT: vor.vv v8, v8, v9
; ZVFHMIN-NEXT: vse16.v v8, (a0)
; ZVFHMIN-NEXT: ret
@@ -1065,11 +1057,9 @@ define void @copysign_neg_v6bf16(ptr %x, ptr %y) {
; CHECK-NEXT: vle16.v v9, (a0)
; CHECK-NEXT: lui a1, 8
; CHECK-NEXT: addi a2, a1, -1
-; CHECK-NEXT: vsetivli zero, 8, e16, m1, ta, ma
; CHECK-NEXT: vxor.vx v8, v8, a1
; CHECK-NEXT: vand.vx v9, v9, a2
; CHECK-NEXT: vand.vx v8, v8, a1
-; CHECK-NEXT: vsetivli zero, 6, e16, m1, ta, ma
; CHECK-NEXT: vor.vv v8, v9, v8
; CHECK-NEXT: vse16.v v8, (a0)
; CHECK-NEXT: ret
@@ -1129,11 +1119,9 @@ define void @copysign_neg_v6f16(ptr %x, ptr %y) {
; ZVFHMIN-NEXT: vle16.v v9, (a0)
; ZVFHMIN-NEXT: lui a1, 8
; ZVFHMIN-NEXT: addi a2, a1, -1
-; ZVFHMIN-NEXT: vsetivli zero, 8, e16, m1, ta, ma
; ZVFHMIN-NEXT: vxor.vx v8, v8, a1
; ZVFHMIN-NEXT: vand.vx v9, v9, a2
; ZVFHMIN-NEXT: vand.vx v8, v8, a1
-; ZVFHMIN-NEXT: vsetivli zero, 6, e16, m1, ta, ma
; ZVFHMIN-NEXT: vor.vv v8, v9, v8
; ZVFHMIN-NEXT: vse16.v v8, (a0)
; ZVFHMIN-NEXT: ret
@@ -1211,12 +1199,12 @@ define void @copysign_neg_trunc_v3bf16_v3f32(ptr %x, ptr %y) {
; CHECK-NEXT: vle32.v v9, (a1)
; CHECK-NEXT: lui a1, 8
; CHECK-NEXT: addi a2, a1, -1
-; CHECK-NEXT: vsetivli zero, 4, e16, mf2, ta, ma
; CHECK-NEXT: vand.vx v8, v8, a2
+; CHECK-NEXT: vsetivli zero, 4, e16, mf2, ta, ma
; CHECK-NEXT: vfncvtbf16.f.f.w v10, v9
+; CHECK-NEXT: vsetivli zero, 3, e16, mf2, ta, ma
; CHECK-NEXT: vxor.vx v9, v10, a1
; CHECK-NEXT: vand.vx v9, v9, a1
-; CHECK-NEXT: vsetivli zero, 3, e16, mf2, ta, ma
; CHECK-NEXT: vor.vv v8, v8, v9
; CHECK-NEXT: vse16.v v8, (a0)
; CHECK-NEXT: ret
@@ -1283,12 +1271,12 @@ define void @copysign_neg_trunc_v3f16_v3f32(ptr %x, ptr %y) {
; ZVFHMIN-NEXT: vle32.v v9, (a1)
; ZVFHMIN-NEXT: lui a1, 8
; ZVFHMIN-NEXT: addi a2, a1, -1
-; ZVFHMIN-NEXT: vsetivli zero, 4, e16, mf2, ta, ma
; ZVFHMIN-NEXT: vand.vx v8, v8, a2
+; ZVFHMIN-NEXT: vsetivli zero, 4, e16, mf2, ta, ma
; ZVFHMIN-NEXT: vfncvt.f.f.w v10, v9
+; ZVFHMIN-NEXT: vsetivli zero, 3, e16, mf2, ta, ma
; ZVFHMIN-NEXT: vxor.vx v9, v10, a1
; ZVFHMIN-NEXT: vand.vx v9, v9, a1
-; ZVFHMIN-NEXT: vsetivli zero, 3, e16, mf2, ta, ma
; ZVFHMIN-NEXT: vor.vv v8, v8, v9
; ZVFHMIN-NEXT: vse16.v v8, (a0)
; ZVFHMIN-NEXT: ret
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-buildvec.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-buildvec.ll
index e9fd0a19e3eb66..276b5401a902a4 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-buildvec.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-buildvec.ll
@@ -910,8 +910,9 @@ define <4 x i8> @buildvec_not_vid_v4i8_2() {
define <16 x i8> @buildvec_not_vid_v16i8() {
; CHECK-LABEL: buildvec_not_vid_v16i8:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetivli zero, 16, e8, m1, ta, ma
+; CHECK-NEXT: vsetivli zero, 7, e8, m1, ta, ma
; CHECK-NEXT: vmv.v.i v9, 3
+; CHECK-NEXT: vsetivli zero, 16, e8, m1, ta, ma
; CHECK-NEXT: vmv.v.i v8, 0
; CHECK-NEXT: vsetivli zero, 7, e8, m1, tu, ma
; CHECK-NEXT: vslideup.vi v8, v9, 6
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-shuffles.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-shuffles.ll
index 1c6e1a37fa8af5..a8e12dfaa82e9c 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-shuffles.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-shuffles.ll
@@ -348,8 +348,9 @@ define <8 x i8> @splat_ve4_ins_i0ve2(<8 x i8> %v) {
define <8 x i8> @splat_ve4_ins_i1ve3(<8 x i8> %v) {
; CHECK-LABEL: splat_ve4_ins_i1ve3:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetivli zero, 8, e8, mf2, ta, ma
+; CHECK-NEXT: vsetivli zero, 2, e8, mf2, ta, ma
; CHECK-NEXT: vmv.v.i v9, 3
+; CHECK-NEXT: vsetivli zero, 8, e8, mf2, ta, ma
; CHECK-NEXT: vmv.v.i v10, 4
; CHECK-NEXT: vsetivli zero, 2, e8, mf2, tu, ma
; CHECK-NEXT: vslideup.vi v10, v9, 1
@@ -432,8 +433,9 @@ define <8 x i8> @splat_ve2_we0_ins_i2ve4(<8 x i8> %v, <8 x i8> %w) {
define <8 x i8> @splat_ve2_we0_ins_i2we4(<8 x i8> %v, <8 x i8> %w) {
; CHECK-LABEL: splat_ve2_we0_ins_i2we4:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetivli zero, 8, e8, mf2, ta, ma
+; CHECK-NEXT: vsetivli zero, 3, e8, mf2, ta, ma
; CHECK-NEXT: vmv.v.i v10, 4
+; CHECK-NEXT: vsetivli zero, 8, e8, mf2, ta, ma
; CHECK-NEXT: vmv.v.i v11, 0
; CHECK-NEXT: li a0, 70
; CHECK-NEXT: vsetivli zero, 3, e8, mf2, tu, ma
@@ -451,8 +453,9 @@ define <8 x i8> @splat_ve2_we0_ins_i2we4(<8 x i8> %v, <8 x i8> %w) {
define <8 x i8> @splat_ve2_we0_ins_i2ve4_i5we6(<8 x i8> %v, <8 x i8> %w) {
; CHECK-LABEL: splat_ve2_we0_ins_i2ve4_i5we6:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetivli zero, 8, e8, mf2, ta, ma
+; CHECK-NEXT: vsetivli zero, 6, e8, mf2, ta, ma
; CHECK-NEXT: vmv.v.i v10, 6
+; CHECK-NEXT: vsetivli zero, 8, e8, mf2, ta, ma
; CHECK-NEXT: vmv.v.i v11, 0
; CHECK-NEXT: lui a0, 8256
; CHECK-NEXT: addi a0, a0, 2
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int.ll
index cba8de82ec41b9..59c7feb53ce94e 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int.ll
@@ -1100,15 +1100,17 @@ define void @mulhu_v8i16(ptr %x) {
; CHECK-NEXT: vsetivli zero, 8, e16, m1, ta, ma
; CHECK-NEXT: vle16.v v8, (a0)
; CHECK-NEXT: vmv.v.i v9, 0
+; CHECK-NEXT: vsetivli zero, 7, e16, m1, ta, ma
; CHECK-NEXT: vmv.v.i v10, 1
; CHECK-NEXT: li a1, 33
; CHECK-NEXT: vmv.s.x v0, a1
; CHECK-NEXT: lui a1, %hi(.LCPI66_0)
; CHECK-NEXT: addi a1, a1, %lo(.LCPI66_0)
+; CHECK-NEXT: vsetivli zero, 8, e16, m1, ta, ma
; CHECK-NEXT: vmv.v.i v11, 3
; CHECK-NEXT: vle16.v v12, (a1)
; CHECK-NEXT: vmerge.vim v11, v11, 2, v0
-; CHECK-NEXT: vmv.v.i v13, 0
+; CHECK-NEXT: vmv1r.v v13, v9
; CHECK-NEXT: vsetivli zero, 7, e16, m1, tu, ma
; CHECK-NEXT: vslideup.vi v9, v10, 6
; CHECK-NEXT: vsetivli zero, 8, e16, m1, ta, ma
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-changes-length.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-changes-length.ll
index 66f95b70776720..abbbfe8f252fb2 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-changes-length.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-changes-length.ll
@@ -97,8 +97,9 @@ define <4 x i32> @v4i32_v8i32(<8 x i32>) {
define <4 x i32> @v4i32_v16i32(<16 x i32>) {
; RV32-LABEL: v4i32_v16i32:
; RV32: # %bb.0:
-; RV32-NEXT: vsetivli zero, 8, e16, m1, ta, ma
+; RV32-NEXT: vsetivli zero, 2, e16, m1, ta, ma
; RV32-NEXT: vmv.v.i v12, 1
+; RV32-NEXT: vsetivli zero, 8, e16, m1, ta, ma
; RV32-NEXT: vmv.v.i v14, 6
; RV32-NEXT: li a0, 32
; RV32-NEXT: vmv.v.i v0, 10
diff --git a/llvm/test/CodeGen/RISCV/rvv/vdiv-vp.ll b/llvm/test/CodeGen/RISCV/rvv/vdiv-vp.ll
index c7b5200979370e..2814be2792de9a 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vdiv-vp.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/vdiv-vp.ll
@@ -11,9 +11,7 @@ define <vscale x 8 x i7> @vdiv_vx_nxv8i7(<vscale x 8 x i7> %a, i7 signext %b, <v
; CHECK: # %bb.0:
; CHECK-NEXT: vsetvli zero, a1, e8, m1, ta, ma
; CHECK-NEXT: vsll.vi v8, v8, 1, v0.t
-; CHECK-NEXT: vsetvli a2, zero, e8, m1, ta, ma
; CHECK-NEXT: vmv.v.x v9, a0
-; CHECK-NEXT: vsetvli zero, a1, e8, m1, ta, ma
; CHECK-NEXT: vsra.vi v8, v8, 1, v0.t
; CHECK-NEXT: vsll.vi v9, v9, 1, v0.t
; CHECK-NEXT: vsra.vi v9, v9, 1, v0.t
diff --git a/llvm/test/CodeGen/RISCV/rvv/vdivu-vp.ll b/llvm/test/CodeGen/RISCV/rvv/vdivu-vp.ll
index 850ad863dd384e..3e913d4f682ed4 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vdivu-vp.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/vdivu-vp.ll
@@ -10,9 +10,8 @@ define <vscale x 8 x i7> @vdivu_vx_nxv8i7(<vscale x 8 x i7> %a, i7 signext %b, <
; CHECK-LABEL: vdivu_vx_nxv8i7:
; CHECK: # %bb.0:
; CHECK-NEXT: li a2, 127
-; CHECK-NEXT: vsetvli a3, zero, e8, m1, ta, ma
-; CHECK-NEXT: vmv.v.x v9, a0
; CHECK-NEXT: vsetvli zero, a1, e8, m1, ta, ma
+; CHECK-NEXT: vmv.v.x v9, a0
; CHECK-NEXT: vand.vx v8, v8, a2, v0.t
; CHECK-NEXT: vand.vx v9, v9, a2, v0.t
; CHECK-NEXT: vdivu.vv v8, v8, v9, v0.t
diff --git a/llvm/test/CodeGen/RISCV/rvv/vfma-vp.ll b/llvm/test/CodeGen/RISCV/rvv/vfma-vp.ll
index 7ca1983e8b32c0..ab67e9833c78aa 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vfma-vp.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/vfma-vp.ll
@@ -4301,10 +4301,9 @@ define <vscale x 1 x half> @vfnmadd_vf_nxv1f16_neg_splat(<vscale x 1 x half> %va
; ZVFHMIN-LABEL: vfnmadd_vf_nxv1f16_neg_splat:
; ZVFHMIN: # %bb.0:
; ZVFHMIN-NEXT: fmv.x.h a1, fa0
-; ZVFHMIN-NEXT: vsetvli a2, zero, e16, mf4, ta, ma
+; ZVFHMIN-NEXT: vsetvli zero, a0, e16, mf4, ta, ma
; ZVFHMIN-NEXT: vmv.v.x v10, a1
; ZVFHMIN-NEXT: lui a1, 8
-; ZVFHMIN-NEXT: vsetvli zero, a0, e16, mf4, ta, ma
; ZVFHMIN-NEXT: vxor.vx v10, v10, a1, v0.t
; ZVFHMIN-NEXT: vxor.vx v9, v9, a1, v0.t
; ZVFHMIN-NEXT: vsetvli a1, zero, e16, mf4, ta, ma
@@ -4334,10 +4333,9 @@ define <vscale x 1 x half> @vfnmadd_vf_nxv1f16_neg_splat_commute(<vscale x 1 x h
; ZVFHMIN-LABEL: vfnmadd_vf_nxv1f16_neg_splat_commute:
; ZVFHMIN: # %bb.0:
; ZVFHMIN-NEXT: fmv.x.h a1, fa0
-; ZVFHMIN-NEXT: vsetvli a2, zero, e16, mf4, ta, ma
+; ZVFHMIN-NEXT: vsetvli zero, a0, e16, mf4, ta, ma
; ZVFHMIN-NEXT: vmv.v.x v10, a1
; ZVFHMIN-NEXT: lui a1, 8
-; ZVFHMIN-NEXT: vsetvli zero, a0, e16, mf4, ta, ma
; ZVFHMIN-NEXT: vxor.vx v10, v10, a1, v0.t
; ZVFHMIN-NEXT: vxor.vx v9, v9, a1, v0.t
; ZVFHMIN-NEXT: vsetvli a1, zero, e16, mf4, ta, ma
@@ -4367,10 +4365,9 @@ define <vscale x 1 x half> @vfnmadd_vf_nxv1f16_neg_splat_unmasked(<vscale x 1 x
; ZVFHMIN-LABEL: vfnmadd_vf_nxv1f16_neg_splat_unmasked:
; ZVFHMIN: # %bb.0:
; ZVFHMIN-NEXT: fmv.x.h a1, fa0
-; ZVFHMIN-NEXT: vsetvli a2, zero, e16, mf4, ta, ma
+; ZVFHMIN-NEXT: vsetvli zero, a0, e16, mf4, ta, ma
; ZVFHMIN-NEXT: vmv.v.x v10, a1
; ZVFHMIN-NEXT: lui a1, 8
-; ZVFHMIN-NEXT: vsetvli zero, a0, e16, mf4, ta, ma
; ZVFHMIN-NEXT: vxor.vx v9, v9, a1
; ZVFHMIN-NEXT: vxor.vx v10, v10, a1
; ZVFHMIN-NEXT: vsetvli a1, zero, e16, mf4, ta, ma
@@ -4400,10 +4397,9 @@ define <vscale x 1 x half> @vfnmadd_vf_nxv1f16_neg_splat_unmasked_commute(<vscal
; ZVFHMIN-LABEL: vfnmadd_vf_nxv1f16_neg_splat_unmasked_commute:
; ZVFHMIN: # %bb.0:
; ZVFHMIN-NEXT: fmv.x.h a1, fa0
-; ZVFHMIN-NEXT: vsetvli a2, zero, e16, mf4, ta, ma
+; ZVFHMIN-NEXT: vsetvli zero, a0, e16, mf4, ta, ma
; ZVFHMIN-NEXT: vmv.v.x v10, a1
; ZVFHMIN-NEXT: lui a1, 8
-; ZVFHMIN-NEXT: vsetvli zero, a0, e16, mf4, ta, ma
; ZVFHMIN-NEXT: vxor.vx v9, v9, a1
; ZVFHMIN-NEXT: vxor.vx v10, v10, a1
; ZVFHMIN-NEXT: vsetvli a1, zero, e16, mf4, ta, ma
@@ -4670,9 +4666,10 @@ define <vscale x 1 x half> @vfnmsub_vf_nxv1f16_neg_splat(<vscale x 1 x half> %va
; ZVFHMIN-LABEL: vfnmsub_vf_nxv1f16_neg_splat:
; ZVFHMIN: # %bb.0:
; ZVFHMIN-NEXT: fmv.x.h a1, fa0
-; ZVFHMIN-NEXT: vsetvli a2, zero, e16, mf4, ta, ma
+; ZVFHMIN-NEXT: vsetvli zero, a0, e16, mf4, ta, ma
; ZVFHMIN-NEXT: vmv.v.x v10, a1
; ZVFHMIN-NEXT: lui a1, 8
+; ZVFHMIN-NEXT: vsetvli a2, zero, e16, mf4, ta, ma
; ZVFHMIN-NEXT: vfwcvt.f.f.v v11, v9
; ZVFHMIN-NEXT: vsetvli zero, a0, e16, mf4, ta, ma
; ZVFHMIN-NEXT: vxor.vx v9, v10, a1, v0.t
@@ -4701,9 +4698,10 @@ define <vscale x 1 x half> @vfnmsub_vf_nxv1f16_neg_splat_commute(<vscale x 1 x h
; ZVFHMIN-LABEL: vfnmsub_vf_nxv1f16_neg_splat_commute:
; ZVFHMIN: # %bb.0:
; ZVFHMIN-NEXT: fmv.x.h a1, fa0
-; ZVFHMIN-NEXT: vsetvli a2, zero, e16, mf4, ta, ma
+; ZVFHMIN-NEXT: vsetvli zero, a0, e16, mf4, ta, ma
; ZVFHMIN-NEXT: vmv.v.x v10, a1
; ZVFHMIN-NEXT: lui a1, 8
+; ZVFHMIN-NEXT: vsetvli a2, zero, e16, mf4, ta, ma
; ZVFHMIN-NEXT: vfwcvt.f.f.v v11, v9
; ZVFHMIN-NEXT: vsetvli zero, a0, e16, mf4, ta, ma
; ZVFHMIN-NEXT: vxor.vx v9, v10, a1, v0.t
@@ -4732,9 +4730,10 @@ define <vscale x 1 x half> @vfnmsub_vf_nxv1f16_neg_splat_unmasked(<vscale x 1 x
; ZVFHMIN-LABEL: vfnmsub_vf_nxv1f16_neg_splat_unmasked:
; ZVFHMIN: # %bb.0:
; ZVFHMIN-NEXT: fmv.x.h a1, fa0
-; ZVFHMIN-NEXT: vsetvli a2, zero, e16, mf4, ta, ma
+; ZVFHMIN-NEXT: vsetvli zero, a0, e16, mf4, ta, ma
; ZVFHMIN-NEXT: vmv.v.x v10, a1
; ZVFHMIN-NEXT: lui a1, 8
+; ZVFHMIN-NEXT: vsetvli a2, zero, e16, mf4, ta, ma
; ZVFHMIN-NEXT: vfwcvt.f.f.v v11, v9
; ZVFHMIN-NEXT: vsetvli zero, a0, e16, mf4, ta, ma
; ZVFHMIN-NEXT: vxor.vx v9, v10, a1
@@ -4763,9 +4762,10 @@ define <vscale x 1 x half> @vfnmsub_vf_nxv1f16_neg_splat_unmasked_commute(<vscal
; ZVFHMIN-LABEL: vfnmsub_vf_nxv1f16_neg_splat_unmasked_commute:
; ZVFHMIN: # %bb.0:
; ZVFHMIN-NEXT: fmv.x.h a1, fa0
-; ZVFHMIN-NEXT: vsetvli a2, zero, e16, mf4, ta, ma
+; ZVFHMIN-NEXT: vsetvli zero, a0, e16, mf4, ta, ma
; ZVFHMIN-NEXT: vmv.v.x v10, a1
; ZVFHMIN-NEXT: lui a1, 8
+; ZVFHMIN-NEXT: vsetvli a2, zero, e16, mf4, ta, ma
; ZVFHMIN-NEXT: vfwcvt.f.f.v v11, v9
; ZVFHMIN-NEXT: vsetvli zero, a0, e16, mf4, ta, ma
; ZVFHMIN-NEXT: vxor.vx v9, v10, a1
@@ -5220,10 +5220,9 @@ define <vscale x 2 x half> @vfnmadd_vf_nxv2f16_neg_splat(<vscale x 2 x half> %va
; ZVFHMIN-LABEL: vfnmadd_vf_nxv2f16_neg_splat:
; ZVFHMIN: # %bb.0:
; ZVFHMIN-NEXT: fmv.x.h a1, fa0
-; ZVFHMIN-NEXT: vsetvli a2, zero, e16, mf2, ta, ma
+; ZVFHMIN-NEXT: vsetvli zero, a0, e16, mf2, ta, ma
; ZVFHMIN-NEXT: vmv.v.x v10, a1
; ZVFHMIN-NEXT: lui a1, 8
-; ZVFHMIN-NEXT: vsetvli zero, a0, e16, mf2, ta, ma
; ZVFHMIN-NEXT: vxor.vx v10, v10, a1, v0.t
; ZVFHMIN-NEXT: vxor.vx v9, v9, a1, v0.t
; ZVFHMIN-NEXT: vsetvli a1, zero, e16, mf2, ta, ma
@@ -5253,10 +5252,9 @@ define <vscale x 2 x half> @vfnmadd_vf_nxv2f16_neg_splat_commute(<vscale x 2 x h
; ZVFHMIN-LABEL: vfnmadd_vf_nxv2f16_neg_splat_commute:
; ZVFHMIN: # %bb.0:
; ZVFHMIN-NEXT: fmv.x.h a1, fa0
-; ZVFHMIN-NEXT: vsetvli a2, zero, e16, mf2, ta, ma
+; ZVFHMIN-NEXT: vsetvli zero, a0, e16, mf2, ta, ma
; ZVFHMIN-NEXT: vmv.v.x v10, a1
; ZVFHMIN-NEXT: lui a1, 8
-; ZVFHMIN-NEXT: vsetvli zero, a0, e16, mf2, ta, ma
; ZVFHMIN-NEXT: vxor.vx v10, v10, a1, v0.t
; ZVFHMIN-NEXT: vxor.vx v9, v9, a1, v0.t
; ZVFHMIN-NEXT: vsetvli a1, zero, e16, mf2, ta, ma
@@ -5286,10 +5284,9 @@ define <vscale x 2 x half> @vfnmadd_vf_nxv2f16_neg_splat_unmasked(<vscale x 2 x
; ZVFHMIN-LABEL: vfnmadd_vf_nxv2f16_neg_splat_unmasked:
; ZVFHMI...
[truncated]
Force-pushed 2768ab7 to ad48d4d.
ping
Force-pushed ad48d4d to 7ffa9e5.
Have you had a chance to kick the tires with this on llvm-test-suite or SPEC?
I had a chance to run this on spec2006/int/train and spec2017/int/train on qemu. Both build and terminate without errors.
Reading through the code, I spotted one potential correctness issue. This is a corner case, but probably still worth fixing. Imagine you have the following: The last instruction is dead; it can be folded to its passthru. (In practice, it probably will have been folded, but it's possible something could slip through to here.) However, when scanning the users of %v, we will decide that the correct VL for %v is 0 (or a register which might be zero), and reduce it below the minimum VL=1 required by the reduction. To fix this, I believe you need to treat the CommonVL for the scalar operand case as being VL=1. You could also track a non-zero state instead. Other than that, looks good to me. Once you've fixed this issue, happy to approve.
Nice, this also just came to mind but did you run it with the
Yes
Should we just remove this code for now? There are no directed tests for it.
This would be fine by me. Incrementalism is good. :)
I've removed it in #120291. FWIW, I don't think what you're concerned about can happen with or without #120291 merged, since all the instructions that isVectorOpUsedAsScalarOp deal with return
The code that was there said that we could ignore the reduction and not call getOperandInfo on it, so it doesn't matter that the reduction is missing from getOperandInfo. The code effectively said that a scalar operand doesn't depend on any elements from the producer. This is incorrect: it demands exactly 1 element. With that code in place, only the VL of the other consumers was used. If they used fewer than 1 element, then the 1 element that the scalar operand needs wouldn't be valid.
Yes, my bad. I agree we should remove the suggested code in #120291. |
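Below is a minimal C++ sketch of the rule discussed above. It assumes a simplified model in which every VL is a known immediate. The real pass also has to handle a VL held in a register. The names `UseKind`, `Use`, and `demandedVL` are hypothetical and are not the RISCVVLOptimizer's actual API. Under this model, a producer's VL may shrink to the largest number of leading elements that any user reads. A scalar operand, such as a reduction's start value, always counts as reading one element.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical model of how one user reads the producer's vector result.
enum class UseKind {
  VectorOperand, // reads elements [0, VL) of the value
  ScalarOperand, // e.g. a reduction's start value: reads element 0 only
};

struct Use {
  UseKind Kind;
  uint64_t VL; // the user's VL; irrelevant for ScalarOperand
};

// Largest number of leading elements of the producer read by any user, or
// std::nullopt when there are no users to learn a VL from.
std::optional<uint64_t> demandedVL(const std::vector<Use> &Uses) {
  std::optional<uint64_t> CommonVL;
  for (const Use &U : Uses) {
    // A scalar operand demands exactly one element, whatever the user's VL.
    // Skipping this use instead (what the removed code effectively did) lets
    // other users with a smaller VL, possibly 0, decide the producer's VL.
    uint64_t Demand = U.Kind == UseKind::ScalarOperand ? 1 : U.VL;
    CommonVL = std::max(CommonVL.value_or(0), Demand);
  }
  return CommonVL;
}
```

Take a VL=0 vector user together with a reduction's scalar operand. The sketch keeps one element. If the scalar-operand use were skipped, the result would be 0.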
LGTM
; CHECK-NEXT: vmv.v.i v9, 3
; CHECK-NEXT: vsetivli zero, 8, e8, mf2, ta, ma
Non-blocking, but this shows a case where we probably want to teach VSETVLI insertion that it can increase VL if the instruction is tail undefined.
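The following sketch illustrates the idea in the comment above under heavy simplifications. These are assumptions and do not reflect how `RISCVInsertVSETVLI` is actually structured:

- Only VL is modelled; SEW, LMUL, and policy changes are ignored.
- Every tail-agnostic instruction is assumed to have no side effects such as memory accesses. Running it with a larger VL (up to VLMAX) therefore only writes extra tail elements that nobody reads.

`VecInst` and `countVSETVLIs` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified view of one vector instruction.
struct VecInst {
  uint64_t DemandedVL; // leading elements whose results are actually used
  bool TailAgnostic;   // "ta": tail elements are undefined
};

// Counts the vsetvlis needed to run Insts in order. With AllowRaise, a
// tail-agnostic instruction accepts any VL in [DemandedVL, VLMAX]; otherwise
// it needs exactly DemandedVL. Greedily extends the current run while some
// single VL still satisfies every instruction in it.
unsigned countVSETVLIs(const std::vector<VecInst> &Insts, uint64_t VLMAX,
                       bool AllowRaise) {
  unsigned Count = 0;
  bool InRun = false;
  uint64_t Lo = 0, Hi = 0; // VLs acceptable to every instruction in the run
  for (const VecInst &I : Insts) {
    assert(I.DemandedVL <= VLMAX && "demanded VL exceeds VLMAX");
    uint64_t ILo = I.DemandedVL;
    uint64_t IHi = (AllowRaise && I.TailAgnostic) ? VLMAX : I.DemandedVL;
    if (InRun && std::max(Lo, ILo) <= std::min(Hi, IHi)) {
      // The intervals still overlap: no new vsetvli is needed.
      Lo = std::max(Lo, ILo);
      Hi = std::min(Hi, IHi);
      continue;
    }
    // Start a new run, which costs one vsetvli.
    ++Count;
    InRun = true;
    Lo = ILo;
    Hi = IHi;
  }
  return Count;
}
```

Consider the `splat_ve4_ins_i1ve3` sequence above: a VL=2 `vmv.v.i`, then a VL=8 `vmv.v.i`, then a tail-undisturbed VL=2 `vslideup.vi`. Without raising, the sketch needs three vsetvlis. With raising, it needs two, which matches the count before the VL optimizer shrank the first splat.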
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/181/builds/10532 Here is the relevant piece of the build log for the reference
; RV32-NEXT: vmv.v.i v12, 1
; RV32-NEXT: vsetivli zero, 8, e16, m1, ta, ma
; RV32-NEXT: vmv.v.i v14, 6
Do we know why only one of the vmv.v.i instructions had its VL reduced here?
Edit: I just saw Philip's comment above, which explains it.
Take a look at the MIR: https://godbolt.org/z/xrvG13qx6
You can see that %4 is used as a tied operand. We don't optimize that case:
// Tied operands might pass through.
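This element-level C++ model shows why a value used as a tied (passthru) operand cannot simply have its VL reduced. It models `vslideup.vi` with a tail-undisturbed policy, using the values from the `splat_ve4_ins_i1ve3` test above. It is a sketch under stated assumptions:

- Vectors are `std::vector<int>` of length VLMAX.
- Masking and `vstart` are ignored.
- The undefined tail of a tail-agnostic instruction is represented by a hypothetical sentinel, `kUndef`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sentinel standing in for an undefined (tail-agnostic) element. This is a
// modelling convenience; hardware may keep the old value or write all 1s.
constexpr int kUndef = -1;

// vmv.v.i with a tail-agnostic policy: elements [0, VL) get Imm, and the tail
// [VL, VLMAX) is undefined.
std::vector<int> vmvViTA(int Imm, std::size_t VL, std::size_t VLMAX) {
  std::vector<int> R(VLMAX, kUndef);
  for (std::size_t I = 0; I < VL && I < VLMAX; ++I)
    R[I] = Imm;
  return R;
}

// vslideup.vi Vd, Vs2, Offset with a tail-undisturbed ("tu") policy and no
// mask. Vd is both the passthru input and the result, like a tied operand.
std::vector<int> vslideupTU(std::vector<int> Vd, const std::vector<int> &Vs2,
                            std::size_t Offset, std::size_t VL) {
  assert(Vs2.size() == Vd.size() && "operands must have the same VLMAX");
  // [0, Offset): unchanged. [Offset, VL): Vs2[I - Offset].
  // [VL, VLMAX): tail, left undisturbed, i.e. still the passthru value.
  for (std::size_t I = Offset; I < VL && I < Vd.size(); ++I)
    Vd[I] = Vs2[I - Offset];
  return Vd;
}
```

Shrinking the splat that feeds the passthru from VL=8 to VL=2 leaves elements 2 to 7 of the slide's result undefined. By contrast, the slide reads only the leading elements of its source operand, so that splat can be shrunk safely. This is why only one of the two splats had its VL reduced.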
Update LLVM to llvm/llvm-project@ac8bb735.

C++ changes are related to a change in the behavior of TypeConverter. It used to generate UnrealizedConversionCastOp during applySignatureConversion in GenericOpTypePropagation of TypePropagationPass.cpp, but now it does not. This causes unrealized_conversion_cast to be generated later and hence survive the pass. To reproduce the above behavior, undo the C++ change in this PR and then run:

```
wget https://gist.githubusercontent.com/raikonenfnu/dfb3b274007df8c4be87daf9ee67a5f4/raw/e48cc07e5fa558cd2c450b0e3ae46568136e1be6/type_propagate_repro.mlir
iree-opt --pass-pipeline='builtin.module(func.func(iree-codegen-type-propagation))' propagate_test.mlir -o /dev/null
error: failed to legalize unresolved materialization from ('i8') to ('i1') that remained live after conversion
^bb0(%in: i1, %in_0: f32, %in_1: f32, %out: f32):
^
propagate_test.mlir:5:8: note: see current operation: %10 = "builtin.unrealized_conversion_cast"(%arg0) : (i8) -> i1
propagate_test.mlir:6:11: note: see existing live user here: %10 = arith.select %9, %in_0, %in_1 : f32
```

This PR also carries the following reverts:
llvm/llvm-project#120999
llvm/llvm-project#120115
llvm/llvm-project#119461

The main issue with PRs 120999 and 120115 is that they break matvec codegen, generating scf.if instead of scf.for(s). An issue will be pushed up for repro.

The main issue with PR 119461 is that it breaks the e2e RISC-V test by making it get stuck in an infinite loop:

```
/path/to/iree-build/tools/iree-compile --output-format=vm-bytecode --mlir-print-op-on-diagnostic=false --iree-hal-target-backends=llvm-cpu --iree-input-type=stablehlo --iree-input-demote-f64-to-f32 --iree-llvmcpu-target-cpu=generic /path/to/iree/tests/e2e/stablehlo_ops/three_fry.mlir -o three_fly_exec_target.mlir --iree-llvmcpu-target-triple=riscv64 --iree-llvmcpu-target-abi=lp64d --iree-llvmcpu-target-cpu-features=+m,+a,+d,+zvl512b,+v --mlir-disable-threading
> infinite loop
```

Signed-off-by: Stanley Winata <stanley.winata@amd.com>
Update LLVM to llvm/llvm-project@ac8bb735.

C++ changes are related to a change in the behavior of TypeConverter, introduced in iree-org/llvm-project@3cc311a. It used to generate UnrealizedConversionCastOp during applySignatureConversion in GenericOpTypePropagation of TypePropagationPass.cpp, but now it does not. This causes unrealized_conversion_cast to be generated later and hence survive the pass. To reproduce the above behavior, undo the C++ change in this PR and then run:

```
wget https://gist.githubusercontent.com/raikonenfnu/dfb3b274007df8c4be87daf9ee67a5f4/raw/e48cc07e5fa558cd2c450b0e3ae46568136e1be6/type_propagate_repro.mlir
iree-opt --pass-pipeline='builtin.module(func.func(iree-codegen-type-propagation))' propagate_test.mlir -o /dev/null
error: failed to legalize unresolved materialization from ('i8') to ('i1') that remained live after conversion
^bb0(%in: i1, %in_0: f32, %in_1: f32, %out: f32):
^
propagate_test.mlir:5:8: note: see current operation: %10 = "builtin.unrealized_conversion_cast"(%arg0) : (i8) -> i1
propagate_test.mlir:6:11: note: see existing live user here: %10 = arith.select %9, %in_0, %in_1 : f32
```

Additionally, we made API changes in 6ed8924 to resolve deprecated-API errors in Bazel:
1. `applyPatternsAndFoldGreedily` -> `applyPatternsGreedily`
2. `applyOpPatternsAndFold` -> `applyOpPatternsGreedily`

This PR also carries the following revert:
llvm/llvm-project#119461

The main issue with PR 119461 is that it breaks the e2e RISC-V test by making it get stuck in an infinite loop:

```
/path/to/iree-build/tools/iree-compile --output-format=vm-bytecode --mlir-print-op-on-diagnostic=false --iree-hal-target-backends=llvm-cpu --iree-input-type=stablehlo --iree-input-demote-f64-to-f32 --iree-llvmcpu-target-cpu=generic /path/to/iree/tests/e2e/stablehlo_ops/three_fry.mlir -o three_fly_exec_target.mlir --iree-llvmcpu-target-triple=riscv64 --iree-llvmcpu-target-abi=lp64d --iree-llvmcpu-target-cpu-features=+m,+a,+d,+zvl512b,+v --mlir-disable-threading
> infinite loop
```

---------

Signed-off-by: Stanley Winata <stanley.winata@amd.com>
…llvm#119461)"" This reverts commit 0f42dbd.