[InstCombine] Fold shift+cttz with power of 2 operands #127055

MDevereau · 2025-02-13T13:03:12Z

#121386 introduced cttz intrinsics, which caused a regression: vscale/vscale divisions could no longer be constant folded.

This fold was suggested as a fix in #126411. Alive2 proof: https://alive2.llvm.org/ce/z/gWbtPw
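
As a sanity check on the arithmetic, the identity `((pow2 << x) >> cttz(pow2 << y)) == ((1 << x) >> y)` can be brute-forced over 8-bit values. The sketch below is not part of the patch: `cttz8`, `shlIsNUW`, and `foldHoldsForAllI8` are hypothetical helper names, and it models only the `nuw` case with a power-of-two base. The Alive2 link remains the reference for the full preconditions.

```cpp
#include <cassert>
#include <cstdint>

// Count trailing zeros of a non-zero 8-bit value. This mirrors cttz with
// is_zero_poison = true, where a zero input is not allowed.
static unsigned cttz8(uint8_t V) {
  assert(V != 0 && "cttz of zero is poison in this model");
  unsigned N = 0;
  while ((V & 1u) == 0) {
    V >>= 1;
    ++N;
  }
  return N;
}

// True if (P << S) loses no set bits in 8 bits, i.e. the shl could carry nuw.
static bool shlIsNUW(uint8_t P, unsigned S) {
  return S < 8 && (static_cast<unsigned>(P) << S) <= 0xFFu;
}

// Exhaustively compare both sides of the fold for every 8-bit power of two P
// and every pair of shift amounts (X, Y) for which both shifts are nuw.
bool foldHoldsForAllI8() {
  for (unsigned Log = 0; Log < 8; ++Log) {
    const uint8_t P = static_cast<uint8_t>(1u << Log);
    for (unsigned X = 0; X < 8; ++X) {
      for (unsigned Y = 0; Y < 8; ++Y) {
        if (!shlIsNUW(P, X) || !shlIsNUW(P, Y))
          continue;
        const uint8_t Shl0 = static_cast<uint8_t>(P << X);
        const uint8_t Shl1 = static_cast<uint8_t>(P << Y);
        // Original pattern: lshr (shl P, X), (cttz (shl P, Y))
        const uint8_t Lhs = static_cast<uint8_t>(Shl0 >> cttz8(Shl1));
        // Folded pattern: lshr (shl 1, X), Y
        const uint8_t Rhs = static_cast<uint8_t>((1u << X) >> Y);
        if (Lhs != Rhs)
          return false;
      }
    }
  }
  return true;
}
```

Calling `foldHoldsForAllI8()` from a small driver exercises every admissible combination. The check works because `cttz(P << Y)` equals `log2(P) + Y` when `P` is a power of two and no bits are shifted out, so the `log2(P)` contribution cancels against the base of the first shift.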

llvmbot · 2025-02-13T13:03:47Z

@llvm/pr-subscribers-llvm-transforms

Author: Matthew Devereau (MDevereau)

Changes

#121386 introduced cttz intrinsics, which caused a regression: vscale/vscale divisions could no longer be constant folded.

This fold was suggested as a fix in #126411.

Full diff: https://github.com/llvm/llvm-project/pull/127055.diff

2 Files Affected:

(modified) llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp (+16)
(modified) llvm/test/Transforms/InstCombine/shift-cttz-ctlz.ll (+34)

diff --git a/llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp b/llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
index 7ef95800975db..ac0f9b005f317 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
@@ -1613,6 +1613,22 @@ Instruction *InstCombinerImpl::visitLShr(BinaryOperator &I) {
   if (Instruction *Overflow = foldLShrOverflowBit(I))
     return Overflow;
 
+  // Transform ((pow2 << x) >> cttz(pow2 << y)) -> ((1 << x) >> y)
+  Value *Shl0_Op0, *Shl0_Op1, *Shl1_Op0, *Shl1_Op1;
+  BinaryOperator *Shl1;
+  if (match(Op0, m_Shl(m_Value(Shl0_Op0), m_Value(Shl0_Op1))) &&
+      match(Op1, m_Intrinsic<Intrinsic::cttz>(m_BinOp(Shl1))) &&
+      match(Shl1, m_Shl(m_Value(Shl1_Op0), m_Value(Shl1_Op1))) &&
+      isKnownToBeAPowerOfTwo(Shl1, false, 0, SQ.getWithInstruction(&I).CxtI) &&
+      Shl0_Op0 == Shl1_Op0) {
+    auto *Shl0 = cast<BinaryOperator>(Op0);
+    if ((Shl0->hasNoUnsignedWrap() && Shl1->hasNoUnsignedWrap()) ||
+        (Shl0->hasNoSignedWrap() && Shl1->hasNoSignedWrap())) {
+      Value *NewShl =
+          Builder.CreateShl(ConstantInt::get(Shl1->getType(), 1), Shl0_Op1);
+      return BinaryOperator::CreateLShr(NewShl, Shl1_Op1);
+    }
+  }
   return nullptr;
 }
 
diff --git a/llvm/test/Transforms/InstCombine/shift-cttz-ctlz.ll b/llvm/test/Transforms/InstCombine/shift-cttz-ctlz.ll
index 63caec9501325..6269f29c880e3 100644
--- a/llvm/test/Transforms/InstCombine/shift-cttz-ctlz.ll
+++ b/llvm/test/Transforms/InstCombine/shift-cttz-ctlz.ll
@@ -103,4 +103,38 @@ entry:
   ret i32 %res
 }
 
+define i64 @fold_cttz_64() vscale_range(1,16) {
+; CHECK-LABEL: define i64 @fold_cttz_64(
+; CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    ret i64 4
+;
+entry:
+  %0 = tail call i64 @llvm.vscale.i64()
+  %1 = shl nuw nsw i64 %0, 4
+  %2 = shl nuw nsw i64 %0, 2
+  %3 = tail call range(i64 2, 65) i64 @llvm.cttz.i64(i64 %2, i1 true)
+  %div1 = lshr i64 %1, %3
+  ret i64 %div1
+}
+
+define i32 @fold_cttz_32() vscale_range(1,16) {
+; CHECK-LABEL: define i32 @fold_cttz_32(
+; CHECK-SAME: ) #[[ATTR0]] {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    ret i32 4
+;
+entry:
+  %0 = tail call i32 @llvm.vscale.i32()
+  %1 = shl nuw nsw i32 %0, 4
+  %2 = shl nuw nsw i32 %0, 2
+  %3 = tail call range(i32 2, 65) i32 @llvm.cttz.i32(i32 %2, i1 true)
+  %div1 = lshr i32 %1, %3
+  ret i32 %div1
+}
+
+declare i64 @llvm.vscale.i64()
+declare i64 @llvm.cttz.i64(i64, i1 immarg)
+declare i32 @llvm.vscale.i32()
+declare i32 @llvm.cttz.i32(i32, i1 immarg)
 declare void @use(i32)
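
The arithmetic behind the expected `ret i64 4` in `fold_cttz_64` can be modeled in plain C++. This is a sketch rather than anything from the patch: `cttz64`, `vscaleDivModel`, and `vscaleDivIsAlwaysFour` are hypothetical names, and it assumes vscale is a power of two within the bounds given by `vscale_range(1,16)`.

```cpp
#include <cassert>
#include <cstdint>

// Count trailing zeros of a non-zero 64-bit value (zero input is disallowed,
// matching the `i1 true` is_zero_poison argument used in the test).
static unsigned cttz64(uint64_t V) {
  assert(V != 0 && "cttz of zero is poison in this model");
  unsigned N = 0;
  while ((V & 1u) == 0) {
    V >>= 1;
    ++N;
  }
  return N;
}

// Models the body of fold_cttz_64 for one concrete runtime vscale value.
uint64_t vscaleDivModel(uint64_t VScale) {
  const uint64_t Num = VScale << 4; // %1 = shl nuw nsw i64 %0, 4
  const uint64_t Den = VScale << 2; // %2 = shl nuw nsw i64 %0, 2
  return Num >> cttz64(Den);        // %div1 = lshr i64 %1, %3
}

// Checks every power-of-two vscale in [1, 16] against the folded constant.
bool vscaleDivIsAlwaysFour() {
  for (uint64_t VScale = 1; VScale <= 16; VScale <<= 1)
    if (vscaleDivModel(VScale) != 4)
      return false;
  return true;
}
```

For example, with vscale equal to 16 the numerator is 256, the denominator is 64, `cttz64(64)` is 6, and `256 >> 6` is 4. The result does not depend on vscale, which is why the test expects a constant.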

dtcxzyw

LGTM.

[InstCombine] Fold cttz with power of 2 operands

9f70feb

(llvm#121386) introduced cttz intrinsics, which caused a regression: vscale/vscale divisions could no longer be constant folded. This fold was suggested as a fix in (llvm#126411).

MDevereau requested a review from davemgreen February 13, 2025 13:03

MDevereau requested a review from nikic as a code owner February 13, 2025 13:03

llvmbot added llvm:instcombine llvm:transforms labels Feb 13, 2025

dtcxzyw reviewed Feb 13, 2025

Add fold for zero

c26fff8

Propagate nowrap flags to new shift
Use named values in tests
Remove intrinsic declarations

dtcxzyw reviewed Feb 14, 2025

davemgreen mentioned this pull request Feb 16, 2025

[InstCombine] Detect different vscales in div by shift combine. #126411

Closed

Remove unnecessary line

4c2004a

This was referenced Feb 18, 2025

Fuzz PR127055 dtcxzyw/llvm-mutation-based-fuzz-service#31

Closed

Task submission dtcxzyw/llvm-opt-benchmark#1312

Open

pre-commit: PR127055 dtcxzyw/llvm-opt-benchmark#2127

Closed

dtcxzyw approved these changes Feb 18, 2025

MDevereau merged commit 251377c into llvm:main Feb 18, 2025
8 checks passed

MDevereau deleted the svcnt branch February 18, 2025 10:26

davemgreen added a commit that referenced this pull request Feb 18, 2025

[AArch64] Add a phase-ordering test for dividing vscale. NFC

c71f914

See #126411 / #127055. The test isn't expected to fold in a single instcombine iteration; it needs instcombine->cse->instcombine.
