Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[InstCombine] Replace an integer comparison of a phi node with multiple ucmp/scmp operands and a constant with phi of individual comparisons of original intrinsic's arguments #107769

Merged
merged 8 commits into from
Sep 13, 2024

Conversation

Poseydon42
Copy link
Contributor

When we have a phi instruction with more than one of its incoming values being a call to ucmp or scmp, which is then compared with an integer constant, we can move the comparison through the phi into the incoming basic blocks because we know that a comparison of ucmp/scmp with a constant will be simplified by the next iteration of InstCombine.

This should help with one of the regressions that's blocking #107314

There's a high chance that other similar patterns can be identified, in which case they can be easily handled by the same code by moving the check for "simplifiable" instructions into a lambda.

@llvmbot
Copy link
Collaborator

llvmbot commented Sep 8, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Volodymyr Vasylkun (Poseydon42)

Changes

When we have a phi instruction with more than one of its incoming values being a call to ucmp or scmp, which is then compared with an integer constant, we can move the comparison through the phi into the incoming basic blocks because we know that a comparison of ucmp/scmp with a constant will be simplified by the next iteration of InstCombine.

This should help with one of the regressions that's blocking #107314

There's a high chance that other similar patterns can be identified, in which case they can be easily handled by the same code by moving the check for "simplifiable" instructions into a lambda.


Full diff: https://github.com/llvm/llvm-project/pull/107769.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/InstCombine/InstructionCombining.cpp (+27)
  • (added) llvm/test/Transforms/InstCombine/phi-with-multiple-unsimplifiable-values.ll (+32)
diff --git a/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp b/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
index 8195e0539305cc..4c27359752002c 100644
--- a/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
@@ -1809,6 +1809,7 @@ Instruction *InstCombinerImpl::foldOpIntoPhi(Instruction &I, PHINode *PN) {
   // Check to see whether the instruction can be folded into each phi operand.
   // If there is one operand that does not fold, remember the BB it is in.
   SmallVector<Value *> NewPhiValues;
+  SmallVector<unsigned int> OpsToMoveUseTo;
   BasicBlock *NonSimplifiedBB = nullptr;
   Value *NonSimplifiedInVal = nullptr;
   for (unsigned i = 0; i != NumPHIValues; ++i) {
@@ -1820,6 +1821,16 @@ Instruction *InstCombinerImpl::foldOpIntoPhi(Instruction &I, PHINode *PN) {
       continue;
     }
 
+    // If the only use of phi is comparing it with a constant then we can
+    // put this comparison in the incoming BB directly after a ucmp/scmp call
+    // because we know that it will simplify to a single icmp.
+    if (isa<CmpIntrinsic>(InVal) &&
+        match(&I, m_c_ICmp(m_Specific(PN), m_Constant()))) {
+      OpsToMoveUseTo.push_back(i);
+      NewPhiValues.push_back(nullptr);
+      continue;
+    }
+
     if (NonSimplifiedBB) return nullptr;  // More than one non-simplified value.
 
     NonSimplifiedBB = InBB;
@@ -1851,6 +1862,22 @@ Instruction *InstCombinerImpl::foldOpIntoPhi(Instruction &I, PHINode *PN) {
       return nullptr;
   }
 
+  // Clone the instruction that uses the phi node and move it into the incoming
+  // BB because we know that the next iteration of InstCombine will simplify it.
+  for (auto OpIndex : OpsToMoveUseTo) {
+    Instruction *Clone = I.clone();
+    Value *Op = PN->getIncomingValue(OpIndex);
+    BasicBlock *OpBB = PN->getIncomingBlock(OpIndex);
+    for (Use &U : Clone->operands()) {
+      if (U == PN)
+        U = Op;
+      else
+        U = U->DoPHITranslation(PN->getParent(), OpBB);
+    }
+    Clone = InsertNewInstBefore(Clone, OpBB->getTerminator()->getIterator());
+    NewPhiValues[OpIndex] = Clone;
+  }
+
   // Okay, we can do the transformation: create the new PHI node.
   PHINode *NewPN = PHINode::Create(I.getType(), PN->getNumIncomingValues());
   InsertNewInstBefore(NewPN, PN->getIterator());
diff --git a/llvm/test/Transforms/InstCombine/phi-with-multiple-unsimplifiable-values.ll b/llvm/test/Transforms/InstCombine/phi-with-multiple-unsimplifiable-values.ll
new file mode 100644
index 00000000000000..b91d8fc6e05705
--- /dev/null
+++ b/llvm/test/Transforms/InstCombine/phi-with-multiple-unsimplifiable-values.ll
@@ -0,0 +1,32 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt < %s -passes=instcombine -S | FileCheck %s
+
+define i1 @icmp_of_phi_of_scmp_with_constant(i1 %c, i16 %x, i16 %y)
+; CHECK-LABEL: define i1 @icmp_of_phi_of_scmp_with_constant(
+; CHECK-SAME: i1 [[C:%.*]], i16 [[X:%.*]], i16 [[Y:%.*]]) {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    br i1 [[C]], label %[[TRUE:.*]], label %[[FALSE:.*]]
+; CHECK:       [[TRUE]]:
+; CHECK-NEXT:    [[TMP0:%.*]] = icmp slt i16 [[X]], [[Y]]
+; CHECK-NEXT:    br label %[[EXIT:.*]]
+; CHECK:       [[FALSE]]:
+; CHECK-NEXT:    [[TMP1:%.*]] = icmp slt i16 [[Y]], [[X]]
+; CHECK-NEXT:    br label %[[EXIT]]
+; CHECK:       [[EXIT]]:
+; CHECK-NEXT:    [[R:%.*]] = phi i1 [ [[TMP0]], %[[TRUE]] ], [ [[TMP1]], %[[FALSE]] ]
+; CHECK-NEXT:    ret i1 [[R]]
+;
+{
+entry:
+  br i1 %c, label %true, label %false
+true:
+  %cmp1 = call i8 @llvm.scmp(i16 %x, i16 %y)
+  br label %exit
+false:
+  %cmp2 = call i8 @llvm.scmp(i16 %y, i16 %x)
+  br label %exit
+exit:
+  %phi = phi i8 [%cmp1, %true], [%cmp2, %false]
+  %r = icmp slt i8 %phi, 0
+  ret i1 %r
+}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please also add negative tests where one scmp is not one-use or the icmp operand is not constant?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The case you're proposing would still actually enable this optimization since we can still move the icmp into every incoming BB, and only one of the scmps will not be removed due to it being used more than once (in this case the BB with scmp becomes what was previously NonSimplifiedBB). I think it's better to add a negative test where both scmps are not one-use, since in that case the optimization wouldn't be trivially profitable.

dtcxzyw added a commit to dtcxzyw/llvm-opt-benchmark that referenced this pull request Sep 9, 2024
Copy link
Member

@dtcxzyw dtcxzyw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thank you!
Please wait for additional approval from other reviewers :)

exit:
%phi = phi i8 [%cmp1, %true], [%cmp2, %false]
%r = icmp slt i8 %phi, 0
ret i1 %r
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any tests where only 1 of the incoming values is an {s,u}cmp intrin?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added one

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Worth noting that as implemented, the one-use check is also load-bearing for correctness, not just profitability. Otherwise we'd have to make sure we handle multi-edges correctly.

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp Outdated Show resolved Hide resolved
@Poseydon42
Copy link
Contributor Author

Worth noting that as implemented, the one-use check is also load-bearing for correctness, not just profitability. Otherwise we'd have to make sure we handle multi-edges correctly.

I don't think I quite understand the problem, can you give an example of a test case where removing the one-use check would cause issues?

@nikic
Copy link
Contributor

nikic commented Sep 11, 2024

Worth noting that as implemented, the one-use check is also load-bearing for correctness, not just profitability. Otherwise we'd have to make sure we handle multi-edges correctly.

I don't think I quite understand the problem, can you give an example of a test case where removing the one-use check would cause issues?

Something like this: https://llvm.godbolt.org/z/zEfdK4b3W The tricky part is that a predecessors occurs two times in the phi, and the value must be the same for both. So if we clone the instruction two times, we'd get a verifier error. To support this case we would need to make sure that we only clone once per duplicate predecessor.

With the one-use limitation, this is a problem we can ignore for now.

@Poseydon42
Copy link
Contributor Author

Worth noting that as implemented, the one-use check is also load-bearing for correctness, not just profitability. Otherwise we'd have to make sure we handle multi-edges correctly.

I don't think I quite understand the problem, can you give an example of a test case where removing the one-use check would cause issues?

Something like this: https://llvm.godbolt.org/z/zEfdK4b3W The tricky part is that a predecessors occurs two times in the phi, and the value must be the same for both. So if we clone the instruction two times, we'd get a verifier error. To support this case we would need to make sure that we only clone once per duplicate predecessor.

With the one-use limitation, this is a problem we can ignore for now.

Thank you. I'll add a comment mentioning that so there's no confusion in the future. Apart from that the patch is OK to merge, right?

@nikic
Copy link
Contributor

nikic commented Sep 12, 2024

Yes, the patch looks good to me!

@Poseydon42 Poseydon42 merged commit 21e3a21 into llvm:main Sep 13, 2024
4 of 5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants