Skip to content

[DAG] SelectionDAG::canCreateUndefOrPoison - Mark AVGFLOORS and AVGCEILS as safe #148191

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

aabhinavg1
Copy link
Contributor

@aabhinavg1 aabhinavg1 commented Jul 11, 2025

This patch updates SelectionDAG::canCreateUndefOrPoison to indicate that ISD::AVGFLOORS and ISD::AVGCEILS do not introduce poison or undef values.

This is formally verified using Alive2:

These patterns are safe due to the use of sext i8 into i32, which ensures no signed overflow occurs. The arithmetic is done in the wider domain before truncating safely back to i8.

Includes test coverage to ensure correctness.

Differential Revision:

…ILS as safe

This patch updates `SelectionDAG::canCreateUndefOrPoison` to indicate that
`ISD::AVGFLOORS` and `ISD::AVGCEILS` do not introduce poison or undef values.

This is formally verified using Alive2:
- AVGFLOORS: https://alive2.llvm.org/ce/z/JWZcNr
- AVGCEILS: https://alive2.llvm.org/ce/z/cW3jrR

These patterns are safe due to the use of `sext i8` into `i32`, which ensures
no signed overflow occurs. The arithmetic is done in the wider domain before
truncating safely back to `i8`.

Includes test coverage to ensure correctness.

Differential Revision: <fill this after uploading to Phabricator or GH>
@llvmbot llvmbot added backend:AArch64 llvm:SelectionDAG SelectionDAGISel as well labels Jul 11, 2025
@llvmbot
Copy link
Member

llvmbot commented Jul 11, 2025

@llvm/pr-subscribers-llvm-selectiondag

Author: None (aabhinavg1)

Changes

…ILS as safe

This patch updates SelectionDAG::canCreateUndefOrPoison to indicate that ISD::AVGFLOORS and ISD::AVGCEILS do not introduce poison or undef values.

This is formally verified using Alive2:

These patterns are safe due to the use of sext i8 into i32, which ensures no signed overflow occurs. The arithmetic is done in the wider domain before truncating safely back to i8.

Includes test coverage to ensure correctness.

Differential Revision: <fill this after uploading to Phabricator or GH>


Full diff: https://github.com/llvm/llvm-project/pull/148191.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp (+4)
  • (added) llvm/test/CodeGen/AArch64/avgfloor-u8.ll (+16)
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index c1356239ad206..78f00809e3862 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -5542,6 +5542,10 @@ bool SelectionDAG::canCreateUndefOrPoison(SDValue Op, const APInt &DemandedElts,
   case ISD::UADDSAT:
   case ISD::SSUBSAT:
   case ISD::USUBSAT:
+  case ISD::AVGFLOORS:
+  case ISD::AVGFLOORU:
+  case ISD::AVGCEILS:
+  case ISD::AVGCEILU:
   case ISD::MULHU:
   case ISD::MULHS:
   case ISD::SMIN:
diff --git a/llvm/test/CodeGen/AArch64/avgfloor-u8.ll b/llvm/test/CodeGen/AArch64/avgfloor-u8.ll
new file mode 100644
index 0000000000000..669f6bbad14a3
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/avgfloor-u8.ll
@@ -0,0 +1,16 @@
+; RUN: llc < %s -march=arm64 -mcpu=apple-m1 | FileCheck %s
+
+; CHECK-LABEL: avg:
+; CHECK:       add
+; CHECK:       lsr
+; CHECK:       ret
+
+define zeroext i8 @avg(i8 noundef zeroext %a, i8 noundef zeroext %b) {
+entry:
+  %conv = zext i8 %a to i16
+  %conv1 = zext i8 %b to i16
+  %add = add nuw nsw i16 %conv1, %conv
+  %div3 = lshr i16 %add, 1
+  %conv2 = trunc nuw i16 %div3 to i8
+  ret i8 %conv2
+}

@llvmbot
Copy link
Member

llvmbot commented Jul 11, 2025

@llvm/pr-subscribers-backend-aarch64

Author: None (aabhinavg1)

Changes

…ILS as safe

This patch updates SelectionDAG::canCreateUndefOrPoison to indicate that ISD::AVGFLOORS and ISD::AVGCEILS do not introduce poison or undef values.

This is formally verified using Alive2:

These patterns are safe due to the use of sext i8 into i32, which ensures no signed overflow occurs. The arithmetic is done in the wider domain before truncating safely back to i8.

Includes test coverage to ensure correctness.

Differential Revision: <fill this after uploading to Phabricator or GH>


Full diff: https://github.com/llvm/llvm-project/pull/148191.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp (+4)
  • (added) llvm/test/CodeGen/AArch64/avgfloor-u8.ll (+16)
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index c1356239ad206..78f00809e3862 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -5542,6 +5542,10 @@ bool SelectionDAG::canCreateUndefOrPoison(SDValue Op, const APInt &DemandedElts,
   case ISD::UADDSAT:
   case ISD::SSUBSAT:
   case ISD::USUBSAT:
+  case ISD::AVGFLOORS:
+  case ISD::AVGFLOORU:
+  case ISD::AVGCEILS:
+  case ISD::AVGCEILU:
   case ISD::MULHU:
   case ISD::MULHS:
   case ISD::SMIN:
diff --git a/llvm/test/CodeGen/AArch64/avgfloor-u8.ll b/llvm/test/CodeGen/AArch64/avgfloor-u8.ll
new file mode 100644
index 0000000000000..669f6bbad14a3
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/avgfloor-u8.ll
@@ -0,0 +1,16 @@
+; RUN: llc < %s -march=arm64 -mcpu=apple-m1 | FileCheck %s
+
+; CHECK-LABEL: avg:
+; CHECK:       add
+; CHECK:       lsr
+; CHECK:       ret
+
+define zeroext i8 @avg(i8 noundef zeroext %a, i8 noundef zeroext %b) {
+entry:
+  %conv = zext i8 %a to i16
+  %conv1 = zext i8 %b to i16
+  %add = add nuw nsw i16 %conv1, %conv
+  %div3 = lshr i16 %add, 1
+  %conv2 = trunc nuw i16 %div3 to i8
+  ret i8 %conv2
+}

@jayfoad
Copy link
Contributor

jayfoad commented Jul 11, 2025

Code change looks fine but I don't understand what the test you added actually shows.

Differential Revision:

Don't add this! We haven't used this since we moved away from Phabricator for code reviews.

@jayfoad
Copy link
Contributor

jayfoad commented Jul 11, 2025

Also, I don't understand how your alive2 links are relevant. They verify an expansion of avgfloors/avgceils, but what does this have to do with introducing poison?

@aabhinavg1
Copy link
Contributor Author

aabhinavg1 commented Jul 11, 2025

Hi @jayfoad
Thanks for pointing that out,

I included the Alive2 links to show that the lowering of AVGFLOORS and AVGCEILS avoids poison, but I get that it doesn't directly prove that the SelectionDAG op itself is safe.

If you don’t mind, could you suggest what kind of test cases would better show that these ops don’t introduce poison or undef?

@RKSimon
Copy link
Collaborator

RKSimon commented Jul 11, 2025

You need to remove the noundef and compare the effect of freezing the result vs the inputs:

----------------------------------------
define i8 @src(i8 %a, i8 %b) {
#0:
  %a32 = sext i8 %a to i32
  %b32 = sext i8 %b to i32
  %and = and i32 %a32, %b32
  %xor = xor i32 %a32, %b32
  %shr = ashr i32 %xor, 1
  %res = add i32 %and, %shr
  %trunc = trunc i32 %res to i8
  %freeze = freeze i8 %trunc
  ret i8 %freeze
}
=>
define i8 @tgt(i8 %a, i8 %b) {
#0:
  %fa = freeze i8 %a
  %fb = freeze i8 %b
  %a32 = sext i8 %fa to i32
  %b32 = sext i8 %fb to i32
  %and = and i32 %a32, %b32
  %xor = xor i32 %a32, %b32
  %shr = ashr i32 %xor, 1
  %res = add i32 %and, %shr
  %trunc = trunc i32 %res to i8
  ret i8 %trunc
}
Transformation seems to be correct!

(if you remove either of the dst freezes it should fail)

@RKSimon RKSimon changed the title [DAG] SelectionDAG::canCreateUndefOrPoison - Mark AVGFLOORS and AVGCE… [DAG] SelectionDAG::canCreateUndefOrPoison - Mark AVGFLOORS and AVGCEILS as safe Jul 11, 2025
case ISD::AVGFLOORS:
case ISD::AVGFLOORU:
case ISD::AVGCEILS:
case ISD::AVGCEILU:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need test coverage for all 4 patterns

Copy link
Contributor Author

@aabhinavg1 aabhinavg1 Jul 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HI @RKSimon

Thanks for the feedback!

Alive2 Proofs for AVG Operations

Opcode Operation Type Alive2 Proof Link
AVGFLOORS Signed Floor Avg Alive2 Link
AVGCEILS Signed Ceil Avg Alive2 Link
AVGFLOORU Unsigned Floor Avg Alive2 Link
AVGCEILU Unsigned Ceil Avg Alive2 Link

Let me know if need to add anything else.

@davemgreen
Copy link
Collaborator

Could you add a tests to llvm/test/CodeGen/AArch64/hadd-combine.ll that use the aarch64.neon.uhadd/urhadd intrinsics? They should be converted to these avg nodes so can be tested directly.

@aabhinavg1
Copy link
Contributor Author

Could you add a tests to llvm/test/CodeGen/AArch64/hadd-combine.ll that use the aarch64.neon.uhadd/urhadd intrinsics? They should be converted to these avg nodes so can be tested directly.

Hi @davemgreen,

Thanks for the suggestion!

I've drafted the following test cases in llvm/test/CodeGen/AArch64/hadd-combine.ll to exercise the aarch64.neon.uhadd and aarch64.neon.urhadd intrinsics:

; RUN: llc < %s -mtriple=aarch64-none-linux-gnu -mattr=+neon | FileCheck %s

declare <8 x i8> @llvm.aarch64.neon.uhadd.v8i8(<8 x i8>, <8 x i8>)
declare <8 x i8> @llvm.aarch64.neon.urhadd.v8i8(<8 x i8>, <8 x i8>)
declare <16 x i8> @llvm.aarch64.neon.uhadd.v16i8(<16 x i8>, <16 x i8>)
declare <16 x i8> @llvm.aarch64.neon.urhadd.v16i8(<16 x i8>, <16 x i8>)

; CHECK-LABEL: test_uhadd_v8i8:
; CHECK: uhadd v0.8b, v0.8b, v1.8b
define <8 x i8> @test_uhadd_v8i8(<8 x i8> %a, <8 x i8> %b) {
  %res = call <8 x i8> @llvm.aarch64.neon.uhadd.v8i8(<8 x i8> %a, <8 x i8> %b)
  ret <8 x i8> %res
}

; CHECK-LABEL: test_urhadd_v8i8:
; CHECK: urhadd v0.8b, v0.8b, v1.8b
define <8 x i8> @test_urhadd_v8i8(<8 x i8> %a, <8 x i8> %b) {
  %res = call <8 x i8> @llvm.aarch64.neon.urhadd.v8i8(<8 x i8> %a, <8 x i8> %b)
  ret <8 x i8> %res
}

; CHECK-LABEL: test_uhadd_v16i8:
; CHECK: uhadd v0.16b, v0.16b, v1.16b
define <16 x i8> @test_uhadd_v16i8(<16 x i8> %a, <16 x i8> %b) {
  %res = call <16 x i8> @llvm.aarch64.neon.uhadd.v16i8(<16 x i8> %a, <16 x i8> %b)
  ret <16 x i8> %res
}

; CHECK-LABEL: test_urhadd_v16i8:
; CHECK: urhadd v0.16b, v0.16b, v1.16b
define <16 x i8> @test_urhadd_v16i8(<16 x i8> %a, <16 x i8> %b) {
  %res = call <16 x i8> @llvm.aarch64.neon.urhadd.v16i8(<16 x i8> %a, <16 x i8> %b)
  ret <16 x i8> %res
}

Would this be acceptable for verifying that the uhadd/urhadd intrinsics get correctly lowered into the corresponding DAG avg patterns?

@aabhinavg1 aabhinavg1 self-assigned this Jul 11, 2025
@RKSimon
Copy link
Collaborator

RKSimon commented Jul 11, 2025

What could be done is to fold based on the knownbits/signbits result of a avgu/avgs node - if the freeze is moved then the knownbits/signbits calculation is exposed and can allow the fold to occur.

@EugeneZelenko EugeneZelenko removed their request for review July 11, 2025 13:24
@aabhinavg1
Copy link
Contributor Author

Hi @RKSimon

here is the test could you please look in to it if i have not missed anything

; RUN: opt -passes=instcombine -S < %s | FileCheck %s

; === Signed Floor Average (AVGFLOORS) ===

define i8 @avg_signed_floor_freeze_before(i8 %a, i8 %b) {
; CHECK-LABEL: @avg_signed_floor_freeze_before(
; CHECK: freeze i8
entry:
  %fa = freeze i8 %a
  %fb = freeze i8 %b
  %a32 = sext i8 %fa to i32
  %b32 = sext i8 %fb to i32
  %and = and i32 %a32, %b32
  %xor = xor i32 %a32, %b32
  %shr = ashr i32 %xor, 1
  %sum = add i32 %and, %shr
  %res = trunc i32 %sum to i8
  ret i8 %res
}

define i8 @avg_signed_floor_freeze_after(i8 %a, i8 %b) {
; CHECK-LABEL: @avg_signed_floor_freeze_after(
; CHECK: freeze i8
entry:
  %a32 = sext i8 %a to i32
  %b32 = sext i8 %b to i32
  %and = and i32 %a32, %b32
  %xor = xor i32 %a32, %b32
  %shr = ashr i32 %xor, 1
  %sum = add i32 %and, %shr
  %res = trunc i32 %sum to i8
  %fr = freeze i8 %res
  ret i8 %fr
}

; === Signed Ceil Average (AVGCEILS) ===

define i8 @avg_signed_ceil_freeze_before(i8 %a, i8 %b) {
; CHECK-LABEL: @avg_signed_ceil_freeze_before(
; CHECK: freeze i8
entry:
  %fa = freeze i8 %a
  %fb = freeze i8 %b
  %a32 = sext i8 %fa to i32
  %b32 = sext i8 %fb to i32
  %add = add i32 %a32, %b32
  %add1 = add i32 %add, 1
  %shr = ashr i32 %add1, 1
  %res = trunc i32 %shr to i8
  ret i8 %res
}

define i8 @avg_signed_ceil_freeze_after(i8 %a, i8 %b) {
; CHECK-LABEL: @avg_signed_ceil_freeze_after(
; CHECK: freeze i8
entry:
  %a32 = sext i8 %a to i32
  %b32 = sext i8 %b to i32
  %add = add i32 %a32, %b32
  %add1 = add i32 %add, 1
  %shr = ashr i32 %add1, 1
  %res = trunc i32 %shr to i8
  %fr = freeze i8 %res
  ret i8 %fr
}

; === Unsigned Floor Average (AVGFLOORU) ===

define i8 @avg_unsigned_floor_freeze_before(i8 %a, i8 %b) {
; CHECK-LABEL: @avg_unsigned_floor_freeze_before(
; CHECK: freeze i8
entry:
  %fa = freeze i8 %a
  %fb = freeze i8 %b
  %a32 = zext i8 %fa to i32
  %b32 = zext i8 %fb to i32
  %and = and i32 %a32, %b32
  %xor = xor i32 %a32, %b32
  %shr = lshr i32 %xor, 1
  %sum = add i32 %and, %shr
  %res = trunc i32 %sum to i8
  ret i8 %res
}

define i8 @avg_unsigned_floor_freeze_after(i8 %a, i8 %b) {
; CHECK-LABEL: @avg_unsigned_floor_freeze_after(
; CHECK: freeze i8
entry:
  %a32 = zext i8 %a to i32
  %b32 = zext i8 %b to i32
  %and = and i32 %a32, %b32
  %xor = xor i32 %a32, %b32
  %shr = lshr i32 %xor, 1
  %sum = add i32 %and, %shr
  %res = trunc i32 %sum to i8
  %fr = freeze i8 %res
  ret i8 %fr
}

; === Unsigned Ceil Average (AVGCEILU) ===

define i8 @avg_unsigned_ceil_freeze_before(i8 %a, i8 %b) {
; CHECK-LABEL: @avg_unsigned_ceil_freeze_before(
; CHECK: freeze i8
entry:
  %fa = freeze i8 %a
  %fb = freeze i8 %b
  %a32 = zext i8 %fa to i32
  %b32 = zext i8 %fb to i32
  %add = add i32 %a32, %b32
  %add1 = add i32 %add, 1
  %shr = lshr i32 %add1, 1
  %res = trunc i32 %shr to i8
  ret i8 %res
}

define i8 @avg_unsigned_ceil_freeze_after(i8 %a, i8 %b) {
; CHECK-LABEL: @avg_unsigned_ceil_freeze_after(
; CHECK: freeze i8
entry:
  %a32 = zext i8 %a to i32
  %b32 = zext i8 %b to i32
  %add = add i32 %a32, %b32
  %add1 = add i32 %add, 1
  %shr = lshr i32 %add1, 1
  %res = trunc i32 %shr to i8
  %fr = freeze i8 %res
  ret i8 %fr
}

I’ve added avg-poison-safety.ll with test coverage for all four ops:

  • AVGFLOORS (signed floor)
  • AVGCEILS (signed ceil)
  • AVGFLOORU (unsigned floor)
  • AVGCEILU (unsigned ceil)

Could you please review if I’ve missed anything?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:AArch64 llvm:SelectionDAG SelectionDAGISel as well
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants