Skip to content

[DAG] SelectionDAG::canCreateUndefOrPoison - Mark AVGFLOORS and AVGCEILS as safe #148191

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -5542,6 +5542,10 @@ bool SelectionDAG::canCreateUndefOrPoison(SDValue Op, const APInt &DemandedElts,
case ISD::UADDSAT:
case ISD::SSUBSAT:
case ISD::USUBSAT:
case ISD::AVGFLOORS:
case ISD::AVGFLOORU:
case ISD::AVGCEILS:
case ISD::AVGCEILU:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need test coverage for all 4 patterns

Copy link
Contributor Author

@aabhinavg1 aabhinavg1 Jul 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HI @RKSimon

Thanks for the feedback!

Alive2 Proofs for AVG Operations

Opcode Operation Type Alive2 Proof Link
AVGFLOORS Signed Floor Avg Alive2 Link
AVGCEILS Signed Ceil Avg Alive2 Link
AVGFLOORU Unsigned Floor Avg Alive2 Link
AVGCEILU Unsigned Ceil Avg Alive2 Link

Let me know if need to add anything else.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These proofs are weird: the whole point of the and/xor/ashr/add implementation is that you don't need to extend to a wider type, you can do all operations in i8.

case ISD::MULHU:
case ISD::MULHS:
case ISD::SMIN:
Expand Down
99 changes: 99 additions & 0 deletions llvm/test/CodeGen/AArch64/neon-hadd-freeze.ll
Original file line number Diff line number Diff line change
@@ -0,0 +1,99 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5

; RUN: llc < %s -march=arm64 -mattr=+neon | FileCheck %s
; Test that the presence of 'freeze' does not block instruction selection of:
; - uhadd
; - urhadd
; - shadd
; - srhadd

declare <8 x i16> @llvm.aarch64.neon.uhadd.v8i16(<8 x i16>, <8 x i16>)
declare <8 x i16> @llvm.aarch64.neon.urhadd.v8i16(<8 x i16>, <8 x i16>)
declare <8 x i16> @llvm.aarch64.neon.shadd.v8i16(<8 x i16>, <8 x i16>)
declare <8 x i16> @llvm.aarch64.neon.srhadd.v8i16(<8 x i16>, <8 x i16>)

;===---------------------------------------------------------------------===;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove these ;===-----'s. They don't add much.

; Test: freeze does not block uhadd instruction selection
;===---------------------------------------------------------------------===;

define <8 x i16> @uhadd_freeze(<8 x i16> %a0, <8 x i16> %a1) {
; CHECK-LABEL: uhadd_freeze:
; CHECK: // %bb.0:
; CHECK-NEXT: movi v2.8h, #15
; CHECK-NEXT: and v0.16b, v0.16b, v2.16b
; CHECK-NEXT: and v1.16b, v1.16b, v2.16b
; CHECK-NEXT: movi v2.8h, #31
; CHECK-NEXT: uhadd v0.8h, v0.8h, v1.8h
; CHECK-NEXT: and v0.16b, v0.16b, v2.16b
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and should have gone - any idea why it hasn't?

; CHECK-NEXT: ret
%m0 = and <8 x i16> %a0, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>
%m1 = and <8 x i16> %a1, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>
%avg = call <8 x i16> @llvm.aarch64.neon.uhadd.v8i16(<8 x i16> %m0, <8 x i16> %m1)
%frozen = freeze <8 x i16> %avg
%masked = and <8 x i16> %frozen, <i16 31, i16 31, i16 31, i16 31, i16 31, i16 31, i16 31, i16 31>
ret <8 x i16> %masked
}

;===---------------------------------------------------------------------===;
; Test: freeze does not block urhadd instruction selection
;===---------------------------------------------------------------------===;

define <8 x i16> @urhadd_freeze(<8 x i16> %a0, <8 x i16> %a1) {
; CHECK-LABEL: urhadd_freeze:
; CHECK: // %bb.0:
; CHECK-NEXT: movi v2.8h, #15
; CHECK-NEXT: and v0.16b, v0.16b, v2.16b
; CHECK-NEXT: and v1.16b, v1.16b, v2.16b
; CHECK-NEXT: movi v2.8h, #31
; CHECK-NEXT: urhadd v0.8h, v0.8h, v1.8h
; CHECK-NEXT: and v0.16b, v0.16b, v2.16b
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and should have gone - any idea why it hasn't?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LLVM must be conservative around freeze to avoid miscompilation. Instruction selection may proceed, but value range assumptions cannot be made post-freeze.

#58321

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it help if you apply #149323?

Copy link
Contributor Author

@aabhinavg1 aabhinavg1 Jul 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I tested with PR [#149323]
After rebuilding and rerunning llc on the neon-hadd-freeze.ll test, the instruction:

and v1.16b, v1.16b, v2.16b

is still present in the output for both uhadd_freeze and urhadd_freeze.

So it seems the patch does not remove the redundant and as expected.

; CHECK-NEXT: ret
%m0 = and <8 x i16> %a0, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>
%m1 = and <8 x i16> %a1, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>
%avg = call <8 x i16> @llvm.aarch64.neon.urhadd.v8i16(<8 x i16> %m0, <8 x i16> %m1)
%frozen = freeze <8 x i16> %avg
%masked = and <8 x i16> %frozen, <i16 31, i16 31, i16 31, i16 31, i16 31, i16 31, i16 31, i16 31>
ret <8 x i16> %masked
}

;===---------------------------------------------------------------------===;
; Test: freeze does not block shadd instruction selection
;===---------------------------------------------------------------------===;

define <8 x i16> @shadd_freeze(<8 x i16> %a0, <8 x i16> %a1) {
; CHECK-LABEL: shadd_freeze:
; CHECK: // %bb.0:
; CHECK-NEXT: bic v0.8h, #15
; CHECK-NEXT: bic v1.8h, #15
; CHECK-NEXT: movi v2.8h, #63
; CHECK-NEXT: shadd v0.8h, v0.8h, v1.8h
; CHECK-NEXT: and v0.16b, v0.16b, v2.16b
; CHECK-NEXT: ret
%m0 = and <8 x i16> %a0, <i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16>
%m1 = and <8 x i16> %a1, <i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16>
%avg = call <8 x i16> @llvm.aarch64.neon.shadd.v8i16(<8 x i16> %m0, <8 x i16> %m1)
%frozen = freeze <8 x i16> %avg
%masked = and <8 x i16> %frozen, <i16 63, i16 63, i16 63, i16 63, i16 63, i16 63, i16 63, i16 63>
ret <8 x i16> %masked
}

;===---------------------------------------------------------------------===;
; Test: freeze does not block srhadd instruction selection
;===---------------------------------------------------------------------===;

define <8 x i16> @srhadd_freeze(<8 x i16> %a0, <8 x i16> %a1) {
; CHECK-LABEL: srhadd_freeze:
; CHECK: // %bb.0:
; CHECK-NEXT: bic v0.8h, #15
; CHECK-NEXT: bic v1.8h, #15
; CHECK-NEXT: movi v2.8h, #63
; CHECK-NEXT: srhadd v0.8h, v0.8h, v1.8h
; CHECK-NEXT: and v0.16b, v0.16b, v2.16b
; CHECK-NEXT: ret
%m0 = and <8 x i16> %a0, <i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16>
%m1 = and <8 x i16> %a1, <i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16, i16 -16>
%avg = call <8 x i16> @llvm.aarch64.neon.srhadd.v8i16(<8 x i16> %m0, <8 x i16> %m1)
%frozen = freeze <8 x i16> %avg
%masked = and <8 x i16> %frozen, <i16 63, i16 63, i16 63, i16 63, i16 63, i16 63, i16 63, i16 63>
ret <8 x i16> %masked
}
Loading