Skip to content

[AMDGPU] AMDGPULateCodeGenPrepare Legacy PM: replace setPreservesAll() with setPreservesCFG() #148167

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

J-MR-T
Copy link
Contributor

@J-MR-T J-MR-T commented Jul 11, 2025

This PR depends on #148165; the first commit (90f1d0a) belongs to that PR. The changes are distinct, so separate PRs seemed like the best option. I don't have commit access, so I couldn't use user-branches to mark the dependency.

As AMDGPULateCodeGenPrepare actually performs changes that invalidate Uniformity Analysis; use setPreservesCFG() to mark this, instead of setPreservesAll() which wrongly includes preserving Uniformity Analysis.

Note that before #148165, this would still have preserved Uniformity Analysis, hence the dependency. In addition, amdgpu/llc-pipeline.cc needs to be changed when both changes are in effect, but those changes would make the test fail if the PRs weren't based on one another.

Note on why this hasn't caused issues so far:
It just so happens that AMDGPULateCodeGenPrepare is always immediately followed by AMDGPUUnifyDivergentExitNodes, which does invalidate most analyses, including Uniformity. And because UnifyDivergentExitNodes only looks at terminators, and LateCGP seemingly does not replace uniform values with divergent values, or divergent values with uniform values, and it only inserts new values that are not looked at by UnifyDivergentExitNodes, this bug remained hidden.


I ran git-clang-format on my changes. I tested them using the check-llvm target; no unexpected failures occurred after I made the change to amdgpu/llc-pipeline.ll.

@llvmbot llvmbot added backend:AMDGPU llvm:analysis Includes value tracking, cost tables and constant folding labels Jul 11, 2025
@llvmbot
Copy link
Member

llvmbot commented Jul 11, 2025

@llvm/pr-subscribers-llvm-analysis

Author: Jim M. R. Teichgräber (J-MR-T)

Changes

<details><summary>This PR depends on #148165; the first commit (90f1d0a) belongs to that PR.</summary>
<p>
The changes are distinct, so separate PRs seemed the best option. I don't have commit access, so I couldn't use user-branches to mark the dependency.
</p>
</details>

As AMDGPULateCodeGenPrepare actually performs changes that invalidate Uniformity Analysis; use setPreservesCFG() to mark this.

Note that before #148165, this would still have preserved Uniformity Analysis, hence the dependency. In addition, amdgpu/llc-pipeline.cc needs to be changed when both changes are in effect, but those changes would make the test fail if the PRs weren't based on one another.

<details><summary>Note on why this hasn't caused issues so far</summary>
<p>

It just so happens that AMDGPULateCodeGenPrepare is always immediately followed by AMDGPUUnifyDivergentExitNodes, which does invalidate most analyses, including Uniformity. And because UnifyDivergentExitNodes only looks at terminators, and LateCGP seemingly does not replace uniform values with divergent values, or divergent values with uniform values, and it only inserts new values that are not looked at by UnifyDivergentExitNodes, this bug remained hidden.

</p>
</details>


I ran git-clang-format on my changes. I tested them using the check-llvm target; no unexpected failures occurred after I made the change to amdgpu/llc-pipeline.ll.


Full diff: https://github.com/llvm/llvm-project/pull/148167.diff

3 Files Affected:

  • (modified) llvm/lib/Analysis/UniformityAnalysis.cpp (+4-1)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPULateCodeGenPrepare.cpp (+3-1)
  • (modified) llvm/test/CodeGen/AMDGPU/llc-pipeline.ll (+4)
diff --git a/llvm/lib/Analysis/UniformityAnalysis.cpp b/llvm/lib/Analysis/UniformityAnalysis.cpp
index 2101fdfacfc8f..0daf4041a72f3 100644
--- a/llvm/lib/Analysis/UniformityAnalysis.cpp
+++ b/llvm/lib/Analysis/UniformityAnalysis.cpp
@@ -150,8 +150,11 @@ INITIALIZE_PASS_BEGIN(UniformityInfoWrapperPass, "uniformity",
 INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)
 INITIALIZE_PASS_DEPENDENCY(CycleInfoWrapperPass)
 INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass)
+// Though Uniformity Analysis depends on the CFG,
+// it also needs to be invalidated if values are changed, so isCFGOnly: false.
+// See NOTE on updatability at the start of GenericUniformityImpl.h
 INITIALIZE_PASS_END(UniformityInfoWrapperPass, "uniformity",
-                    "Uniformity Analysis", true, true)
+                    "Uniformity Analysis", false, true)
 
 void UniformityInfoWrapperPass::getAnalysisUsage(AnalysisUsage &AU) const {
   AU.setPreservesAll();
diff --git a/llvm/lib/Target/AMDGPU/AMDGPULateCodeGenPrepare.cpp b/llvm/lib/Target/AMDGPU/AMDGPULateCodeGenPrepare.cpp
index f0d63f523088b..b109fb73deaa5 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPULateCodeGenPrepare.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPULateCodeGenPrepare.cpp
@@ -546,7 +546,9 @@ class AMDGPULateCodeGenPrepareLegacy : public FunctionPass {
     AU.addRequired<TargetPassConfig>();
     AU.addRequired<AssumptionCacheTracker>();
     AU.addRequired<UniformityInfoWrapperPass>();
-    AU.setPreservesAll();
+    // makes changes that can invalidate Uniformity Analysis,
+    // so don't preserveAll here (see new PM version above)
+    AU.setPreservesCFG();
   }
 
   bool runOnFunction(Function &F) override;
diff --git a/llvm/test/CodeGen/AMDGPU/llc-pipeline.ll b/llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
index dd2ff2e013cc8..b23662c63338c 100644
--- a/llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
+++ b/llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
@@ -255,6 +255,7 @@
 ; GCN-O1-NEXT:        Uniformity Analysis
 ; GCN-O1-NEXT:        AMDGPU IR late optimizations
 ; GCN-O1-NEXT:        Post-Dominator Tree Construction
+; GCN-O1-NEXT:        Uniformity Analysis
 ; GCN-O1-NEXT:        Unify divergent function exit nodes
 ; GCN-O1-NEXT:        Dominator Tree Construction
 ; GCN-O1-NEXT:        Cycle Info Analysis
@@ -557,6 +558,7 @@
 ; GCN-O1-OPTS-NEXT:        Uniformity Analysis
 ; GCN-O1-OPTS-NEXT:        AMDGPU IR late optimizations
 ; GCN-O1-OPTS-NEXT:        Post-Dominator Tree Construction
+; GCN-O1-OPTS-NEXT:        Uniformity Analysis
 ; GCN-O1-OPTS-NEXT:        Unify divergent function exit nodes
 ; GCN-O1-OPTS-NEXT:        Dominator Tree Construction
 ; GCN-O1-OPTS-NEXT:        Cycle Info Analysis
@@ -871,6 +873,7 @@
 ; GCN-O2-NEXT:        Uniformity Analysis
 ; GCN-O2-NEXT:        AMDGPU IR late optimizations
 ; GCN-O2-NEXT:        Post-Dominator Tree Construction
+; GCN-O2-NEXT:        Uniformity Analysis
 ; GCN-O2-NEXT:        Unify divergent function exit nodes
 ; GCN-O2-NEXT:        Dominator Tree Construction
 ; GCN-O2-NEXT:        Cycle Info Analysis
@@ -1200,6 +1203,7 @@
 ; GCN-O3-NEXT:        Uniformity Analysis
 ; GCN-O3-NEXT:        AMDGPU IR late optimizations
 ; GCN-O3-NEXT:        Post-Dominator Tree Construction
+; GCN-O3-NEXT:        Uniformity Analysis
 ; GCN-O3-NEXT:        Unify divergent function exit nodes
 ; GCN-O3-NEXT:        Dominator Tree Construction
 ; GCN-O3-NEXT:        Cycle Info Analysis

@llvmbot
Copy link
Member

llvmbot commented Jul 11, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Jim M. R. Teichgräber (J-MR-T)

Changes

<details><summary>This PR depends on #148165; the first commit (90f1d0a) belongs to that PR.</summary>
<p>
The changes are distinct, so separate PRs seemed the best option. I don't have commit access, so I couldn't use user-branches to mark the dependency.
</p>
</details>

As AMDGPULateCodeGenPrepare actually performs changes that invalidate Uniformity Analysis; use setPreservesCFG() to mark this.

Note that before #148165, this would still have preserved Uniformity Analysis, hence the dependency. In addition, amdgpu/llc-pipeline.cc needs to be changed when both changes are in effect, but those changes would make the test fail if the PRs weren't based on one another.

<details><summary>Note on why this hasn't caused issues so far</summary>
<p>

It just so happens that AMDGPULateCodeGenPrepare is always immediately followed by AMDGPUUnifyDivergentExitNodes, which does invalidate most analyses, including Uniformity. And because UnifyDivergentExitNodes only looks at terminators, and LateCGP seemingly does not replace uniform values with divergent values, or divergent values with uniform values, and it only inserts new values that are not looked at by UnifyDivergentExitNodes, this bug remained hidden.

</p>
</details>


I ran git-clang-format on my changes. I tested them using the check-llvm target; no unexpected failures occurred after I made the change to amdgpu/llc-pipeline.ll.


Full diff: https://github.com/llvm/llvm-project/pull/148167.diff

3 Files Affected:

  • (modified) llvm/lib/Analysis/UniformityAnalysis.cpp (+4-1)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPULateCodeGenPrepare.cpp (+3-1)
  • (modified) llvm/test/CodeGen/AMDGPU/llc-pipeline.ll (+4)
diff --git a/llvm/lib/Analysis/UniformityAnalysis.cpp b/llvm/lib/Analysis/UniformityAnalysis.cpp
index 2101fdfacfc8f..0daf4041a72f3 100644
--- a/llvm/lib/Analysis/UniformityAnalysis.cpp
+++ b/llvm/lib/Analysis/UniformityAnalysis.cpp
@@ -150,8 +150,11 @@ INITIALIZE_PASS_BEGIN(UniformityInfoWrapperPass, "uniformity",
 INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)
 INITIALIZE_PASS_DEPENDENCY(CycleInfoWrapperPass)
 INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass)
+// Though Uniformity Analysis depends on the CFG,
+// it also needs to be invalidated if values are changed, so isCFGOnly: false.
+// See NOTE on updatability at the start of GenericUniformityImpl.h
 INITIALIZE_PASS_END(UniformityInfoWrapperPass, "uniformity",
-                    "Uniformity Analysis", true, true)
+                    "Uniformity Analysis", false, true)
 
 void UniformityInfoWrapperPass::getAnalysisUsage(AnalysisUsage &AU) const {
   AU.setPreservesAll();
diff --git a/llvm/lib/Target/AMDGPU/AMDGPULateCodeGenPrepare.cpp b/llvm/lib/Target/AMDGPU/AMDGPULateCodeGenPrepare.cpp
index f0d63f523088b..b109fb73deaa5 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPULateCodeGenPrepare.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPULateCodeGenPrepare.cpp
@@ -546,7 +546,9 @@ class AMDGPULateCodeGenPrepareLegacy : public FunctionPass {
     AU.addRequired<TargetPassConfig>();
     AU.addRequired<AssumptionCacheTracker>();
     AU.addRequired<UniformityInfoWrapperPass>();
-    AU.setPreservesAll();
+    // makes changes that can invalidate Uniformity Analysis,
+    // so don't preserveAll here (see new PM version above)
+    AU.setPreservesCFG();
   }
 
   bool runOnFunction(Function &F) override;
diff --git a/llvm/test/CodeGen/AMDGPU/llc-pipeline.ll b/llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
index dd2ff2e013cc8..b23662c63338c 100644
--- a/llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
+++ b/llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
@@ -255,6 +255,7 @@
 ; GCN-O1-NEXT:        Uniformity Analysis
 ; GCN-O1-NEXT:        AMDGPU IR late optimizations
 ; GCN-O1-NEXT:        Post-Dominator Tree Construction
+; GCN-O1-NEXT:        Uniformity Analysis
 ; GCN-O1-NEXT:        Unify divergent function exit nodes
 ; GCN-O1-NEXT:        Dominator Tree Construction
 ; GCN-O1-NEXT:        Cycle Info Analysis
@@ -557,6 +558,7 @@
 ; GCN-O1-OPTS-NEXT:        Uniformity Analysis
 ; GCN-O1-OPTS-NEXT:        AMDGPU IR late optimizations
 ; GCN-O1-OPTS-NEXT:        Post-Dominator Tree Construction
+; GCN-O1-OPTS-NEXT:        Uniformity Analysis
 ; GCN-O1-OPTS-NEXT:        Unify divergent function exit nodes
 ; GCN-O1-OPTS-NEXT:        Dominator Tree Construction
 ; GCN-O1-OPTS-NEXT:        Cycle Info Analysis
@@ -871,6 +873,7 @@
 ; GCN-O2-NEXT:        Uniformity Analysis
 ; GCN-O2-NEXT:        AMDGPU IR late optimizations
 ; GCN-O2-NEXT:        Post-Dominator Tree Construction
+; GCN-O2-NEXT:        Uniformity Analysis
 ; GCN-O2-NEXT:        Unify divergent function exit nodes
 ; GCN-O2-NEXT:        Dominator Tree Construction
 ; GCN-O2-NEXT:        Cycle Info Analysis
@@ -1200,6 +1203,7 @@
 ; GCN-O3-NEXT:        Uniformity Analysis
 ; GCN-O3-NEXT:        AMDGPU IR late optimizations
 ; GCN-O3-NEXT:        Post-Dominator Tree Construction
+; GCN-O3-NEXT:        Uniformity Analysis
 ; GCN-O3-NEXT:        Unify divergent function exit nodes
 ; GCN-O3-NEXT:        Dominator Tree Construction
 ; GCN-O3-NEXT:        Cycle Info Analysis

@J-MR-T
Copy link
Contributor Author

J-MR-T commented Jul 11, 2025

CC @nhaehnle

@jayfoad
Copy link
Contributor

jayfoad commented Jul 11, 2025

Your desciption contains HTML style markup tags. I don't think we want that in the git commit message when you squash-and-merge, do we?

@J-MR-T
Copy link
Contributor Author

J-MR-T commented Jul 11, 2025

Ah, good point, I didn't consider that the description is included as well. Given that the notes aren't too long, always showing them shouldn't matter too much. Thanks!

Copy link
Contributor

@jayfoad jayfoad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code change LGTM

Copy link
Collaborator

@nhaehnle nhaehnle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically LGTM, but a nit in the comments. (Also, this depends on #148165 which should go in first)

@J-MR-T J-MR-T force-pushed the amdgpu-late-cgp-preserve-cfg-not-all branch from 4b7737c to 9b81632 Compare July 11, 2025 14:49
@J-MR-T
Copy link
Contributor Author

J-MR-T commented Jul 11, 2025

(Now the first 3 commits belong to the dependent PR #148165, because of review changes there)

nhaehnle pushed a commit that referenced this pull request Jul 11, 2025
…lse (#148165)

Currently, Uniformity Analysis is preserved in the Legacy PM when a pass
sets `setPreservesCFG()`. This is incorrect: any values' uniformity not
only depends on the CFG, but also on the uniformity of other values, so
a CFG-preserving change in many cases doesn't preserve uniformity
analysis.

This corrected behavior also matches the behavior of the new PM.

---

On its own, this change does not affect the pass pipeline because of the
happenstance of pass ordering. I also created a PR to change
AMDGPULateCodeGenPrepare (#148167), this will have an actual impact on
pass executions. That PR also includes changes to the
`amdgpu/llc-pipeline.ll` test in order to check that this change works,
but if this is preferred, I would also be happy to try to extend this PR
to add an isolated test case; though my personal opinion is that the
test in #148167 should suffice, as it should also accurately pinpoint
failures related to this change.

---

I ran `git-clang-format` on my changes. I tested them using the
`check-llvm` target; no unexpected failures occurred.
@J-MR-T
Copy link
Contributor Author

J-MR-T commented Jul 12, 2025

Should be ready now :). If you agree, could I ask you to merge this one as well? I don't have commit perms.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:AMDGPU llvm:analysis Includes value tracking, cost tables and constant folding
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants