add typed based alias analysis blog post #530


Open

gerardogtn wants to merge 1 commit into sampsyo:2025sp from gerardogtn:gteruel/tbaa

Contributor

gerardogtn commented May 6, 2025

Blog post for TBAA with @mt-xing and @arnavm30.


          add typed based alias analysis blog post

2b67df5

sampsyo requested changes


Owner

sampsyo left a comment

Really nice work on this; you’ve done a great job highlighting and contextualizing the main ideas. I just have a few minor suggestions—when you have a chance to address them, please “request a review” in GitHub to let me know to take another look.

content/blog/2025-05-06-typed-based-alias-analysis.md

+              +++
+              One of the largest challenges facing an optimizing compiler is aliasing: multiple variables in a program may refer to the same underlying memory. The possibility of aliasing makes many seemingly reasonable optimizations unsafe, so the compiler cannot apply them unless it can conclusively determine that two variables are in fact not aliases.
+              This week’s paper, “Type-Based Alias Analysis” by Amer Diwan, Kathryn S. McKinley, and J. Eliot B. Moss, proposes a method of using type information and other statically available information like field names in order to better find variable pairs that are guaranteed not to be aliases.

Owner

sampsyo May 7, 2025

Can you please include a hyperlink from the title to the paper itself?
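To make the paper's core idea concrete, here is a minimal C sketch of a type-based may-alias query in the spirit of the paper's declared-type and field-name refinements. It assumes a type-safe language with single inheritance (the paper evaluates Modula-3 programs), so the type hierarchy is a tree. The names `Type`, `is_subtype`, `typedecl_may_alias`, and `field_may_alias` are hypothetical and not taken from the paper, and the field rule is deliberately simplified.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a single-inheritance type hierarchy:
   each type points to its supertype (NULL for the root). */
typedef struct Type {
    const char *name;
    const struct Type *parent;
} Type;

/* True if `sub` is `super` itself or (transitively) inherits from it. */
static bool is_subtype(const Type *sub, const Type *super) {
    for (const Type *t = sub; t != NULL; t = t->parent) {
        if (t == super) {
            return true;
        }
    }
    return false;
}

/* Declared-type query: two access paths may alias only if some type is a
   subtype of both declared types. In a tree, that happens exactly when one
   declared type is a subtype of the other. */
static bool typedecl_may_alias(const Type *t1, const Type *t2) {
    return is_subtype(t1, t2) || is_subtype(t2, t1);
}

/* Simplified field refinement for accesses p.f1 and q.f2, where t1 and t2
   are the declared types of p and q. In a type-safe language without
   unions, different fields never share storage; the same field may alias
   only if p and q may refer to the same object. */
static bool field_may_alias(const Type *t1, const char *f1,
                            const Type *t2, const char *f2) {
    if (strcmp(f1, f2) != 0) {
        return false;
    }
    return typedecl_may_alias(t1, t2);
}
```

For example, if `Circle` inherits from `Shape`, a `Shape` reference and a `Circle` reference may alias, while a `Circle` and an unrelated `List` may not, and `p.x` never aliases `q.y` regardless of the types of `p` and `q`.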

content/blog/2025-05-06-typed-based-alias-analysis.md

+              	return *a == *b;
+              }
+              ```
+              It would be convenient if the compiler could optimize the return value to be the constant false. Especially if a and b were dead after this operation, the entire function could be inlined to simply be false. However, if the pointers a and b both refer to the same memory location, then the result would actually be true, since \*a and \*b would fetch the same value.

Owner

sampsyo May 7, 2025

Markdown tip: instead of escaping the * character with a backslash, an easier (and more readable) way would be to use a backtick-delimited code span around all variable names and C expressions.
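The function body in this hunk is only partly visible, so here is a hypothetical stand-in (`store_and_compare` is an invented name) that exhibits the same hazard: folding the comparison to `false` is correct only when the two pointers refer to different objects.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical example: store two different values, then compare them. */
bool store_and_compare(int *a, int *b) {
    *a = 1;
    *b = 2;
    /* If a and b point to different ints, this is 1 == 2, i.e. false.
       If they point to the same int, the second store overwrote the
       first, so this is 2 == 2, i.e. true. */
    return *a == *b;
}
```

Calling it as `store_and_compare(&x, &y)` yields `false`, but `store_and_compare(&x, &x)` yields `true`, which is exactly why the compiler needs alias information before it may rewrite the return value.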

content/blog/2025-05-06-typed-based-alias-analysis.md

+              For this reason, compiler optimizations depend upon alias analysis, where the compiler attempts to statically determine which variables could alias which others. Specifically, the compiler is generally looking for cases where two variables must not be aliases, as those are where opportunities for optimization typically arise.
+              A naive approach to alias analysis could involve a simple dataflow analysis, checking for what possible memory locations a variable could point to. However, due to the conservative nature of this analysis, it necessarily results in a large number of false positives.

Owner

sampsyo May 7, 2025

It’s also not necessarily the most efficient thing in the world, especially if you have a complicated heap model. (I mention this because one claimed advantage of TBAA is its simplicity/efficiency.)
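A rough sketch of the naive dataflow approach, assuming a toy points-to representation in which each bit of a 64-bit mask stands for one abstract memory location (the type `PointsTo` and the helpers `LOC`, `join`, and `may_alias` are hypothetical). It shows where the false positives come from: at a control-flow merge the analysis must take the union of everything a pointer could point to, even if one path is never taken at run time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Each bit stands for one abstract memory location (e.g., a variable or
   an allocation site); this toy model supports up to 64 locations. */
typedef uint64_t PointsTo;

#define LOC(i) ((PointsTo)1 << (i))

/* At a control-flow merge, a pointer may point to anything it could point
   to along either incoming edge, so the sets are unioned. */
static PointsTo join(PointsTo a, PointsTo b) {
    return a | b;
}

/* Two pointers may alias if their points-to sets overlap. */
static bool may_alias(PointsTo p, PointsTo q) {
    return (p & q) != 0;
}
```

Maintaining such sets for every pointer at every program point, plus some model of heap-allocated objects, is part of what makes this style of analysis costlier than a purely type-based check.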

content/blog/2025-05-06-typed-based-alias-analysis.md

		## Annotations, Rich Types, and Alias Analysis


		One of the aspects of alias analysis is that it underestimates the true count of aliasing in a program. In TBAA it might be the case that we write a function that may alias but that in the total usages of the function it never actually aliases. In this scenario, we lose the opportunity to optimize some code, but we accept this tradeoff in order to keep the analysis fast and correct. In particular, TBAA is really powerful as it only relies on language constructs to perform the analysis so it doesn’t require special modifications to the source code to improve the results of the alias analysis. But this same approach opens more doors for optimizations if the source language has a richer type system or if there is a way for programmers to provide hints to the compiler. An example is the restrict keyword in C, where the programmers inform the compiler there will be no aliasing between parameters:

Owner

sampsyo May 7, 2025

underestimates the true count of aliasing in a program

Could you possibly tweak this wording to make it clear which “direction” you’re claiming? Maybe you mean “underestimates the number of pairs of variables that will alias at run time,” or maybe the opposite, like “underestimates the opportunities for optimization”? The current wording sounds like the former, but the latter seems more correct and aligns with the rest of the paragraph…

content/blog/2025-05-06-typed-based-alias-analysis.md


Owner

sampsyo May 7, 2025

TBAA is really powerful as it only relies on language constructs to perform the analysis so it doesn’t require special modifications to the source code

Hmm… I would say that this is true of all alias analyses. As in, every alias analysis in the world works with the language as-is and does not require special aliasing annotations. Stuff like restrict isn’t an alias analysis; it’s an “escape hatch” from the analysis.
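A minimal C99 sketch of the `restrict` escape hatch (the function name `store_then_load` is hypothetical). The qualifier is a promise from the programmer rather than a fact the analysis proves: if the caller keeps it, the compiler may fold the final load to the constant 1; if the caller breaks it, the behavior is undefined.

```c
#include <assert.h>

/* The restrict qualifiers promise that, while this function runs, the
   objects modified through a and b are accessed only through those
   pointers, so the store to *b cannot change *a. A compiler may therefore
   return the constant 1 without reloading *a. */
int store_then_load(int *restrict a, int *restrict b) {
    *a = 1;
    *b = 2;
    return *a;
}
```

Calling `store_then_load(&x, &y)` with distinct objects returns 1. Calling `store_then_load(&x, &x)` breaks the promise and is undefined behavior, and the compiler is not required to diagnose it.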

content/blog/2025-05-06-typed-based-alias-analysis.md

+              bool myfn(std::unique_ptr<int> a, std::unique_ptr<int> b) {
+              	return false;
+              }
+              ```

Owner

sampsyo May 7, 2025

Just to confirm: is this a hypothetical, or did you actually try this in a real C++ compiler? Hypothetical is totally fine, but if you happen to have tried it out, that would be nice to mention explicitly (along with the compiler name & version you used).

content/blog/2025-05-06-typed-based-alias-analysis.md

+              }
+              ```
+              So although the original paper did not consider annotations or richer type constructs, the idea can easily be extended to provide even better results for alias analysis.

Owner

sampsyo May 7, 2025

I like this section! It’s great to get somewhat deeper into what language features make TBAA more or less powerful.

content/blog/2025-05-06-typed-based-alias-analysis.md


		## Measuring an Upper Bound

		Despite the large number of compiler optimization techniques, it is often difficult to gauge how impactful an optimization will be in the performance of an arbitrary program; the most common approach is to use a set of benchmarks execute them a number of times in the original source and in the optimized source do a pairwise comparison (usually in ratios) and aggregate the gains using the arithmetic or harmonic mean. This approach is not perfect, but makes it easy to compare different optimizations and is standard in literature. Diwan, McKinley and Moss used a completely different approach for TBAA, one that relies on establishing an upper bound.

Owner

sampsyo May 7, 2025

the most common approach is to use a set of benchmarks execute them a number of times in the original source and in the optimized source do a pairwise comparison

Seems like there are some missing commas in here?
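The conventional methodology described in the Measuring an Upper Bound paragraph can be sketched in C as follows. The helper names are hypothetical, the sketch assumes each benchmark's repeated runs have already been reduced to one representative time, and any inputs are placeholders rather than results from the paper.

```c
#include <assert.h>
#include <stddef.h>

/* Speedup of benchmark i: baseline time divided by optimized time, so a
   value above 1 means the optimized build is faster. Assumes every time
   is positive and already averaged over several runs. */
static void compute_speedups(const double *baseline, const double *optimized,
                             double *speedup, size_t n) {
    for (size_t i = 0; i < n; i++) {
        speedup[i] = baseline[i] / optimized[i];
    }
}

static double arithmetic_mean(const double *x, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += x[i];
    }
    return sum / (double)n;
}

/* Harmonic mean: n divided by the sum of reciprocals. */
static double harmonic_mean(const double *x, size_t n) {
    double reciprocal_sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        reciprocal_sum += 1.0 / x[i];
    }
    return (double)n / reciprocal_sum;
}
```

With speedups of 2.0 and 1.0, the arithmetic mean is 1.5 while the harmonic mean is about 1.33, so the choice of mean alone changes the headline number.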

content/blog/2025-05-06-typed-based-alias-analysis.md


		## Connections to Computing Landscape

		As compilers have evolved, TBAA has been applied not only to type-safe languages but also to unsafe languages like C and C++. The C standard’s “strict aliasing rule” (present since C89 and restated in terms of an object’s “effective type” in the [C99 standard](https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf)) allows compilers to assume that pointers to incompatible types do not alias, treating any violation as undefined behavior. There are some exceptions, though, such as accesses through character pointers. Compilers such as GCC and LLVM exploit this rule to enable more aggressive optimization: GCC enables [-fstrict-aliasing](https://gcc.gnu.org/onlinedocs/gcc-7.5.0/gnat_ugn/Optimization-and-Strict-Aliasing.html) by default at higher optimization levels such as -O2, and Clang emits LLVM’s [TBAA metadata](https://llvm.org/docs/LangRef.html#tbaa-metadata) by default whenever optimization is enabled (-O1 and above). A notable drawback is that legacy code that violated the strict aliasing rule could break unexpectedly, prompting the creation and use of the [-fno-strict-aliasing](https://gcc.gnu.org/onlinedocs/gcc-7.5.0/gnat_ugn/Optimization-and-Strict-Aliasing.html) flag to preserve the previous behavior. Clang also provides [TypeSanitizer](https://clang.llvm.org/docs/TypeSanitizer.html) (enabled with the -fsanitize=type flag), a run-time checker that uses TBAA metadata and dynamic instrumentation to detect strict aliasing violations.

Owner

sampsyo May 7, 2025

I hadn’t heard of TypeSanitizer; that’s very cool!
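A small C illustration of what the strict aliasing rule permits. The function names are hypothetical, and `float_bits` assumes `float` and `uint32_t` have the same size (true on mainstream platforms with 32-bit IEEE-754 floats). Because an `int` object may not be modified through a `float` lvalue, the compiler may assume the two stores below touch different objects; copying bytes with `memcpy` is the usual well-defined alternative to pointer-cast type punning.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Under strict aliasing, a store through a float * cannot modify an int
   object, so a compiler may return the constant 1 here without reloading
   *i. With -fno-strict-aliasing it can no longer rely on the types to
   skip the reload. */
int store_int_then_float(int *i, float *f) {
    *i = 1;
    *f = 2.0f;
    return *i;
}

/* Reading the bits as *(uint32_t *)&x would violate the rule (undefined
   behavior). Copying the object representation with memcpy is well
   defined. Assumes sizeof(float) == sizeof(uint32_t). */
uint32_t float_bits(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}
```

A call such as `store_int_then_float((int *)&g, &g)` on a `float g` violates the rule; that is the kind of access a build instrumented with `-fsanitize=type` is intended to report at run time.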

sampsyo added the 2025sp label

Owner

sampsyo commented May 28, 2025

Hi, @gerardogtn @mt-xing @arnavm30! I would love to publish your blog post. Can you please wrap up the revisions discussed above so I can hit the green button?

Owner

sampsyo commented Jun 5, 2025

Just one last reminder about the above, @gerardogtn @mt-xing @arnavm30—if you don't want to wrap up the small additions above, we can just close this PR.

mt-xing mentioned this pull request

Type based alias analysis blog post (v2) #550

Open

Contributor

mt-xing commented Jun 28, 2025 (edited)

Hi Professor. I don't have access to Gerardo's branch, so I've made a separate PR from my own fork that attempts to address the comments. When you have time, would you be able to review the updated PR #550 instead? In particular, this commit shows the diff of all the changes. Thanks.
