add typed based alias analysis blog post #530

Open
wants to merge 1 commit into
base: 2025sp

Conversation

gerardogtn
Contributor

Blog post for TBAA with @mt-xing and @arnavm30.

Owner

@sampsyo sampsyo left a comment

Really nice work on this; you’ve done a great job highlighting and contextualizing the main ideas. I just have a few minor suggestions—when you have a chance to address them, please “request a review” in GitHub to let me know to take another look.

+++

One of the largest challenges facing an optimizing compiler is the fact that multiple variables in a program may refer to the same underlying memory, known as aliasing. The possibility of aliasing results in many compiler optimizations becoming unsafe, resulting in the compiler being unable to apply many seemingly reasonable optimizations unless it is able to conclusively determine that two variables are in fact not aliases.
This week’s paper, “Type-Based Alias Analysis” by Amer Diwan, Kathryn S. McKinley, and J. Eliot B. Moss, proposes a method of using type information and other statically available information like field names in order to better find variable pairs that are guaranteed not to be aliases.
Owner

Can you please include a hyperlink from the title to the paper itself?

return *a == *b;
}
```
It would be convenient if the compiler could optimize the return value to be the constant false. Especially if a and b were dead after this operation, the entire function could be inlined to simply be false. However, if the pointers a and b both refer to the same memory location, then the result would actually be true, since \*a and \*b would fetch the same value.
Owner

Markdown tip: instead of escaping the * character with a backslash, an easier (and more readable) way would be to use a backtick-delimited code span around all variable names and C expressions.

```
For this reason, compiler optimizations depend upon alias analysis, where the compiler attempts to statically determine which variables could alias which others. Specifically, the compiler is generally looking for cases where two variables must not be aliases, as those are where opportunities for optimization typically arise.
A naive approach to alias analysis could involve a simple dataflow analysis, checking for what possible memory locations a variable could point to. However, due to the conservative nature of this analysis, it necessarily results in a large number of false positives.
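To make the hazard concrete, here is a minimal sketch in C. The function name and the two stores are assumptions for illustration, not necessarily the exact example from the post. The same function returns different answers depending on whether its arguments alias, which is why a compiler cannot fold the comparison to a constant without proving that `a` and `b` are distinct.

```c
#include <stdbool.h>

/* Hypothetical example: write through both pointers, then compare.
 * If a and b point to different ints, the result is false (1 != 2).
 * If a and b point to the same int, the second store overwrites the
 * first, so both loads see 2 and the result is true. */
bool stores_then_compare(int *a, int *b) {
    *a = 1;
    *b = 2;
    return *a == *b;
}
```

Calling it as `stores_then_compare(&x, &y)` with two distinct variables yields `false`, while `stores_then_compare(&x, &x)` yields `true`.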
Owner

It’s also not necessarily the most efficient thing in the world, especially if you have a complicated heap model. (I mention this because one claimed advantage of TBAA is its simplicity/efficiency.)

## Annotations, Rich Types, and Alias Analysis


One of the aspects of alias analysis is that it underestimates the true count of aliasing in a program. In TBAA it might be the case that we write a function that may alias but that in the total usages of the function it never actually aliases. In this scenario, we lose the opportunity to optimize some code, but we accept this tradeoff in order to keep the analysis fast and correct. In particular, TBAA is really powerful as it only relies on language constructs to perform the analysis so it doesn’t require special modifications to the source code to improve the results of the alias analysis. But this same approach opens more doors for optimizations if the source language has a richer type system or if there is a way for programmers to provide hints to the compiler. An example is the restrict keyword in C, where the programmers inform the compiler there will be no aliasing between parameters:
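As a rough sketch of what such an annotation might look like (the function name is hypothetical, and this is not claimed to be the post's own listing), `restrict` on both parameters is a promise from the caller that the two pointers refer to different objects. Under that promise a compiler is permitted, though not required, to treat the comparison as always false.

```c
#include <stdbool.h>

/* Hypothetical example: restrict tells the compiler that, for the
 * duration of the call, the objects reached through a and b do not
 * overlap. Passing the same pointer for both arguments would be
 * undefined behavior, so a compiler may assume *a is still 1 after
 * the store to *b. */
bool compare_restrict(int *restrict a, int *restrict b) {
    *a = 1;
    *b = 2;
    return *a == *b;
}
```

The burden of correctness moves to the caller here: nothing checks the promise, so this only works when every call site really does pass distinct objects.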
Owner

underestimates the true count of aliasing in a program

Could you possibly tweak this wording to make it clear which “direction” you’re claiming? Maybe you mean “underestimates the number of pairs of variables that will alias at run time,” or maybe the opposite, like “underestimates the opportunities for optimization”? The current wording sounds like the former, but the latter seems more correct and aligns with the rest of the paragraph…

Owner

TBAA is really powerful as it only relies on language constructs to perform the analysis so it doesn’t require special modifications to the source code

Hmm… I would say that this is true of all alias analyses. As in, every alias analysis in the world works with the language as-is and does not require special aliasing annotations. Stuff like restrict isn’t an alias analysis; it’s an “escape hatch” from the analysis.

bool myfn(std::unique_ptr<int> a, std::unique_ptr<int> b) {
return false;
}
```
Owner

Just to confirm: is this a hypothetical, or did you actually try this in a real C++ compiler? Hypothetical is totally fine, but if you happen to have tried it out, that would be nice to mention explicitly (along with the compiler name & version you used).

}
```

So although the original paper did not consider annotations or richer type constructs, the idea can easily be extended to provide even better results for alias analysis.
Owner

I like this section! It’s great to get somewhat deeper into what language features make TBAA more or less powerful.


## Measuring an Upper Bound

Despite the large number of compiler optimization techniques, it is often difficult to gauge how impactful an optimization will be in the performance of an arbitrary program; the most common approach is to use a set of benchmarks execute them a number of times in the original source and in the optimized source do a pairwise comparison (usually in ratios) and aggregate the gains using the arithmetic or harmonic mean. This approach is not perfect, but makes it easy to compare different optimizations and is standard in literature. Diwan, McKinley and Moss used a completely different approach for TBAA, one that relies on establishing an upper bound.
Owner

the most common approach is to use a set of benchmarks execute them a number of times in the original source and in the optimized source do a pairwise comparison

Seems like there are some missing commas in here?


## Connections to Computing Landscape

As compilers have evolved, TBAA has been applied to not only type safe languages but also unsafe languages like C and C++. The introduction of the “strict aliasing rule” in the [C99 standard](https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf) allowed compilers to safely assume pointers of incompatible types wouldn’t alias, treating any violations as undefined behavior. There are some exceptions, though, such as character pointers. Compilers such as GCC and LLVM exploit this rule to enable more aggressive optimization. GCC enables [-fstrict-aliasing](https://gcc.gnu.org/onlinedocs/gcc-7.5.0/gnat_ugn/Optimization-and-Strict-Aliasing.html) by default at higher optimization levels like -O2 and LLVM/Clang enables a similar [TBAA metadata system](https://llvm.org/docs/LangRef.html#tbaa-metadata) by default at all optimization levels. A notable drawback is that legacy code that violated these strict aliasing rules could break unexpectedly, prompting the creation and usage of the [-fno-strict-aliasing](https://gcc.gnu.org/onlinedocs/gcc-7.5.0/gnat_ugn/Optimization-and-Strict-Aliasing.html) flag to preserve the previous behavior; moreover, Clang provides [TypeSanitizer](https://clang.llvm.org/docs/TypeSanitizer.html) (enabled with -fsanitize=type flag), a run-time library that uses TBAA metadata and dynamic instrumentation to detect strict type aliasing violations.
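A small C sketch of what the strict aliasing rule buys the compiler, and of the usual well-defined alternative to type punning through pointer casts. Both function names are hypothetical, and the second function assumes a 32-bit IEEE-754 `float`, which is common but not guaranteed by the C standard.

```c
#include <stdint.h>
#include <string.h>

/* Assumption for this sketch: float and uint32_t have the same size. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "sketch assumes 32-bit float");

/* Because int and float are incompatible types, a compiler using
 * type-based alias analysis may assume the store to *f cannot change
 * *i, and may return the constant 1 without reloading *i. */
int store_int_then_float(int *i, float *f) {
    *i = 1;
    *f = 0.0f;
    return *i;
}

/* Well-defined way to inspect a float's bit pattern: copy the bytes
 * with memcpy instead of reading the float through a uint32_t pointer,
 * which would violate the strict aliasing rule. */
uint32_t float_bits_memcpy(float value) {
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);
    return bits;
}
```

Code that instead casts a `float *` to `uint32_t *` and dereferences it is the kind of legacy pattern that may break under `-fstrict-aliasing`.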
Owner

I hadn’t heard of TypeSanitizer; that’s very cool!

@sampsyo sampsyo added the 2025sp label May 16, 2025
Owner

sampsyo commented May 28, 2025

Hi, @gerardogtn @mt-xing @arnavm30! I would love to publish your blog post. Can you please wrap up the revisions discussed above so I can hit the green button?

Owner

sampsyo commented Jun 5, 2025

Just one last reminder about the above, @gerardogtn @mt-xing @arnavm30—if you don't want to wrap up the small additions above, we can just close this PR.

Contributor

mt-xing commented Jun 28, 2025

Hi Professor. I don't have access to Gerardo's branch, so I've made a separate PR from my own fork that attempts to address the comments. When you have time, would you be able to review the updated PR #550 instead? In particular, this commit shows the diff of all the changes. Thanks.

3 participants