[mypyc] Speed up generator allocation by using a per-type freelist #19316
Merged
Use e.g. `CPyThreadLocal int x;` to define a thread-local variable that should work across most compilers we might want to support.
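As a rough illustration of the usage described in the commit message, the sketch below declares a thread-local counter through the macro. The macro definition here is a simplified stand-in written for this example (the real one lives in `mypyc/lib-rt/mypyc_util.h` and may cover more compilers), and `demo_counter` and `demo_bump_counter` are hypothetical names.

```c
/* Stand-in definition for illustration only; an assumption, not the
 * actual contents of mypyc_util.h. */
#ifndef CPyThreadLocal
#if defined(_MSC_VER)
#define CPyThreadLocal __declspec(thread)
#elif defined(__GNUC__) || defined(__clang__)
#define CPyThreadLocal __thread
#else
#error "Cannot define CPyThreadLocal for this compiler/target"
#endif
#endif

/* Each thread gets its own zero-initialized copy of this variable. */
static CPyThreadLocal int demo_counter;

/* Hypothetical helper: increments the calling thread's copy only. */
int demo_bump_counter(void) {
    demo_counter += 1;
    return demo_counter;
}
```

Because the variable is thread-local, no locking is needed around the increment: other threads never see or modify this thread's copy.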
jhance
approved these changes
Jul 2, 2025
mypyc/lib-rt/mypyc_util.h
Outdated
#define CPyThreadLocal __thread

#else
#error "Cannot define CPyThreadLocal for this compiler/target"
Can we make the error message more specific to the fact that this is related to nogil? Suggest turning on GIL or something.
A good idea. Done.
Add support for per-type free "lists" that can cache up to one instance for quick allocation.
If a free list is empty, fall back to regular object allocation.
The per-type free list can be enabled for a class by setting a flag in ClassIR. Users currently have no way to control this, so free lists must be enabled based on heuristics.
Use this free list for generator objects and coroutines, since they are often short-lived.
Use a thread-local variable for the free list so that each thread in a free-threaded build has a separate free list. This way we need less synchronization, and the free list hit rate is higher for multithreaded workloads.
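The points above can be sketched as a single-slot, thread-local free list. This is a minimal illustration under stated assumptions, not the code mypyc generates: `DemoGen`, `demo_gen_alloc`, and `demo_gen_release` are hypothetical names, plain `malloc`/`free` stand in for regular object allocation, and C11 `_Thread_local` is used in place of `CPyThreadLocal`.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a generator object. */
typedef struct {
    int state;
} DemoGen;

/* Per-type free "list" holding at most one cached instance.
 * Thread-local, so no synchronization is needed to access it. */
static _Thread_local DemoGen *demo_gen_free = NULL;

/* Allocate an instance, reusing the cached one when available. */
DemoGen *demo_gen_alloc(void) {
    DemoGen *g = demo_gen_free;
    if (g != NULL) {
        /* Fast path: take the cached instance and empty the slot. */
        demo_gen_free = NULL;
    } else {
        /* Free list is empty: fall back to regular allocation. */
        g = malloc(sizeof(DemoGen));
        if (g == NULL) {
            return NULL;
        }
    }
    g->state = 0;  /* reset state so reused instances start fresh */
    return g;
}

/* Release an instance, caching it if the slot is empty. */
void demo_gen_release(DemoGen *g) {
    if (g == NULL) {
        return;
    }
    if (demo_gen_free == NULL) {
        demo_gen_free = g;  /* keep one instance for quick reuse */
    } else {
        free(g);            /* slot already occupied: really free it */
    }
}
```

A short-lived object that is released and then allocated again on the same thread hits the fast path and skips the allocator entirely. The cost is that each thread may keep one idle instance per type alive.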
This speeds up a microbenchmark that performs non-blocking calls of async functions in a loop by about 20%. The impact will become significantly bigger after some follow-up optimizations that I'm working on.
This trades memory use for performance, which is often a good trade. It could use a lot of memory if many threads call async functions, but async functions generally run on a single thread, so this case seems unlikely right now. Also, in my experience with large code bases, only a small fraction of functions are async functions or generators, so the overall memory use impact shouldn't be too bad.
We can later look into making this profile-guided, so that only frequently called functions get the free list. We could also add a compile-time flag that optimizes for memory use and turns this off.