Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resolve DLL imports at CRT startup, not on demand #81478

Merged
merged 1 commit into from
Jan 31, 2021

Conversation

sivadeilra
Copy link

On Windows, libstd uses GetProcAddress to locate some DLL imports, so
that libstd can run on older versions of Windows. If a given DLL import
is not present, then libstd uses other behavior (such as fallback
implementations).

This commit uses a feature of the Windows CRT to do these DLL imports
during module initialization, before main() (or DllMain()) is called.
This is the ideal time to resolve imports, because the module is
effectively single-threaded at that point; no other threads can
touch the data or code of the module that is being initialized.

This avoids several problems. First, it makes the cost of performing
the DLL import lookups deterministic. Right now, the DLL imports are
done on demand, which means that application threads might have to
do the DLL import during some time-sensitive operation. This is a
small source of unpredictability. Since threads can race, it's even
possible to have more than one thread running the same redundant
DLL lookup.

This commit also removes using the heap to allocate strings, during
the DLL lookups.

@rust-highfive
Copy link
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @joshtriplett (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jan 28, 2021
@sivadeilra
Copy link
Author

Suggest @m-ou-se review this, since they've made changes in this area before.

@m-ou-se m-ou-se assigned m-ou-se and unassigned joshtriplett Jan 28, 2021
Copy link
Member

@m-ou-se m-ou-se left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a great improvement!

Will this .CRT$XCU section work also work when it is part of a Rust cdylib/dll that'll be loaded into another program, possibly written in another language?

library/std/src/sys/windows/compat.rs Outdated Show resolved Hide resolved
@joshtriplett
Copy link
Member

My understanding was that we're removing the dynamic detection of these features in the course of removing Windows XP support.

@m-ou-se
Copy link
Member

m-ou-se commented Jan 29, 2021

My understanding was that we're removing the dynamic detection of these features in the course of removing Windows XP support.

That already happened. The few APIs left that are loaded dynamically now are also not available on Windows 7. E.g. WaitOnAddress is only available on Windows 8+.

@m-ou-se m-ou-se added T-libs Relevant to the library team, which will review and decide on the PR/issue. O-windows Operating system: Windows labels Jan 29, 2021
@m-ou-se
Copy link
Member

m-ou-se commented Jan 29, 2021

@sivadeilra This PR now adds a few files called .rs.per-module. Is that still a work in progress?

@sivadeilra
Copy link
Author

@sivadeilra This PR now adds a few files called .rs.per-module. Is that still a work in progress?

Eeek! No, that's an accidental inclusion. I'll remove it.

On Windows, libstd uses GetProcAddress to locate some DLL imports, so
that libstd can run on older versions of Windows. If a given DLL import
is not present, then libstd uses other behavior (such as fallback
implementations).

This commit uses a feature of the Windows CRT to do these DLL imports
during module initialization, before main() (or DllMain()) is called.
This is the ideal time to resolve imports, because the module is
effectively single-threaded at that point; no other threads can
touch the data or code of the module that is being initialized.

This avoids several problems. First, it makes the cost of performing
the DLL import lookups deterministic. Right now, the DLL imports are
done on demand, which means that application threads _might_ have to
do the DLL import during some time-sensitive operation. This is a
small source of unpredictability. Since threads can race, it's even
possible to have more than one thread running the same redundant
DLL lookup.

This commit also removes using the heap to allocate strings, during
the DLL lookups.
@m-ou-se
Copy link
Member

m-ou-se commented Jan 29, 2021

Not directly feedback on this PR, but just thinking a bit out loud about (the future of) these dynamic imports on Windows:

Since support for XP/Vista was dropped recently, the only dynamic imports left are:

  1. SetThreadDescription
  2. GetSystemTimePreciseAsFileTime
  3. WaitOnAddress + WakeByAddressSingle
  4. NtCreateKeyedEvent + NtReleaseKeyedEvent + NtWaitForKeyedEvent

Each of these is only used in one place.

The first one is used every time a thread is created. The fallback is a function that does nothing. That code calls this function unconditionally, but it could've avoided allocating the utf16 version of the name if it checked if this function was available first. (In that case, no fallback implementation is needed.)

The second one has the exact same signature as the less precise GetSystemTimeAsFileTime, which it calls in the fallback if the precise version is not available. Since it's the exact same signature, it is also possible to directly use the address of GetSystemTimeAsFileTime if GetSystemTimePreciseAsFileTime is not available. (Avoiding one level of calls, although probably not very significant.)

The third and forth sets are alternatives for eachother: The third set (the 'futex' calls) is used if available, and otherwise the fourth set (keyed events) is used instead. (For the thread parker implementation.) Lookups for the fourth set can be skipped entirely if the third set is available. Also, the fallback implementations for the third set are never called. And for both sets, either all functions of none of the functions will be available.

When NtCreateKeyedEvent is used, it is used 'once' in a racy way, just like the lookups before this PR. It needs to create a handle only once, but that will be done by the first thread needing one, because of the lack of static/'pre-main' initializers.


This all makes me wonder what the best outcome could be, if we start making use of this .CRT$XCU section for initialization code. I think this PR is a good improvement, but it does mean that some functions will be looked up even when they're not going to be used. However, changing this compat_fn macro to know in which cases to load which APIs might make it a lot more complicated and harder to maintain.

It also makes me wonder if this system of compat_fn!{ .. } with fallback implementations still makes sense now that only these few APIs are left, as most of these would be fine without a fallback.

Maybe a hand-written .CRT$XCU init function would work, and should be easy to maintain with this small amount of APIs left.

I'm not saying this is all something we should do right now, but maybe this PR could be a step towards something like that. (And it'd be good to get some experience in Rust with .CRT$XCU before making more things depend on it.)

Would love to hear your thoughts about this, as you know more about this than I do. (And I suppose you probably already thought about next steps here as well.)

@sivadeilra
Copy link
Author

These are great thoughts! There are several aspects of this:

First, @rylev and I have been working on supporting different Windows API versions as different targets for Rust. The goal is that you would be able to copile Rust apps (and libstd) for Windows 7, Windows 10, Xbox, etc., and we can optimize the behavior of libstd for each of these targets. So for Windows 10, all of these imports would be normal DLL imports, with no delayed binding. (And they should also use API sets, which is a relatively recent concept added to Windows DLL loading, and has some advantages.) @rylev's pre-RFC for this is here: https://internals.rust-lang.org/t/pre-rfc-min-target-api-version/13339

Second, Windows has built-in support for handling delayed DLL imports. So if you set things up correctly, the import tables say "I always need X, Y, and Z, but I also want W. If you can find W, then use it, otherwise use this fallback function." It's basically exactly what compat_fn! does right now, but the Windows DLL loader handles it for you. You get this "for free" if you use the right versions of the Windows SDK.

Right now, when rustc links an executable, it uses whatever version of the Windows SDK it can find. We need to improve that, and reconcile it with what Ryan is doing for the min_api_target_version (or whatever it becomes).

So ideally, if you compile for Win10, you get all normal imports, with no delayed binding. And once we land the min_api_target_version stuff, we can use #[cfg(all(windows, min_api_target_version = "..."))] to control whether the fallback code exists or not. That will make the locking code simpler, slightly faster, etc.

Then, if you compile for a specific downlevel version, like Windows 7, then we can set up the import tables so that Windows handles all of the delayed binding for you. The binaries would contain both the "do the fancy new thing" code as well as the fallback code, but libstd won't have to do its own lookups to find the DLL imports.

I wanted to submit this PR now, just to slightly improve things, while we're working on the better long-term solution.

Also, a member of my team is currently working on extending rustc so that we will not need external import libraries (the *.lib files from the Windows SDK). That will give us a lot more freedom for how we structure things, and would potentially eliminate the need to have the Windows SDK installed. That will help with moving to use lld for linking, for those people who want that.

@m-ou-se
Copy link
Member

m-ou-se commented Jan 29, 2021

Sounds good!

Let's get started with this PR then. :)

@bors r+

@bors
Copy link
Contributor

bors commented Jan 29, 2021

📌 Commit f4debc8 has been approved by m-ou-se

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 29, 2021
@m-ou-se
Copy link
Member

m-ou-se commented Jan 30, 2021

@bors rollup=never

Let's not roll this one up, to make sure it's easy to find if it causes any regressions.

@m-ou-se
Copy link
Member

m-ou-se commented Jan 30, 2021

Suggest @m-ou-se review this, since she's made changes in this area before.

By the way, feel free to use the r? command (e.g. r? @m-ou-se) to assign a reviewer, if you have a reviewer in mind.

@bors
Copy link
Contributor

bors commented Jan 31, 2021

⌛ Testing commit f4debc8 with merge 0e63af5...

@RalfJung
Copy link
Member

RalfJung commented Jan 31, 2021

FWIW, Miri does not implement these magic init link sections, so the new init code will just not be called when running on Miri. From what I understood, this means the static mut will remain None and the fallback will be used, which seems fine for now. However, if in the future Rust libstd relies on one of these "external to Rust" mechanisms to do anything, then Miri will have to have some implementation of those mechanisms, and I'd appreciate some help with that. :)

@bors
Copy link
Contributor

bors commented Jan 31, 2021

☀️ Test successful - checks-actions
Approved by: m-ou-se
Pushing 0e63af5 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Jan 31, 2021
@bors bors merged commit 0e63af5 into rust-lang:master Jan 31, 2021
@rustbot rustbot added this to the 1.51.0 milestone Jan 31, 2021
bors added a commit to rust-lang/miri that referenced this pull request Jan 31, 2021
rustup; remove some no-longer-needed Windows shims

libstd now calls these lock functions directly, and `GetModuleHandleW` isn't use either any more since rust-lang/rust#81478.
@sivadeilra sivadeilra deleted the windows_dll_imports branch January 31, 2021 15:17
sivadeilra pushed a commit to sivadeilra/rust that referenced this pull request Jan 31, 2021
My PR rust-lang#81478 used the wrong calling convention for a set of
functions that are called by the CRT. These functions need to use
`extern "C"`.

This would only affect x86, which is the only target (that I know of)
that has multiple calling conventions.
JohnTitor added a commit to JohnTitor/rust that referenced this pull request Feb 1, 2021
…x86, r=m-ou-se

Fix calling convention for CRT startup

My PR rust-lang#81478 used the wrong calling convention for a set of
functions that are called by the CRT. These functions need to use
`extern "C"`.

This would only affect x86, which is the only target (that I know of)
that has multiple calling conventions.

`@bors` r? `@m-ou-se`
@ChrisDenton
Copy link
Member

Can I ask why the C runtime's init routine is used instead of Rust's native init function? The Windows implementation of which is currently empty.

Also, while I'm here, a quick question about GetModuleHandleA. Under the hood it surely needs to convert the string to UTF-16 so does it use a stack buffer or something like that instead of allocating?

@sivadeilra
Copy link
Author

To be honest, I was simply unaware that the Rust init function existed. The CRT initialization technique is the "right way" to do this in Windows land, so it did not occur to me to look for a different way. Are there particular advantages to using the Rust init function? Also, the CRT-based initialization strategy works for both DLLs and EXEs (and even static libraries); is the Rust init function called during DLL initialization (under DllMain)? For my own work, integrating Rust and C/C++ libraries is very important, so being able to use the same initialization model for both Rust and C/C++ is important.

About GetModuleHandleA. The contract for this API does not make any guarantees about allocating (or rather, about not allocating). Neither does the contract for GetModuleHandleW, which is the wide version of it. Both may require allocation, internally; Windows has never made any promise that these functions can't fail due to resource constraints. What Windows does guarantee is that, if the function does fail, that it returns in a well-defined state. The function(s) will return NULL and GetLastError() will return some meaningful error.

This may seem like a distinction without a difference, but there is a difference. The difference is that the Windows allocator state is separated from the Rust allocator state. It is safe to use GetModuleHandleA or -W in the implementation of a heap allocator in Rust, for example, without any risk of re-entrancy.

@ChrisDenton
Copy link
Member

Thanks, for the response!

For my own work, integrating Rust and C/C++ libraries is very important, so being able to use the same initialization model for both Rust and C/C++ is important.

That's fair enough. There have been others who want to be able to use "pure Rust" without the C runtime. Which admittedly isn't currently possible (well it's technically possible but not easy).

About GetModuleHandleA

Ah, I understand. My other concern with using A functions is how they're converted. If the user is on a system that's using a code page which isn't an extension of US-ASCII then it could produce the wrong UTF-16 string. I seem to recall the EBCDIC code pages being mentioned as an issue, though they're rarely used nowadays.

@sivadeilra
Copy link
Author

I don't see much value in "pure Rust", for it's own sake. I do see huge value in integrating Rust with the enormous existing software ecosystems, and at least today that means C/C++. If there is a reason to go with "pure Rust" in some situation, then I'd certainly be willing to understand that better, and possibly to support it. But purity for its own sake seems more like an aesthetic goal than one that has concrete benefits.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. O-windows Operating system: Windows S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants