rndr: Add support for aarch64 RNDR register backend #494

Open
mrkajetanp

AArch64 platforms from version Armv8.4 onwards may implement FEAT_RNG. FEAT_RNG introduces the RNDR (and RNDRRS) register, reading from which returns a random number.

Add support for using the RNDR register as a backend for getrandom. The implementation is hidden behind a new "rndr" crate feature.
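For readers unfamiliar with the register, here is a minimal sketch of what reading RNDR could look like. It is not the code from this PR. The function names `rndr`, `rndr_available` and `read_rndr` are hypothetical, the register is spelled with its generic encoding `s3_3_c2_c4_0` so that the assembler accepts it without the `rand` target feature, and the detection path assumes `std` is available.

```rust
/// Read one 64-bit value from RNDR.
/// The hardware signals failure through the condition flags, so NZCV is
/// read back right after the access; all-zero flags mean success.
///
/// Safety: the CPU must implement FEAT_RNG, otherwise the access traps.
#[cfg(target_arch = "aarch64")]
unsafe fn rndr() -> Option<u64> {
    let x: u64;
    let nzcv: u64;
    unsafe {
        core::arch::asm!(
            "mrs {x}, s3_3_c2_c4_0", // generic encoding of RNDR
            "mrs {nzcv}, NZCV",
            x = out(reg) x,
            nzcv = out(reg) nzcv,
            options(nomem, nostack),
        );
    }
    if nzcv == 0 { Some(x) } else { None }
}

/// Stub so the sketch also compiles on other architectures.
#[cfg(not(target_arch = "aarch64"))]
unsafe fn rndr() -> Option<u64> {
    None
}

/// Runtime detection through std (not available in no_std builds).
#[cfg(target_arch = "aarch64")]
fn rndr_available() -> bool {
    std::arch::is_aarch64_feature_detected!("rand")
}

#[cfg(not(target_arch = "aarch64"))]
fn rndr_available() -> bool {
    false
}

/// Safe wrapper: checks for the feature first, then makes up to
/// `attempts` reads because a single read is allowed to fail.
fn read_rndr(attempts: usize) -> Option<u64> {
    if !rndr_available() {
        return None;
    }
    for _ in 0..attempts {
        // SAFETY: FEAT_RNG was detected above.
        if let Some(value) = unsafe { rndr() } {
            return Some(value);
        }
    }
    None
}
```

A caller would use something like `read_rndr(8)` and treat `None` as "no hardware value available". The attempt count of 8 is an arbitrary choice for illustration.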

Currently, detecting whether FEAT_RNG is available without std relies on the Linux Kernel's MRS emulation. For that reason the rndr implementation is marked as unsafe, because we cannot always detect whether the register is available or not.

This commit also adds a safe rndr_with_fallback backend for Linux systems. With this backend, getrandom will use the RNDR register on Linux systems where it is available and automatically fall back to Linux's getrandom syscall on systems where it is not.
This implementation allows the crate to be built for Linux with this feature in advance and then run without having to know whether FEAT_RNG is implemented or not.
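The "probe once, then dispatch" idea behind such a fallback backend can be sketched as follows. This is only an illustration under assumptions, not the PR's implementation: `BackendChoice` and `fill_with_fallback` are hypothetical names, and the probe and the two fill routines are passed in as closures so the sketch stays independent of any particular platform.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

const UNKNOWN: u8 = 0;
const USE_RNDR: u8 = 1;
const USE_SYSCALL: u8 = 2;

/// Caches the outcome of a one-time feature probe.
pub struct BackendChoice(AtomicU8);

impl BackendChoice {
    pub const fn new() -> Self {
        BackendChoice(AtomicU8::new(UNKNOWN))
    }

    /// Returns true if the hardware path should be used. `probe` only
    /// runs while the answer is still unknown; a race between threads
    /// merely repeats the probe, which is harmless.
    pub fn prefers_rndr(&self, probe: impl FnOnce() -> bool) -> bool {
        match self.0.load(Ordering::Relaxed) {
            USE_RNDR => true,
            USE_SYSCALL => false,
            _ => {
                let available = probe();
                let state = if available { USE_RNDR } else { USE_SYSCALL };
                self.0.store(state, Ordering::Relaxed);
                available
            }
        }
    }
}

/// Fill `dest` from the hardware path when the probe succeeded and from
/// the syscall path otherwise.
pub fn fill_with_fallback<E>(
    choice: &BackendChoice,
    dest: &mut [u8],
    probe: impl FnOnce() -> bool,
    rndr_fill: impl FnOnce(&mut [u8]) -> Result<(), E>,
    syscall_fill: impl FnOnce(&mut [u8]) -> Result<(), E>,
) -> Result<(), E> {
    if choice.prefers_rndr(probe) {
        rndr_fill(dest)
    } else {
        syscall_fill(dest)
    }
}
```

In a real backend the `BackendChoice` would presumably live in a `static`, so the probe cost is paid once per process rather than once per call.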

Implementation questions that might use some discussion:

  1. If the rndr feature is switched on, should the behaviour be for the crate to prefer seeding using RNDR over the getrandom syscall or should it only be a fallback in the same way as Intel's RDRAND?

  2. If two or more dependencies both depend on getrandom, cargo will still only build it once. However, it unifies all the features, so if any one dependency enables rndr, they will all get it. Should there be some disable_aarch64_rndr feature to turn it off for those cases or should it be left as-is?

  3. Feature detection with no_std relies on Linux MRS emulation, which was only merged in Linux 4.11, while the minimum supported kernel version for Rust on aarch64 is 4.1. How should we handle this potential pitfall, given that we cannot runtime-detect the feature on those few kernel versions?
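
To make question 3 concrete, the no_std detection amounts to reading the ID_AA64ISAR0_EL1 identification register from user space and checking its RNDR field (bits 63:60). The sketch below is illustrative only and the function names are hypothetical. The register read is the part that depends on the kernel's MRS emulation: on a kernel without it, the `mrs` itself would fault, which is exactly the pitfall described above.

```rust
/// The RNDR field of ID_AA64ISAR0_EL1 occupies bits [63:60];
/// a non-zero value indicates that FEAT_RNG is implemented.
fn rndr_field_present(id_aa64isar0: u64) -> bool {
    ((id_aa64isar0 >> 60) & 0xf) >= 1
}

/// Probe on AArch64 Linux. The register is not directly readable from
/// EL0; the kernel traps the access and emulates it, so calling this on
/// a kernel without MRS emulation crashes instead of returning false.
#[cfg(all(target_arch = "aarch64", target_os = "linux"))]
fn probe_via_mrs() -> bool {
    let value: u64;
    unsafe {
        core::arch::asm!(
            "mrs {v}, ID_AA64ISAR0_EL1",
            v = out(reg) value,
            options(nomem, nostack, preserves_flags),
        );
    }
    rndr_field_present(value)
}
```

The bit test is kept in its own function so it can be checked on any host, independent of the trapping register read.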

@newpavlov
Member

newpavlov commented Jul 26, 2024

Personally, I don't think we should do this. For example, we do not allow overriding the Linux entropy source with an RDRAND-based implementation. At the very least, it should be done with a configuration flag (i.e. cfg(flag)), not with a crate feature.

Why do you want to use RNDR instead of the getrandom syscall?

@mrkajetanp
Author

mrkajetanp commented Jul 26, 2024

Why do you want to use RNDR instead of the getrandom syscall?

"Want" is a strong way to put it, the main goal here is to add support for the architecture feature. What the associated policy is is an open question as far as I'm concerned and that's why I'm asking for outside opinions :)

The situation as it currently stands is as follows. The way the getrandom syscall is implemented on aarch64 Linux is that it uses RNDR to get a seed for its own chacha20 implementation and then it uses chacha20 to produce the final number.
Thus, to get a random number from rand the chain would go

rndr -> kernel chacha20 -> getrandom -> rand chacha -> final number

If we directly have getrandom return the number from rndr if one is available, it will be faster than always using the syscall and the numbers generated by rndr are supposed to "conform to approved standards that are appropriate for the market requirements" as per the Arm ARM so it should be of good quality as well.

At the same time this crate is mainly supposed to be used for seeding anyway so the performance improvement might just not matter all that much. I'm happy to go either way depending on maintainer preference, both approaches have benefits and drawbacks.

Though the annoying downside is that we can only guarantee that we can detect if the feature is available at runtime on Linux meaning that we can only use this 100% safely on systems where the getrandom syscall is also available.

@newpavlov
Member

newpavlov commented Jul 26, 2024

The way the getrandom syscall is implemented on aarch64 Linux is that it uses RNDR to get a seed for its own chacha20 implementation and then it uses chacha20 to produce the final number.

Linux should also use other sources to get entropy which gets mixed into its in-kernel CSPRNG, no? I think the same concerns which were raised against using RDRAND as the only entropy source can be applied to RNDR as well.

It may be worth to allow users to overwrite entropy source with RDRAND/RNDR (or with a custom impl) using opt-in configuration flags, but it requires a separate discussion. It probably will be part of the future v0.3 release.

We can leave this PR open for now and return to it later.

@mrkajetanp
Author

Linux should also use other sources to get entropy which gets mixed into its in-kernel CSPRNG, no?

Maybe it should, though as it is implemented at least as of two weeks ago or so that does not seem to be the case, at least for the aarch64 implementation. It'll try the TrustZone TRNG, failing that RNDR and failing that just use the number of CPU cycles since the last interrupt as the seed.

It may be worth to allow users to overwrite entropy source with RDRAND

Agreed, that would be useful for sure.

I can just move the imp bit down the list for now? It won't be used by default but at least we'll have support for the backend in the codebase already similar to RDRAND, then we can have discussions on policy later which I feel are a separate issue from just having the backend in the first place.

@newpavlov
Member

I can just move the imp bit down the list for now?

This would make the RNDR backend unreachable. It may be worth to make RNDR support similar to RDRAND, i.e. you could remove the Linux-specific fallback and instead make the RNDR backend available on all AArch64 targets which are not supported by default after rndr crate feature is enabled.

@mrkajetanp
Author

Yes that could work. The issue there is that on no_std we can't do feature detection at runtime so the interface would only be safe if the user definitely knew their CPU had the register when they enabled the crate feature. Is that an assumption we're okay with making?

@newpavlov
Member

newpavlov commented Jul 29, 2024

You could require enabling the rand target feature by adding something like this:

#[cfg(not(target_feature = "rand"))]
compile_error!("The RNDR backend requires the `rand` target feature to be enabled at compile time");

But I am not sure if such backend will be useful in practice. Plus, the RNDR detection code could be useful when we will allow users to opt-in into RNDR on Linux targets.

@mrkajetanp
Author

For sure, the compile-time check is easy. The worry is more that if the crate is compiled for the feature but the binary is run on a CPU that doesn't have it, the program will crash with a SIGILL. But I suppose this would only be relevant for embedded targets anyway, where the users will know what they're building for.

Yeah my thinking is that it is good to have the support merged in first and then work on opt-in interfaces for the backends without actually making it the default.

AArch64 platforms from version Armv8.4 onwards may implement FEAT_RNG.
FEAT_RNG introduces the RNDR (and RNDRRS) register, reading from which
returns a random number.

Add support for using the RNDR register as a backend for getrandom.
The implementation is hidden behind a new "rndr" crate feature.
For the backend to work, users must ensure that the target platform
supports FEAT_RNG because it is not possible to reliably detect the
feature at runtime across platforms.
Currently, detecting whether FEAT_RNG is available without std relies
on the Linux Kernel's MRS emulation.

This commit adds a safe rndr_with_fallback backend for Linux systems.
With this backend, getrandom will use the RNDR register on Linux systems
where it is available and automatically fall back to Linux's
getrandom syscall on systems where it is not.
This implementation allows the crate to be built for Linux with this
feature in advance and then run without having to know whether
FEAT_RNG is implemented or not.

For the time being, this backend is not used by default on any platform
configuration. The intention is for it to be usable as an opt-in when an
opt-in mechanism is available in the crate.
@mrkajetanp
Author

I did as you suggested for the rndr backend. I still added the fallback backend in a separate commit because it'll be useful for the opt-in interfaces, currently it's only included for testing.

Note: I stumbled onto an issue while writing the test in that it is currently not possible to write integration tests for any module that has util_libc in the dependency chain because util_libc calls Error::from_os_error which is not public. The least-invasive solution to make these kinds of tests compile is to make it public which is what I did in the commit.
The only other way I could think of would be to overhaul how the imports and dependencies work in the entire crate.

@mrkajetanp
Author

Hi, is there some way we could move this forward? Would you like to see any specific changes?

@newpavlov
Member

As I wrote above, I don't think we should merge it until v0.3 which should have a proper way to change backends at compile time (hopefully, it will be done before the end of this year). IIUC RNDR is a relatively new extension which has a very limited support in existing hardware (e.g. see the rng column in this list).

@mrkajetanp
Author

I see, that makes sense. The specific use-case I'm trying to enable is making it possible for users of rand to e.g. request a Vec of random numbers generated by a hardware RNG directly if they have one on the system, as opposed to one generated from a seeded PRNG.
This is currently not possible because rand relies on this crate to do the actual low-level interactions and this crate makes it impossible to access two different generators without recompiling in between the two accesses. How do you think I should go about enabling this type of use-case then?

@newpavlov
Member

The specific use-case I'm trying to enable is making it possible for users of rand to e.g. request a Vec of random numbers generated by a hardware RNG directly if they have one on the system.

You can define a crate like rdrand with a struct which implements the RngCore trait. After that, users can easily use it with the rest of the rand ecosystem.
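
A rough sketch of what such a wrapper type might look like is shown below. To stay dependency-free it does not implement the real `RngCore` trait from rand_core; the method names only mirror it, and `HwRng`, `HwRngError` and the `read_word` closure are hypothetical. In an actual crate the closure would be replaced by the RNDR read, and the trait impl would forward to these methods.

```rust
/// Returned when the hardware source keeps reporting failure.
#[derive(Debug, PartialEq)]
pub struct HwRngError;

/// Wraps a fallible 64-bit word source (for example an RNDR read).
pub struct HwRng<F: FnMut() -> Option<u64>> {
    read_word: F,
    retries: usize,
}

impl<F: FnMut() -> Option<u64>> HwRng<F> {
    pub fn new(read_word: F, retries: usize) -> Self {
        HwRng { read_word, retries }
    }

    /// One initial attempt plus up to `retries` further attempts.
    pub fn try_next_u64(&mut self) -> Result<u64, HwRngError> {
        for _ in 0..=self.retries {
            if let Some(word) = (self.read_word)() {
                return Ok(word);
            }
        }
        Err(HwRngError)
    }

    /// Fill `dest` eight bytes at a time; a trailing partial chunk
    /// takes only the leading bytes of the last word.
    pub fn try_fill_bytes(&mut self, dest: &mut [u8]) -> Result<(), HwRngError> {
        for chunk in dest.chunks_mut(8) {
            let bytes = self.try_next_u64()?.to_le_bytes();
            chunk.copy_from_slice(&bytes[..chunk.len()]);
        }
        Ok(())
    }
}
```

Because the word source is a closure, the same type can be exercised with a deterministic counter in tests and with a real hardware read in production.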

How fast is RNDR in practice? Usually, hardware-based TRNGs are relatively slow, so I think the relative overhead of going through the getrandom syscall should be quite small.

@mrkajetanp
Author

How fast is RNDR in practice?

It's just a register accessible from EL0 so pretty much as fast as it gets. Though it is possible for it not to return a random number and the retries might introduce some latency, that'll depend on the HW implementation. Still very fast even with the retries, given that the syscall on Linux goes through its own chacha implementation in addition to the syscall overhead. The docs for the Linux syscall specifically state that the implementation is pretty slow and not supposed to be fast:

Using these interfaces to provide large quantities of data for
Monte Carlo simulations or other programs/algorithms which are
doing probabilistic sampling will be slow.  Furthermore, it is
unnecessary, because such applications do not need
cryptographically secure random numbers.

It's probably not a big difference for small use-cases, but overhead starts to matter a lot if you're generating millions+ of numbers for ML workloads and such.

@newpavlov
Member

It's just a register accessible from EL0 so pretty much as fast as it gets.

It's not a matter of reading from the register, but about throughput achievable in practice, i.e. how much data can be provided by the hardware. Usually, HW generators seeded by "true" RNGs are relatively slow compared to user-space PRNGs. For example, RDRAND provides up to 800 MB/s, which is fast, but still slower than even ChaCha8 CSPRNG.

It's probably not a big difference for small use-cases, but overhead starts to matter a lot if you're generating millions+ of numbers for ML workloads and such.

In such cases users most certainly should use lightweight user-space PRNGs, especially if the workload is not very sensitive to quality of random data.

@mrkajetanp
Author

how much data can be provided by the hardware

That'll depend on the specific hardware implementation and vary a lot in practice, like it does with the x86 implementations.
The architecture itself makes no claims about the throughput.

On the wider point I agree though. On the machines I have access to which have the feature, the throughput was better than RDRAND on Intel machines but still worse than software PRNGs, so you're right that it's better to use those when throughput is a concern, fair enough.
