Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Rust] - Managing memory without a garbage collector #2916

Closed
8 tasks done
alexswan10k opened this issue Jun 1, 2022 · 24 comments
Closed
8 tasks done

[Rust] - Managing memory without a garbage collector #2916

alexswan10k opened this issue Jun 1, 2022 · 24 comments

Comments

@alexswan10k
Copy link
Contributor

alexswan10k commented Jun 1, 2022

Probably the biggest impedance mismatch between F# and Rust is the managed/unmanaged memory divide. Rust however has smart pointers, which can be used to achieve pretty near the same result. The philosophy in Rust however is "pick your guarantees", which is somewhat problematic when we really would like a one size fits all approach.

The point of this ticket is to track progress and finalize how these representations translate to Rust, and how we can ensure sensible defaults where possible, but allow enough control where it makes sense.

So far we have the following big open questions:

Pass by value or by reference

Currently implemented by:
  • All functions currently pass by ref and return owned (value)
    • Most efficient - least amount of cloning, benchmarked here
    • Leads to most Rust-like api's where generally pass by reference is preferred.
  • All constructors and Emit calls pass by value and can take ownership. An emit template can additionally pull references to pass down into a native function, thus allowing it to do both pass by value or by reference.
Problems:
  • Passing by reference is not consistent with dotnet default (passing by value)
  • There is currently no way to express byValue. This leads to weird things like double dereferencing pointers &Rc<T>, or primitives being passed by reference &i32.
    • It is not possible to pass some by value and some by reference due to the fact generics require these to be consistent. This was covered here
Ideas:
  • Could use inref to represent a reference.
    • Most similar to dotnet behaviour
    • Typesafe on the F# side - You cannot pass a ref to a function expecting a value, etc
    • Must bind value out before passing it down by reference making pipelining awkward.
    • Module scope values must be mutable
    • Major refactor/breaking change as the default would have to be flipped back to byValue
      • This means passing by value again everywhere which means lots of cloning, which is slow.
  • Could use a function and/or parameter level attribute
    • Function level attribute could flip a whole function to byRef with little friction, allowing opt-in speed benefits
    • Non standard & inconsistent with dotnet
    • Not typesafe on F# side
    • May cause problems when currying, applying functions to functions, etc
Direction:
  • Switch back to pass by value as default
  • ByRef attribute to switch a function over to passing by ref in its entirity
  • ByRef attribute on parameters to allow finer grained implicit control
  • inref support on parameters to allow explicit & syntax, consistent with dotnet

GC substitution - Rc, Arc, GcCell

Currently implemented by:
  • Everything garbage collectable is wrapped in Rc
Problems:
  • How do we deal with concurrency? Rc is not Send or Sync
  • Rc is faster than Arc, so we really want to use Rc when possible
  • A user may want to use GcCell or something completely different. Reference counting is not the fastest, and there are a number of experimental garbage collector crates already available. We probably do not want to force usage of a crate though, as Fable has no direct control over the cargo.toml file, so a sensible default which is library independent would be preferred.
Ideas:
  • Use a build switch to flip between Arc and Rc globally.
    • Conceptually simple as a user never has to think about wrappers
    • Native libraries would have to be duplicated for each wrapper
    • The cost of the abstraction is paid everywhere. If you use an Arc in 1 place, you pay the price everywhere.
  • Use an attribute on a class, so that the type can generate an Arc or an Rc or a GcCell (or anything else), but giving the user control
    • More control
    • Native libraries probably still need to be duplicated as Rust has no way to abstract over Arc/Rc. This is how immutable
Direction:
  • Rust global feature switch to jump between Rc and Arc, Lrc shared abstraction
  • Explicit control for each type via an attribute

Mutability - Cell/MutCell/Mutex/RwLock

Currently implemented by:
  • RefCell - generates a MutCell
  • Arrays use MutCell for interior mutability
  • Mutable fields use MutCell for interior mutability
Problems:
  • How to deal with concurrency as MutCell is not Send or Sync
Ideas
  • Make MutCell Send and Sync via opt-in rust feature "unsafe-cell" even though it is unsafe, and additionally implement .Net locking mechanisms. It is a user's responsibility to correctly implement these patterns, as is expected in .Net - there are no guards.
  • Make MutCell internally switch to using locks (Mutex or RwLock) when a feature switch is turned on. Perhaps "safe-cell-mutex" and "safe-cell-rwlock"
Direction:
  • Make MutCell Send and Sync via opt-in rust feature futures even though it is unsafe.
  • Implement some .NET synchronization and locking mechanisms like Interlocked, F# lock etc.
  • It is a user's responsibility to correctly implement these patterns, as is expected in .NET - there are no guards.

I will try and keep this ticket up to date as things progress.

@alexswan10k
Copy link
Contributor Author

@ncave I have been thinking about those damn wrappers again. I am struggling with this seemingly trivial example

[<Fact>]
let shouldMarshalMutOverAsyncClosureCorrectly () =
    let z = async { return 3 }
    let mutable x: int = 1
    let comp = async {
        let! z = z
        x <- x + 1
        return x + z
    }
    let t = Async.StartAsTask comp
    t.Result |> equal 5
    x |> equal 2

How can we declare that x needs to be Send and Sync without writing some clever lookahead algorithm? Attributes cannot be used in let bindings, so it seems like the only option is to use the type system.

My first thought was type aliases, but it doesn't look like Fable can see these at all (they are just expanded). Is this correct? Are they in the input F# AST? I would like to do something like the following really:

// in Fable.Core.Rust.ts
// Rust - ThreadSafe alias. This is intended to be transparent to dotnet (alias), but Rust will ensure the correct Send and Sync wrappers are used instead of their thread-unsafe counterparts (Rc/RefCell)
type Ts<'T> = 'T

//in test file
[<Fact>]
let shouldMarshalMutOverAsyncClosureCorrectly () =
    let z = async { return 3 }
    let mutable x: Fable.Core.Rust.Ts<int> = 1
    let comp = async {
        let! z = z
        x <- x + 1
        return x + z
    }
    let t = Async.StartAsTask comp
    t.Result |> equal 5

I guess otherwise you would need to make Ts a real wrapper like a single case DU, which makes usage cumbersome, and dotnet code more inefficient (as double pointer indirection).

The benefit with a wrapper is it allows you to cordon off built-in types to use alternative parallel versions (the big one being array of course). I am still torn if we should be trying to replace MutCell with RwLock/Mutex as locking is inconsistent with how dotnet works. I think the most accurate solution is just to make MutCell Send and Sync.

The alternative is of course the global switch, but I still think this is far too heavy-handed. Most code will never cross thread boundaries, so it seems mad to pay the price everywhere.

@ncave
Copy link
Collaborator

ncave commented Jun 29, 2022

@alexswan10k I don't see how we can avoid a global switch. Unfortunately project files are compiled independently and out of order, so it's hard to do global look-ahead. Also, fable-library-rust is pre-compiled, so we have to have 2 versions of it, one with Rc, one with Arc. Would it help to use Lrc? Not sure.

@alfonsogarciacaro
Copy link
Member

@alexswan10k Yes, in several points of the code Fable expands the type aliases to make sure in the Fable AST we only get the actual types. We could keep the alias information if necessary but this may complicate other scenarios as we always need to make sure the type is not an alias when checking something.

The exact thing you need, if I'm not mistaken, is: for local mutable bindings, know if the binding is at any point assigned within a closure (lambda/delegate or object expression).

If that it's correct, I can do it in the FSharp2Fable step for you. We only need to:

  1. Add an isCaptured field to Let local bindings in Fable AST, initially this is initialized to false
  2. In the context of FSharp2Fable we keep two sets, one mutable for captured variables and another for bindings declared in the current closure.
  3. When we detect an assignment to a local ident we check if we're in a closure and if so, if the ident has been declared within or outside the closure. If outside, we set the variable as captured in the context.
  4. When we finish transforming the member body, if there are captured variables, we make a pass to the AST to set isCaptured=true in the Let bindings as appropriate.
  5. In Fable2Rust you just need to check isCaptured to decide whether to use Rc or Arc.

If that looks good to you, I can send a PR with the changes 👍 Hopefully this one will take less time than #2933 ;)

@alexswan10k
Copy link
Contributor Author

alexswan10k commented Jun 30, 2022

Thanks for clarifying @alfonsogarciacaro this is good to know. Your suggestion sounds pretty useful beyond my problem actually, as we already have a similar thing we do for tracking multiple uses of idents (thus deciding if the compiler must clone or not). That being said, do not feel you need to push this forward any time soon as I am not sure how well the whole "smart" approach can even work comprehensively. The issue here is there are two types of closures conceptually - single threaded closures, and thread-safe closures (that expect Send and Sync stuff - Arc). The F# type system cannot really know which applies where, and baking the wrong one into the closure is a compiler error!

@ncave Lrc is a really interesting idea, and it has got me thinking. I believe I might even have a plan inspired by this. It all evolves around Rust features.

Phase 1

  • Redirect all type references to Arc or Rc to an abstracted alias ("Frc"..? names welcome!) as has been done for List, String, etc
  • Redirect all creation to functions controlled in the common fable library
  • Add a rust feature "thread-safe", which conditionally
    • Enables all the async modules
    • Enables the futures dependency
    • Turns all redirected Frc ops to conditionally use their Sync/Send or non SyncSend counterparts. (We could alternatively just use Lrc/Lock with dependent feature switches)
  • Remove ReferenceTypeAttribute and replace with a single more abstract concept of ThreadSafeAttribute.. more on that below..

You can then turn thread safety on globally just by adjusting the cargo.toml

[dependencies]
fable_library_rust = { path = "../../fable-library-rust", features=["thread-safe"] }

In order to test this, perhaps we can have two scripts "test-rust" and "test-rust-async" to run the tests in each context, copying one of two independent cargo.toml files where appropriate. It might even be possible to parameterise this from the command line and use the same cargo.toml file, need to look into that. Async tests would be omitted from the singular series via a local feature.

Phase 2 (optional + caveats)

This is where the semi-smart incrementally thread-safe stuff comes in. What if you could load the same package multiple times with different aliases? Imagine in your consumer Cargo.toml you did the following

[dependencies]
fable_library_rust = { package="fable-library-rust" }
fable_library_rust_ts = { package="fable-library-rust", features=["thread-safe"] }

Now before I go any further I need to point out that at this moment, features are additive, which will mean you actually end up with the same thread-safe version twice, but compiled once. This is however a commonly misunderstood and wrongly used feature, so I have a reasonable amount of confidence we may see mutually exclusive features in the future.

Now assuming you can actually do this (which you can't yet), the attribute ThreadSafeAttribute could be used at the function level, and type level, to redirect not the calls but the actual namespace of the called code to fable_library_rust_ts, so you are effectively consuming an entirely independent parallel-safe implementation. We already have a context var to propagate this. The final problem is then conversions, which can presumably be inferred from if something comes from a thread-safe or a non thread-safe context. Again - we already do something similar when working out if references need to be dereferenced/cloned, so this is doable.

MutCell and interior mutability wrappers

A final point. With features we could actually let a user decide if MutCell should be just Send and Sync (and ignore risks DIY like dotnet), or a slower locking replacement (such as Mutex or RwLock). We could even use features to decide which one to use.

@alfonsogarciacaro
Copy link
Member

Thanks a lot for the clarification @alexswan10k! If we establish that only variables captured by async/task CEs will be thread safe, this can be detected in the AST. Would that help? Not sure if it's expected that devs will use custom functions that need to capture variables in a thread safe manner. If that's the case maybe we could use a mechanism similar to the InlineIfLambda attribute.

@ncave
Copy link
Collaborator

ncave commented Jun 30, 2022

@alexswan10k
I like that using features gives back the control to the library consumer to decide which type of pointers should be used, instead of choosing once and for all at compile time using a compiler flag. I believe that was the original intent from the beginning. Previously it was hard to do custom Rc types before the imports were fixed, but it should be possible now.

We can start small with just doing the custom Rc type and making it work.

Naming things is hard, but:

  • Perhaps we can use Lrc or Grc, instead of Frc. I like Lrc as it conveys the intent, and I like Grc as it's not widely used.
  • Perhaps fable-library-rust-sync instead of fable-library-rust-ts (in cargo.toml).
  • Perhaps atomic-pointers instead of thread-safe, as it conveys the intent for this feature. Thread safety is a more general concept.

@alexswan10k
Copy link
Contributor Author

@alfonsogarciacaro I think that could definitely work for some scenarios. I guess the slight risk here is some might prefer to use some kind of function composition to bind 'a -> Async<'b>'s together, which may be comprised of closures not in a CE. It would definitely solve some of the problem though.

@ncave these are better names, happy to go with your suggestions.

The interesting thing I like about features is it allows you to conditionally bring in transitive dependencies, so you only pay for what you use dependency bloat wise. It also means we can actually use Lrc to abstract over any future Rc/Gc style pointer types just by adding a new feature (and optionally take on a feature conditional dependency to that crate). They will also need to have some kind of bridge implementation in fable-library-rust, but I imagine the overhead will be quite small. (Gc is awkward though because it requires traits to be implemented too).

I think the best next steps are probably to get the features working globally across the library as we covered above, and leave all the clever stuff till some time down the road. We can set up two test scripts that pass different features down to enable and disable async. I do think the clever stuff has merit, but it is clearly going to be quite challenging to iron out all of the edge cases. What do you both think?

@ncave
Copy link
Collaborator

ncave commented Jun 30, 2022

@alexswan10k +1 for starting simple.

I believe I might even have a plan

Sounds like a cunning plan, I like it already! :)

@alfonsogarciacaro
Copy link
Member

@alexswan10k @ncave I'm a bit rusty (pun unintended) on thread-sharing mechanisms in .NET because I usually just use MailboxProcessor, but there are things like System.Threading.Interlocked. Maybe we should try to support them instead of magically making captured mutable values thread-safe? In "normal" F# you wouldn't expect that captured mutable values automatically become thread-safe either, although there are no compiler checks for this, that's why the example above works.

@alexswan10k
Copy link
Contributor Author

alexswan10k commented Jul 1, 2022

@alfonsogarciacaro you are absolutely correct. I was trying to make this point in a convoluted way above, but the bottom line is that rust has guide rails (send and sync) to guarantee correctness whereas dotnet just assumes you know what you are doing. I believe the most accurate translation would be to just mark MutCell send and sync even though it probably isn't. The problem is we have to do something here to satisfy the rust compiler and end up capturing a thing that has send and sync traits. Currently this just doesn't build.

For rust purists maybe we can even allow a feature switch to optionally enable locking wrappers (rwlock or mutex) im place of MutCell down the road, although this would be unlike dotnet as you pointed out. There is also a performance penalty to this so I think it is probably an inappropriate default.

I believe equivalent of interleaved is
https://doc.rust-lang.org/std/sync/atomic/

The ref counting container (Arc/RC) absolutely must be threadsafe though, or you could end up with memory leaks. It may be that this also applies to mutable ownership transfer as contention could leave allocated memory with no owner, where a gc may be smarter.

@alfonsogarciacaro
Copy link
Member

Maybe it's not a bad thing that something that it's potentially risky doesn't compile in Rust. So instead of trying to make all cases compilable we can force devs to use an explicit Mutex when necessary. It's unfortunate that Fable async tests overuse mutable captured values, but we could change this for Rust, as I don't think it's a common pattern in standard F# code (except maybe when you're accumulating a result on an async for loop).

I had a similar situation with Dart because the compiler is now more strict with nullability. Most of the time, null is not an issue in F#, but for the sake of C# interoperability, null is accepted for reference types not declared in F#, like strings. So if you pass null to an argument accepting a string, F# won't complain but Dart will (something similar happens with Unchecked.defaultof<_>). I haven't tried to make these cases work, forcing users to make changes to avoid this situation often results in safer code.

@alexswan10k
Copy link
Contributor Author

Agree. Maybe how it is today is actually the best default. We could make implicit send and sync on MutCell an opt in feature, perhaps "unsafe-cells" or something.

I do like the idea of explicit wrappers but there is nothing in dotnet that maps to Mutex or RwLock so presumably we would need an F# only Implementation for parity?

@ncave
Copy link
Collaborator

ncave commented Jul 1, 2022

@alexswan10k

there is nothing in dotnet that maps to Mutex or RwLock

There is System.Threading.Mutex and ReaderWriterLock , unless you mean something else.

We could make implicit send and sync on MutCell an opt in feature

Makes sense.

The ref counting container (Arc/RC) absolutely must be threadsafe

Absolutely, so either a compiler flag is needed to choose atomic ref count pointers, or (better) a custom RC pointer type + a Rust feature to allow the library consumer to choose later whether to use atomic rc pointers.

@alexswan10k
Copy link
Contributor Author

alexswan10k commented Jul 1, 2022

Sure but they are non generic and work a little differently. They do not guard a value as the rust one does. I'm not sure how we can make this fit?

Oh one point to add as to why unsafe-cell might be useful: because rust is very strict you would not currently be able to define a non mutable array (it uses MutCell internally) outside of a thread closure and capture it once, even though it technically is only used by one thread. Passing to the closure requires send as it may run on a different thread. Lifting the restriction allows more freedom here to allow things that should compile to just work.

@ncave
Copy link
Collaborator

ncave commented Jul 1, 2022

@alexswan10k Then perhaps I'm missing your point, do you mean mapping (implementing) F# lock (i.e System.Threading.Monitor) with Rust std::sync::Mutex?

Perhaps we can enumerate the use cases / features we want to cover, and start simple.

@alexswan10k
Copy link
Contributor Author

alexswan10k commented Jul 2, 2022

Sure. So monitor has the same problem in that it holds a lock independent from the object('s) it is protecting. You could probably implement this with a rust Mutex<unit> but the associated mutable cell('s) would then have to be marked as Send and Sync even though they can not be guaranteed to be thread safe when used out of context (we did discuss allowing this with the unsafe-cell opt in feature anyway). I think that would allow the locking patterns you see in dotnet.

There is no direct analog to rust's Mutex<T> though, which is arguably a superior and more robust api, which can also guarantee safety when send and sync traits are considered. Do we want to support this kind of container api? If so we probably need something else in the F# world.

@ncave
Copy link
Collaborator

ncave commented Jul 2, 2022

@alexswan10k

Do we want to support this kind of container api?

I assume we would eventually auto-generate bindings for many standard Rust libraries, using some sort of rust2fable tool.
For now those bindings can be manually emitted (with Emit) where needed.

In that sense, perhaps we don't need to find an analog to Rust mutex in F#, just a way to implement the .NET synchronization primitives and F# computation expressions task {…} and async {…}.

I like your approach of incrementally adding the bare minimum to make a feature work, before going to the next one.
Perhaps we can do the custom Rc type first, and see how it works.

It's possible that part of your reasoning went over my head, if so I apologize in advance for asking you to repeat it, where necessary.

@alexswan10k
Copy link
Contributor Author

alexswan10k commented Jul 2, 2022

Not a problem at all, it's good to not just barrel in and take some time to think through consequences.

So in summary

  • Redirect rc calls and type defs to fable core lib
  • Make a feature of core "atomic-cells" that switches redirected calls to arc
  • Make a feature of core "unsafe-cell" that marks MutCell as send and sync
  • make a feature of tests "threadsafe" which enables async and task tests, and enables thread features on core. Make a second script test-rust-async in build.fsx to trigger tests with thread stuff on, so both paths can be tested in CI.

At this point I feel we have an mvp.

Beyond that

  • Implement dotnet mutex or other locking constructs using something like rust mutex<unit>. With unsafe-cell we should be able to do dotnet locking patterns
  • (way down the road) generate bindings for rust mutex along with other std lib stuff

@ncave
Copy link
Collaborator

ncave commented Jul 2, 2022

@alexswan10k Possibly a typo, perhaps the naming can be:

  • Make a feature of core atomic-cells atomic-pointers that switches redirected calls to arc
  • Make a feature of core unsafe-cell atomic-cells that marks MutCell as send and sync

Perhaps down the road we can even make MutCell actually thread-safe if the feature is enabled.

@alexswan10k
Copy link
Contributor Author

@ncave I have updated this ticket with the latest progress. We are nearly there, if we are happy with the proposed MutCell feature switches I can add these as unfinished checkboxes.

@ncave
Copy link
Collaborator

ncave commented Jul 27, 2022

@alexswan10k I assume you mean this?

  • Make MutCell Send and Sync via opt-in rust feature futures even though it is unsafe.
  • Implement some .NET synchronization and locking mechanisms like Interlocked, F# lock etc.
  • It is a user's responsibility to correctly implement these patterns, as is expected in .NET - there are no guards.

Yes, I agree that looks like a sensible scope and we should proceed.

@alexswan10k
Copy link
Contributor Author

Nice one. Updated the TBD to your comment.

@alexswan10k
Copy link
Contributor Author

alexswan10k commented Aug 12, 2022

I think the core of this is complete.

I been going back and forth in my head about the implementation of Monitor and I believe it is actually thread safe in the sense you are already going through a global RWLock, so it is probably sufficient for an MVP.

My hope with stuff like this is if it ever does gain traction, a Rust threading expert will probably look at this and go WTF, and then feel compelled to improve it :)

Closing for now.

@alexswan10k
Copy link
Contributor Author

alexswan10k commented Aug 17, 2022

Another couple of discoveries for future real GC impls

https://github.com/fitzgen/bacon-rajan-cc
https://github.com/redradist/ferris-gc
https://github.com/Others/shredder

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants