Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Thread-safety of Ginkgo #996

Open
upsj opened this issue Mar 24, 2022 · 6 comments
Open

Thread-safety of Ginkgo #996

upsj opened this issue Mar 24, 2022 · 6 comments
Labels
is:help-wanted Need ideas on how to solve this.

Comments

@upsj
Copy link
Member

upsj commented Mar 24, 2022

In the discussion on #993, but also some previous issues like #740 and #565, as well as future plans like #805, there is an underlying thing we have not yet talked about before, and that is what thread-safety guarantees different parts of Ginkgo can give.

I could imagine different levels of thread-safety guarantees we can give

  1. We don't give any guarantees, i.e. each thread needs to use its own executor and independent objects. This means we would only need to make some guarantees on things like cross-executor copies, but otherwise should have no issues. While easiest, I think this might be a bit overly restrictive?
  2. We promise that Executor itself is thread-safe (which it should be already currently), but require different threads to not use the same object at the same time, e.g. two threads can't call apply on a solver at the same time. This would be relevant when we want to decide how to deal with DenseCache types that avoid reallocations by storing a mutable Dense object internally.
  3. Everything inside Ginkgo can be used in a thread-safe fashion, except for concurrent modification of data. This would mean that we need to make sure that a DenseCache provides a different thread-local object for each thread in something like a mutable std::unordered_map<std::thread::id, std::unique_ptr<Dense<...>>>. We would then need to make sure write accesses to the cache are mutually exclusive, but could otherwise provide pretty strong guarantees.

The whole area of temporary storage looks to me like the most important/difficult to handle part, but can you imagine any other areas that could have issues with thread-safety?

@upsj upsj added the is:help-wanted Need ideas on how to solve this. label Mar 24, 2022
@upsj upsj added this to the Ginkgo 1.5.0 milestone Mar 24, 2022
@Slaedr
Copy link
Contributor

Slaedr commented Mar 25, 2022

@tcojean How does StarPU manage different tasks? Does it spawn threads?

I feel like the internal complication arising from either options 2 or especially 3 needs good justification. In general, it may be reasonable to expect each host thread to have its own independent executor and objects, or otherwise leave it to the application to synchronize at a very coarse level.

@tcojean
Copy link
Member

tcojean commented Mar 25, 2022

There are long lived (p)threads on every core. Tasks are just structures which have a code tied to it and data. The tasks are moved to the cores/pthreads by the scheduler and its policy (work stealing or whatever else the user sets).

@Slaedr
Copy link
Contributor

Slaedr commented Mar 25, 2022

@tcojean I see. Is there a concept of ownership of data among threads that StarPU itself manages? Is that the case in general for tasking systems? If that is the case, perhaps there's no pressing need for us to manage the task-level coarse-grained parallelism. It's good you brought up StarPU, because I think this topic (of host thread safety) needs to be discussed in the context of any possible task runtime integration.

@tcojean
Copy link
Member

tcojean commented Mar 25, 2022

I'm not sure what you mean, but the data can move between cores/threads depending on the scheduler. Whether there will be concurrent access depends on the code itself as well:

  • If we consider two tasks A and B, each using some Ginkgo functionality, but there is a data dependency between the two (maybe A writes some data and B reads that same data), then there will be a dependency of course between them added by StarPU and they will not run concurrently (no thread safety issue there from the Ginkgo side)
  • If we still have two tasks, tasks A and B, which both call Ginkgo functionality, but for some reason they can run concurrently (maybe same solver but different b,x?) then there can be a thread safety issue if the scheduler decides to execute them at the same time on different cores.

@Slaedr
Copy link
Contributor

Slaedr commented Mar 25, 2022

In the second case you mentioned, perhaps cached data for the matrix, preconditioner etc. needs to be considered while setting task dependencies.

@upsj
Copy link
Member Author

upsj commented Mar 25, 2022

I guess that is the crux of 2 vs. 3: Does solver->apply(...) have a read/write dependency on itself? With 3. we could avoid that, with 2 we need to consider it.

Just for clarification, I think we already fulfill 2. right now, we just don't codify or check it in any way (manually or in unit tests)

@upsj upsj removed this from the Ginkgo 1.5.0 milestone Apr 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
is:help-wanted Need ideas on how to solve this.
Projects
None yet
Development

No branches or pull requests

3 participants