Fix an issue with sync::Barrier leader selection with reuse #102

Merged
jamesbornholt merged 1 commit into awslabs:main on Apr 15, 2023

Conversation

lukenels (Contributor)

The Rust standard library documentation for sync::Barrier states the following about Barrier::wait:

Blocks the current thread until all threads have rendezvoused here.
Barriers are re-usable after all threads have rendezvoused once,
and can be used continuously. A single (arbitrary) thread will
receive a BarrierWaitResult that returns true from
BarrierWaitResult::is_leader() when returning from this function,
and all other threads will receive a result that will return false
from BarrierWaitResult::is_leader().

This documentation is a bit ambiguous. It says that "a single thread" becomes the leader after returning from wait, but it isn't clear if that means a single thread for the existence of the Barrier, or a single thread for each "batch" of threads released by the barrier. The latter interpretation seems to make the most sense given the context of being able to reuse barriers, and it seems more useful to be able to assume that a leader is set for each batch. That is in fact the behavior implemented by the standard library as evidenced by the implementation and this unit test, which reuses a barrier on a new thread:

let barrier = Barrier::new(1);
assert!(barrier.wait().is_leader());

let thd = thread::spawn(move || {
    assert!(barrier.wait().is_leader());
});

thd.join().unwrap();

Let's assume this behavior is correct, and that the ambiguity in the Rust documentation would be resolved in favor of the existing implementation. Then the Shuttle behavior for Barrier is incorrect and causes this test to fail, because there is only a single, global leader field on the barrier, which is set once, the first time the barrier is used.

This commit fixes the problem by adding the notion of an "epoch" to the Barrier implementation, which allows there to be different leaders for each batch / group of threads released by the barrier. (This is similar to the standard library implementation's notion of generation).
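For intuition, here is a minimal sketch of the epoch/generation idea built only on std's Mutex and Condvar. This is an illustration of the mechanism, not Shuttle's actual code; like the standard library's implementation it simply makes the last arriving thread the leader of its batch, whereas the change described here also aims to vary which thread in each batch becomes the leader.

```
use std::sync::{Condvar, Mutex};

// Sketch only (not Shuttle's implementation): a reusable barrier that tracks
// an `epoch` counter so each released batch gets its own leader.
struct EpochBarrier {
    state: Mutex<State>,
    cvar: Condvar,
    size: usize,
}

struct State {
    count: usize, // threads waiting in the current epoch
    epoch: usize, // bumped every time a full batch is released
}

impl EpochBarrier {
    fn new(size: usize) -> Self {
        Self {
            state: Mutex::new(State { count: 0, epoch: 0 }),
            cvar: Condvar::new(),
            size,
        }
    }

    /// Returns `true` for exactly one thread per batch (the leader).
    fn wait(&self) -> bool {
        let mut state = self.state.lock().unwrap();
        let my_epoch = state.epoch;
        state.count += 1;
        if state.count < self.size {
            // Not the last arrival: block until this epoch is released.
            while state.epoch == my_epoch {
                state = self.cvar.wait(state).unwrap();
            }
            false
        } else {
            // Last arrival: release the batch, start a new epoch, and lead it.
            state.count = 0;
            state.epoch = state.epoch.wrapping_add(1);
            self.cvar.notify_all();
            true
        }
    }
}
```

Under this scheme a size-1 barrier returns true from every wait call, which is exactly what the unit test above relies on.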

The commit also adds some new unit tests to capture the issue with the old behavior and prevent regressions. It also tweaks Barrier::wait to become an explicit thread yield point, in order to vary which thread becomes the leader rather than just choosing the first one that happens to call `wait`.
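As a rough sketch of what such a reuse regression test could look like under Shuttle (the test name and exact structure here are hypothetical, not necessarily the tests added by this commit; it assumes the crate-root check_dfs entry point used elsewhere in this thread):

```
use std::sync::Arc;

use shuttle::check_dfs;
use shuttle::sync::Barrier;
use shuttle::thread;

// Hypothetical sketch of a reuse regression test, mirroring the std example above.
#[test]
fn barrier_reuse_new_leader_per_batch() {
    check_dfs(
        || {
            // A size-1 barrier releases a batch of exactly one thread per `wait`
            // call, so every call should observe `is_leader() == true`, even
            // after the barrier has already released an earlier batch.
            let barrier = Arc::new(Barrier::new(1));
            assert!(barrier.wait().is_leader());

            let thd = {
                let barrier = Arc::clone(&barrier);
                thread::spawn(move || {
                    // Second batch: this thread should be its leader.
                    assert!(barrier.wait().is_leader());
                })
            };
            thd.join().unwrap();
        },
        None,
    );
}
```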


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@jamesbornholt jamesbornholt (Member) left a comment


I ended up writing this test because I wasn't sure about the case where 1 < barrier_size < num_threads; I think it subsumes some of yours:

// Test that any combination of threads can be chosen as leader of an epoch.
fn barrier_test_nondeterministic_leader(batch_size: usize, num_threads: usize) {
    // Don't test cases that might deadlock; every thread needs to be released eventually
    assert!(num_threads % batch_size == 0);

    let seen_leaders = Arc::new(std::sync::Mutex::new(HashSet::new()));

    {
        let seen_leaders = Arc::clone(&seen_leaders);
        check_dfs(
            move || {
                let barrier = Arc::new(Barrier::new(batch_size));
                let leaders = Arc::new(std::sync::Mutex::new(vec![]));
                let handles = (0..num_threads).map(|i| {
                    let barrier = Arc::clone(&barrier);
                    let leaders = Arc::clone(&leaders);

                    thread::spawn(move || {
                        let result = barrier.wait();
                        if result.is_leader() {
                            leaders.lock().unwrap().push(i);
                        }
                    })
                }).collect::<Vec<_>>();

                for thd in handles {
                    thd.join().unwrap();
                }

                let leaders = Arc::try_unwrap(leaders).unwrap().into_inner().unwrap();
                seen_leaders.lock().unwrap().insert(leaders);
            },
            None,
        );
    }

    let seen_leaders = Arc::try_unwrap(seen_leaders).unwrap().into_inner().unwrap();
    let num_batches = num_threads / batch_size;
    let fact = |i: usize| (2..i+1).reduce(|acc, i| acc * i).unwrap_or(1);

    // Every execution should have one leader per batch
    assert!(seen_leaders.iter().all(|l| l.len() == num_batches));
    // There should be `num_threads permute num_batches` different leader vectors
    assert_eq!(seen_leaders.len(), fact(num_threads) / fact(num_threads - num_batches));
}

#[test]
fn barrier_test_nondeterministic_leader_1() {
    barrier_test_nondeterministic_leader(1, 4);
}

#[test]
fn barrier_test_nondeterministic_leader_2() {
    barrier_test_nondeterministic_leader(2, 4);
}

#[test]
fn barrier_test_nondeterministic_leader_4() {
    barrier_test_nondeterministic_leader(4, 4);
}
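(As a concrete instance of that last check: with batch_size = 2 and num_threads = 4, num_batches = 2, so the assertion expects 4!/2! = 12 distinct leader vectors.)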

Comment on lines 127 to 131
// Mark the task as requesting an explicit yield and schedule a thread switch, otherwise
// some schedulers would nearly always pick to continue executing the current task.
// Requesting a yield allows other threads to be the first one to wake up and grab the
// leader token.
ExecutionState::request_yield();
jamesbornholt (Member)


Unlike #101 I think we don't want to yield in this case. We needed it there for fairness because park gets used in spinloops, which are doomed by unfair schedules. But barriers aren't used like that—threads generally do useful work once unblocked by a barrier. (I do wish we had better rules for thinking about this, but it's all random in the end anyway, so it gets a bit handwave-y).

lukenels (Contributor, author)


My intention here was to prevent an unfair schedule from (almost) always making the last thread to reach the barrier the leader, by giving every other thread a chance to wake up first and become the leader instead. In most situations I can think of, though, it's probably already random enough which thread happens to call wait last, so the "extra" fairness here won't help much. I'll remove the request_yield here.

src/sync/barrier.rs (resolved; outdated)
src/sync/barrier.rs (resolved; outdated)
// Block current thread
assert!(state.waiters.insert(me)); // current thread shouldn't already be in the set
// The current thread shouldn't already be in `waiters`.
assert!(state.waiters.insert(me));
@jamesbornholt jamesbornholt (Member), Apr 11, 2023


I think I marginally preferred the existing code that didn't needlessly add the current thread to the waiters set, so that the invariant was that all waiters are blocked. It ends up having roughly the same number of special cases for me, just different ones. But we should still assert here that me isn't already a waiter.

lukenels (Contributor, author)


It ends up having roughly the same number of special cases for me, just different ones.

What special cases for me do you mean? Assuming I kill assert_eq!(t.blocked(), tid != me) (which is really just for debugging), then the only use of me in the function is in the line you commented on here, which I could just inline as:

assert!(state.waiters.insert(ExecutionState::me()));

[...] the invariant was that all waiters are blocked.

Arguably, that still is an invariant: the barrier's state is never released while this isn't true (and every thread in waiters is always one that called wait).

@@ -22,9 +22,19 @@ impl BarrierWaitResult {

jamesbornholt (Member)


Could you write a comment like the one we have for Condvar explaining how/why the implementation works?

// We implement `Condvar` by tracking the `CondvarWaitStatus` of each thread currently waiting on

It should probably be a doc comment on BarrierState? (the Condvar one should probably have been a doc comment on CondvarState).

lukenels (Contributor, author)


Added a doc comment to BarrierState in latest revision.

The Rust standard library documentation for `sync::Barrier`
(https://doc.rust-lang.org/std/sync/struct.Barrier.html) states the
following about `Barrier::wait`:

> Blocks the current thread until all threads have rendezvoused here.
> Barriers are re-usable after all threads have rendezvoused once,
> and can be used continuously.  A single (arbitrary) thread will
> receive a BarrierWaitResult that returns true from
> BarrierWaitResult::is_leader() when returning from this function,
> and all other threads will receive a result that will return false
> from BarrierWaitResult::is_leader().

This documentation is a bit ambiguous. It says that "a single thread"
becomes the leader after returning from `wait`, but it isn't clear
if that means a single thread for the existence of the `Barrier`,
or a single thread for each "batch" of threads released by the
barrier. The latter interpretation seems to make the most sense
given the context of being able to reuse barriers, and it seems
more useful to be able to assume that a leader is set for each
batch. That is in fact the behavior implemented by the standard
library as evidenced by the implementation
(https://doc.rust-lang.org/src/std/sync/barrier.rs.html#126-143)
and this unit test, which reuses a barrier on a new thread:

```
let barrier = Barrier::new(1);
assert!(barrier.wait().is_leader());

let thd = thread::spawn(move || {
    assert!(barrier.wait().is_leader());
});

thd.join().unwrap();
```

Let's assume this behavior is correct, and that ambiguity in the
Rust documentation would be resolved in favor of the existing
implementation. Then the Shuttle behavior for `Barrier` is incorrect
and causes this test to fail, because there is only a single, global
`leader` field on the barrier which is set once the first time it's
used.

This commit fixes the problem by adding the notion of an "epoch"
to the `Barrier` implementation, which allows there to be different
leaders for each batch / group of threads released by the barrier.
(This is similar to the standard library implementation's notion
of generation).

The commit also adds some new unit tests to capture the issue with
the old behavior and prevent regressions. It also tweaks `Barrier::wait`
to become an explicit thread yield point, in order to vary which
thread becomes the leader rather than just choosing the first one
that happens to call `wait`.

Co-authored-by: James Bornholt <bornholt@amazon.com>
@jamesbornholt jamesbornholt merged commit e939a38 into awslabs:main Apr 15, 2023
jorajeev pushed a commit that referenced this pull request Feb 29, 2024