
Provide more useful feedback for deadlocks #67

Open
jamesbornholt opened this issue May 5, 2022 · 2 comments
jamesbornholt (Member) commented May 5, 2022

From @bkragl in #66:

One thing I was thinking about is whether it makes sense to have a feature in Shuttle that does precise on-the-fly deadlock detection. Right now there might be a deadlock among some tasks early on in an execution, but we need to wait until the end of the execution to detect it. It might be helpful for debugging to stop the execution right when the deadlock happens.

The nice thing about the current check is that it is completely generic. What I'm proposing would require knowledge about the synchronization primitives (e.g., a combined resource allocation graph for both Mutex and RwLock).

An intermediate step would be to print stack traces for deadlocked threads, so at least you know where to start the debugging process (e.g., which locks are involved).
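For context, the kind of test this would help with is a classic lock-ordering deadlock. The sketch below is illustrative only; it assumes Shuttle's std-like `shuttle::sync::Mutex` / `shuttle::thread` API and the `check_random` entry point. Today, the deadlock is reported only once no task is runnable.

```rust
use shuttle::sync::Mutex;
use shuttle::thread;
use std::sync::Arc;

#[test]
fn lock_ordering_deadlock() {
    // Some interleavings of this test deadlock: one thread holds `a` and
    // waits for `b`, while the other holds `b` and waits for `a`.
    shuttle::check_random(
        || {
            let a = Arc::new(Mutex::new(0));
            let b = Arc::new(Mutex::new(0));

            let (a2, b2) = (Arc::clone(&a), Arc::clone(&b));
            let t = thread::spawn(move || {
                let _gb = b2.lock().unwrap();
                let _ga = a2.lock().unwrap();
            });

            // Opposite acquisition order on the main task.
            let _ga = a.lock().unwrap();
            let _gb = b.lock().unwrap();
            drop(_gb);
            drop(_ga);

            t.join().unwrap();
        },
        1000,
    );
}
```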

jorajeev (Member) commented May 5, 2022

@bkragl Currently, Shuttle only checks for full-system deadlocks (i.e., no task/thread is runnable). It sounds like you're suggesting we also check for partial deadlocks (i.e., a cycle in the Resource Allocation Graph)? That would require identifying all the shared resources involved (not just Mutex and RwLock).
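To make the RAG idea concrete, here is a hypothetical sketch (not Shuttle's internals) of the bookkeeping such a check would need: a graph with task and resource nodes, "waiting-for" edges from tasks to resources, "held-by" edges from resources to tasks, and a cycle check that signals a partial deadlock.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
enum Node {
    Task(usize),
    Resource(usize), // a Mutex, RwLock, ... identified by some id
}

/// Returns true if the resource allocation graph contains a cycle,
/// i.e., some set of tasks is partially deadlocked.
fn has_cycle(edges: &HashMap<Node, Vec<Node>>) -> bool {
    fn visit(
        n: Node,
        edges: &HashMap<Node, Vec<Node>>,
        on_stack: &mut HashSet<Node>,
        done: &mut HashSet<Node>,
    ) -> bool {
        if done.contains(&n) {
            return false; // already fully explored, no cycle through here
        }
        if !on_stack.insert(n) {
            return true; // back edge to a node on the current DFS path: cycle
        }
        for &next in edges.get(&n).into_iter().flatten() {
            if visit(next, edges, on_stack, done) {
                return true;
            }
        }
        on_stack.remove(&n);
        done.insert(n);
        false
    }

    let mut on_stack = HashSet::new();
    let mut done = HashSet::new();
    edges.keys().any(|&n| visit(n, edges, &mut on_stack, &mut done))
}
```

A real detector would presumably also report which tasks lie on the cycle, and would need to distinguish shared (reader) from exclusive (writer) edges to model RwLock correctly.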

bkragl (Contributor) commented May 6, 2022

Yes, I understand all that. To be clear, I don't think the current check is limited in its "deadlock-detecting power". In case of a partial deadlock, the execution will eventually unfold into a full deadlock, unless it panics for a different reason or gets stuck in an infinite loop (which also indicates a different issue, since Shuttle test harnesses are usually expected to terminate).

There are two separate points: the cost of deadlock detection (i.e., waiting for a partial deadlock to become a full deadlock) and the ease of debugging. To simplify debugging a deadlock, James's suggestion of printing stack traces sounds good. What we could also do is print a truncated schedule that stops right after the last step of any task in the deadlock set. When replaying that schedule, we get the experience of detecting the deadlock immediately (and won't accidentally debug past the point of the partial deadlock).

The only remaining question is whether it makes sense (in practice) to optimize the initial detection of the partial deadlock. I don't have any evidence, so I'm not sure. In any event, the tracking in a resource allocation graph could be limited to some set of resources that we identify.
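A minimal sketch of the truncated-schedule idea, assuming the recorded schedule is just a sequence of task ids (which is not Shuttle's actual schedule format): cut it right after the last step of any task in the deadlock set, so a replay stops at the partial deadlock.

```rust
/// Truncate a recorded schedule right after the last step taken by any task
/// in the deadlock set, so that replaying it stops at the partial deadlock.
fn truncate_at_deadlock(schedule: &[usize], deadlocked: &[usize]) -> Vec<usize> {
    let cut = schedule
        .iter()
        .rposition(|task| deadlocked.contains(task))
        .map(|i| i + 1)
        .unwrap_or(0);
    schedule[..cut].to_vec()
}
```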
