Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: fix possible deadlock in accounts_db #10469

Merged
merged 1 commit into from
Jun 10, 2020

Conversation

BurtonQin
Copy link
Contributor

@BurtonQin BurtonQin commented Jun 9, 2020

Problem

Similar to #10466

In runtime/src/accounts_db.rs:
  1. conflicting lock order:
    storage is before accounts_index in generate_index()
    pub fn generate_index(&self) {
    let storage = self.storage.read().unwrap();
    let mut slots: Vec<Slot> = storage.0.keys().cloned().collect();
    slots.sort();
    let mut accounts_index = self.accounts_index.write().unwrap();

    However, the lock order conflicts with any other place involving the two locks, e.g.
    let accounts_index = self.accounts_index.read().unwrap();
    // Calculate store counts as if everything was purged
    // Then purge if we can
    let mut store_counts: HashMap<AppendVecId, usize> = HashMap::new();
    let storage = self.storage.read().unwrap();

    The fix is to swap the order of the two locks.
  2. double read lock:
    The first read lock is in generate_index()
    pub fn generate_index(&self) {
    let storage = self.storage.read().unwrap();

    Then scan_account_storage() is called at L1823
    .scan_account_storage(

    The second lock is at L969
    let storage_maps: Vec<Arc<AccountStorageEntry>> = self
    .storage
    .read()

    The fix is to use a lock propagation way to avoid recursively acquiring the same lock.
    Separate the lock acquisition from scan_account_storage and pass the ref to read lock guard into the new scan_account_storage_inner.

There remains one possible double read lock that I find really tricky to tackle.

In ledger/src/blockstore.rs:
  1. The first read lock is at 1501 in get_confirmed_block():
    let lowest_cleanup_slot = self.lowest_cleanup_slot.read().unwrap();
  2. Two places where get_slot_entries() is called:
    let slot_entries = self.get_slot_entries(slot, 0)?;

    let parent_slot_entries = self
    .get_slot_entries(slot_meta.parent_slot, 0)
  3. get_slot_entries_with_shred_info():
    pub fn get_slot_entries(&self, slot: Slot, shred_start_index: u64) -> Result<Vec<Entry>> {
    self.get_slot_entries_with_shred_info(slot, shred_start_index, false)
  4. get_completed_ranges():
    let (completed_ranges, slot_meta) = self.get_completed_ranges(slot, start_index)?;
  5. The second read lock is at L1915:
    let lowest_cleanup_slot = self.lowest_cleanup_slot.read().unwrap();

Summary of Changes

  1. Swap the order of storage and accounts_index in generate_index().
  2. Add an inner version of scan_account_storage to separate the lock acquisition.

@ryoqun ryoqun added the CI Pull Request is ready to enter CI label Jun 9, 2020
@solana-grimes solana-grimes removed the CI Pull Request is ready to enter CI label Jun 9, 2020
@codecov
Copy link

codecov bot commented Jun 9, 2020

Codecov Report

Merging #10469 into master will increase coverage by 0.0%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##           master   #10469   +/-   ##
=======================================
  Coverage    81.6%    81.6%           
=======================================
  Files         296      296           
  Lines       69320    69323    +3     
=======================================
+ Hits        56622    56629    +7     
+ Misses      12698    12694    -4     

@mvines mvines requested a review from ryoqun June 9, 2020 14:43
@ryoqun
Copy link
Member

ryoqun commented Jun 10, 2020

@BurtonQin Thanks for continued contribution! I'll take a look at this and the previous one today. Also, are you planning to submit more PRs? I want to know that, so that I'll be prepared to review other upcoming PRs as well. :)

@BurtonQin
Copy link
Contributor Author

No, these are all that the current detector can find.

Copy link
Member

@ryoqun ryoqun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect!! I'm merging!

There remains one possible double read lock that I find really tricky to tackle.
...

For this, I'll tackle later. Thanks for reporting all of these!

@ryoqun ryoqun merged commit 0a638a8 into solana-labs:master Jun 10, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants