Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix race on peer state gather #303

Merged
merged 3 commits into from
Dec 9, 2021
Merged

Conversation

hannahhoward
Copy link
Collaborator

Goals

Fix final race condition on peer stats gathering. Note that there's already a lock on all the methods of PeerTaskQueue itself, this just wraps one more level so that we can hold the lock a bit longer when gathering peer state

Copy link
Member

@rvagg rvagg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a layer of protection, but I don't know how safe range rm.inProgressRequestStatuses is without a lock - if it's possible for a range query like this to hit a synchronization error then I guess this is needed - I just don't know Go well enough to know this.

@hannahhoward
Copy link
Collaborator Author

hannahhoward commented Dec 9, 2021

@rvagg the issue is actually making sure no changes happen to the peer task queue while reading the status.

the simplified version of the issue goes like this:

PeerState method (inside main request manager go routine):

  1. statuses = gather statuses
  2. queueState = get queuestate

worker queue:

  1. PopTasks()
    -----*** at this moment, queue state is out of sync with request state ***---
  2. ExecuteTask(task) {
    send to request manager which will resynhronizes state
    }

The issue here is PopTasks could be called between step 1 and step 2 of PeerState

@rvagg
Copy link
Member

rvagg commented Dec 9, 2021

oh, of course, even better. I didn't notice the fromPeerTopics call in there but that solves the full sync problem quite nicely.

@hannahhoward hannahhoward merged commit 9bd9ea9 into main Dec 9, 2021
hannahhoward added a commit that referenced this pull request Dec 9, 2021
feat(responsemanager): clarify response completion (#304)

only delete requests when they finish going over the network. put requests that are not processing
but still going over the network in a state of CompletingSend

feat(taskqueue): fix race on peer state gather (#303)
@mvdan mvdan deleted the feat/peer-diagnostics-race branch December 15, 2021 14:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants