fix: correctly track CPLs of never refreshed buckets #71

Merged 2 commits into master on Apr 6, 2020

Conversation

Stebalien (Member):

To determine how many buckets we should refresh, we:

  1. Look at the max bucket.
  2. Look at the peer with the greatest common prefix in the max bucket.
  3. Return the "last refresh" times for all buckets between 0 and min(maxPeer, 15)

BREAKING: this returns a slice of times instead of CplRefresh objects because we want to refresh all buckets between 0 and the max CPL.
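
A minimal sketch of the tracking logic described above. The field and function names (cplRefreshedAt, trackedCplRefreshTimes, maxCplForRefresh) are illustrative assumptions, not the PR's actual code:

// Sketch only; names are illustrative and the package name is a placeholder.
package sketch

import "time"

// The tracked CPLs are capped at 15, per the description above.
const maxCplForRefresh = 15

type RoutingTable struct {
	// When each CPL was last refreshed; a missing entry means "never refreshed".
	cplRefreshedAt map[uint]time.Time
	// buckets, locks, etc. elided
}

// trackedCplRefreshTimes returns one "last refresh" time per CPL in
// [0, min(maxCpl, 15)], where maxCpl is the greatest common prefix length of
// any peer in the highest non-empty bucket (steps 1-2 above). A zero
// time.Time means that CPL has never been refreshed.
func (rt *RoutingTable) trackedCplRefreshTimes(maxCpl uint) []time.Time {
	if maxCpl > maxCplForRefresh {
		maxCpl = maxCplForRefresh
	}
	times := make([]time.Time, maxCpl+1)
	for cpl := uint(0); cpl <= maxCpl; cpl++ {
		times[cpl] = rt.cplRefreshedAt[cpl] // zero time if never refreshed
	}
	return times
}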

Stebalien requested review from aarshkshah1992 and aschmahmann and removed the review request for aarshkshah1992 (April 5, 2020 23:47)
defer rt.tabLock.RUnlock()

for i := len(rt.buckets) - 1; i >= 0; i-- {
	if rt.buckets[i].len() > 0 {
Stebalien (Member Author):

#72

Contributor:

While #72 is a valid point are we sure we actually want to return a smaller number of tracked buckets and ramp back up? Upsides: Scales nicely as the network grows without us touching anything. Downsides: Maybe we want some minimal number of buckets to ramp up our scale.

Feel free to ignore this comment or just say "meh, this is probably fine"

Stebalien (Member Author):

The number of buckets shouldn't matter in practice, right?

Contributor:

The number of buckets we have shouldn't matter. We might care about the number of refreshes, but as long as it's ~15-20 and not ~100 it's probably fine.

I'm not sure if there's any tweaking to be done over how we do initial bootstrapping so that we fill our buckets fairly quickly. I suspect this will be much less problematic in the real network than in the test network, though.

Stebalien (Member Author):

My point is that we:

  1. Start with one big bucket.
  2. Split the big bucket as we add peers.
  3. Never shrink back down to 1 bucket.

In terms of refreshing the routing table, the logic in this PR will refresh every CPL, regardless of how many buckets we actually have. In practice, the number of buckets we have shouldn't affect anything but memory usage, unless we're doing something wrong.
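
As a sketch of that point (a hypothetical helper, not the library's actual code): any CPL at or beyond the index of the last, still-unsplit bucket simply maps into that last bucket, so walking CPLs for refresh doesn't depend on how many physical buckets exist.

// Sketch: mapping a CPL to a physical bucket index. Hypothetical helper;
// package name is a placeholder.
package sketch

func bucketIndexForCpl(cpl, numBuckets int) int {
	// All CPLs >= numBuckets-1 fold into the last (unsplit) bucket, so
	// refreshing CPLs 0..maxCpl never needs more buckets than we have.
	if cpl >= numBuckets-1 {
		return numBuckets - 1
	}
	return cpl
}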

aschmahmann (Contributor) left a review:

LGTM aside from the comment I added about a potential off-by-one error.

table_refresh.go (comment outdated, resolved)

table_refresh.go (comment resolved)

// maxCommonPrefix returns the maximum common prefix length between any peer
// in the bucket and the target ID.
func (b *bucket) maxCommonPrefix(target ID) uint {
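
One possible shape for this helper, shown purely as a sketch: the bucket's internal layout and the bit-counting helper below are assumptions, and the PR's actual body may differ.

// Sketch only; package name and the bucket's internal layout are placeholders.
package sketch

type ID []byte

type bucket struct {
	ids []ID // IDs of the peers currently in this bucket (assumed layout)
}

// commonPrefixLen counts the number of leading bits shared by two IDs.
func commonPrefixLen(a, b ID) uint {
	var cpl uint
	for i := 0; i < len(a) && i < len(b); i++ {
		x := a[i] ^ b[i]
		if x == 0 {
			cpl += 8
			continue
		}
		// Count the leading zero bits of the first differing byte.
		for mask := byte(0x80); mask != 0 && x&mask == 0; mask >>= 1 {
			cpl++
		}
		return cpl
	}
	return cpl
}

// maxCommonPrefix returns the longest common prefix length between the target
// and any peer in the bucket (0 if the bucket is empty).
func (b *bucket) maxCommonPrefix(target ID) uint {
	var maxCpl uint
	for _, id := range b.ids {
		if cpl := commonPrefixLen(id, target); cpl > maxCpl {
			maxCpl = cpl
		}
	}
	return maxCpl
}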
Contributor:

This is interesting... it means that if we remove the maxCpl cap (because we fix the RPCs in the DHT to allow querying random KadIDs), then if there are 2^10 peers in the network and bucket number 10 has someone really close to us (e.g. 20 shared bits), I'm now going to be querying 20 buckets. Not sure if that's really what we want, is it?

Stebalien (Member Author):

We should probably query until we fail to fill a bucket.

Contributor:

You mean do a query, wait for it to finish and stop if the percentage of the bucket that is full after the query drops below x%? That seems reasonable.
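
A rough sketch of that stopping rule, purely illustrative: the interface, the helper names, and the 50% threshold are assumptions rather than an agreed design; only the 15-CPL cap comes from the PR description.

// Sketch of "refresh until a bucket fails to fill"; all names and the 50%
// threshold are placeholders, as is the package name.
package sketch

import "context"

const (
	maxCplForRefresh = 15  // existing cap from the PR description
	minFillRatio     = 0.5 // the "x%" from the discussion; placeholder value
)

// refresher abstracts the two operations the loop needs.
type refresher interface {
	refreshCpl(ctx context.Context, cpl uint) error // run a refresh query for one CPL
	bucketFillRatio(cpl uint) float64               // fraction of that CPL's bucket that is full
}

// refreshUntilBucketsStopFilling walks CPLs upward and stops once a refreshed
// bucket stays below the fill threshold (deeper CPLs are unlikely to have peers).
func refreshUntilBucketsStopFilling(ctx context.Context, r refresher) error {
	for cpl := uint(0); cpl <= maxCplForRefresh; cpl++ {
		if err := r.refreshCpl(ctx, cpl); err != nil {
			return err
		}
		if r.bucketFillRatio(cpl) < minFillRatio {
			break
		}
	}
	return nil
}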

Stebalien (Member Author):

Yeah, something like that. But I'm fine cutting an RC without that.

Contributor:

No problem at all since we max out at 15 anyway.

Contributor:

I'll file an issue to track it

Stebalien (Member Author):

LGTM from @aschmahmann over Slack.

Stebalien merged commit 7438bac into master on Apr 6, 2020
Stebalien (Member Author):

@aarshkshah1992 I'd like your thoughts on this post-merge. I'm merging now so I can cut an RC, but we may want to tweak the behavior here.

aarshkshah1992 (Contributor) commented Apr 6, 2020:

@Stebalien While this looks good to me for now, we should revisit/refactor this for 0.6. I've created a meta-issue to track it and will have more thoughts on how to improve it once I start working on the issue and dig deeper.

libp2p/go-libp2p-kad-dht#556
