Skip to content
This repository has been archived by the owner on Feb 8, 2023. It is now read-only.

DHT 2.0 #291

Open
Stebalien opened this issue Mar 29, 2018 · 12 comments
Open

DHT 2.0 #291

Stebalien opened this issue Mar 29, 2018 · 12 comments

Comments

@Stebalien
Copy link
Member

So, we've accumulated quite a few "if only"s in the DHT. We should start thinking about what we'd like in a DHT 2.0.

  • Request/message oriented. We should have the Network provide some form of (possibly unreliable) request + message system.
  • Only three requests: PUT_VALUES (batch), FIND_VALUES (batch), FIND_NODES (batch).
  • Explicitly stores signed PeerInfo records (possibly including the public key?).
  • Ideally, real IPRS.
@fahrradflucht
Copy link

Related to but not a duplicate of #162

@daviddias
Copy link
Member

While at it, let's make sure to consider how to make it private.

@Stebalien
Copy link
Member Author

While at it, let's make sure to consider how to make it private.

@diasdavid, @jbenet and I discussed this yesterday and he had several really nice ideas.

The core idea is that, instead of calling Put(key, value), we can call Put(Hash(key), Encrypt(DeriveSymmetricKey(key), value)). This obviously doesn't work for records with one value like IPNS records but works really well for both provider records and PeerInfo records. This obviously assumes that key has sufficient randomness.

He also proposed an extension where the record author can put an additional "challenge" key that the record author can prove knowledge of to fetch the record. However, that may not be that useful given that DHTs are supposed to allow neighbors to fetch keys from each other on join.

I also proposed an additional privacy enhancing extension where one can use Hash(key || date) so that provider records get moved around. Juan extended this with Hash(key || date || counter) allowing users to post multiple records in different areas of the DHT (helps both with privacy and, possibly, reliability).

This should make the network significantly harder to enumerate.


We can also use these features to get basic capabilities.

  1. It allows "hidden" peers where you need to know the peer's ID to find them. See: Multiple peer IDs, ephemeral IDs, and permanent/private IDs. libp2p/libp2p#37
  2. With some modifications to bitswap, it could avoid leaking information about what a peer is looking for to the peers from which it is requesting blocks.

@daviddias
Copy link
Member

//cc @mafintosh here who has been thinking a lot on how to make DHTs private and is currently building one (or something to add this functionality).

@Stebalien
Copy link
Member Author

Additional notes:

DoS

Because these records are encrypted, it will be impossible for DHT nodes to filter out "bad" peer routing records (or content routing records).

Solution: Allow peers to explicitly expose the unhashed key so that DHTs can perform additional validation.

DHT Delete

More generally, we've found that we'd like a DELETE function for the DHT.

  1. We could use it for the DoS issue above to clear out bad records without exposing the key. This would, unfortunately, require some additional crypto magic (unless we're fine exposing the key).
  2. We'd also like this to implement an "inbox" protocol. When peer A is offline, peer B could post an encrypted message for peer A to the DHT under /inbox/peerA. When peer A comes back online, it would download and then clear all messages on the DHT (not currently possible).

@Stebalien
Copy link
Member Author

Stebalien commented Aug 20, 2018

One issue that has been plaguing us in the DHT is adding new datatypes. Our plan is to fix this with IPRS and arbitrary WASM functions however, at the moment, this has some pretty significant drawbacks:

  1. Security. The VM will likely be really complex.
  2. Simplicity. Any DHT will now need access to a WASM VM. This will become less of an issue over time as WASM reaches greater adoption but is a major problem in the short term.
  3. Performance. At the moment, doing this would require spinning up a new WASM VM each time we want to validate something. Worse, we'd need a VM that can quickly verify signatures.

When thinking about libp2p/go-libp2p-kad-dht#189 (comment), I realized that there's a way to significantly improve the current situation without necessarily adding full IPRS: We can use one protocol per record type.

That is, we can have a "separate" DHT for each key type. At the end of the day, this won't really cost us anything, we're just creating multiple overlay networks. To add a new key type, we'd:

  1. Register a new "sub" DHT with a description of acceptable values/keys (validators/selectors).
  2. This new "sub" DHT would speak a new protocol (e.g., /p2p/dht/MyTypeName/1.0.0).

This would allow users to:

  1. Choose which DHTs they want to participate in.
  2. Add new "types" without convincing the entire network to support their type.

This should add almost no overhead as the nodes will keep the same IDs in all the DHTs (so the routing tables will be almost identical).

Note: This'll also allow applications to register external DHTs using the HTTP API (kind of like the ipfs p2p feature).


The difficult issue here will be finding these "sub" DHTs. That is, finding peers participating in them. This is basically the rendezvous/discovery issue but the tricky part is that these "sub" DHTs may be massive (the size of the DHT) to tiny (a few nodes).

@florianlenz
Copy link

@Stebalien for the messaging system quasar might is of interest for the DHT. It's very effective in the terms of routing, give it a quick read if you have the time. The only problem I see right now with quasar is the filter propagation which take a few minutes with the current mechanism (but I guess we can come up with sometime more effective).

@sanderpick
Copy link

Sounds pretty slick @Stebalien

@Stebalien
Copy link
Member Author

@florianlenz that looks like a pubsub system. We actually have a separate pubsub implementation and even have a "value store" adapter for pubsub (which we use for IPNS over pubsub).

The difference here is that a DHT stores values (temporarily) so members can join and leave and still get published values.

@daviddias daviddias added the topic/libp2p Topic libp2p label Nov 23, 2018
@jhiesey
Copy link

jhiesey commented Dec 22, 2018

So I finally got around to writing up a proposal for a refactor followed by a new protocol: libp2p/research-dht#8

Thoughts @Stebalien @daviddias @anacrolix

@daviddias
Copy link
Member

Ton of DHT notes at gpestana/notes#8 by @gpestana

@Stebalien
Copy link
Member Author

Looks like the TOR project had put a lot of thought into this: https://github.com/torproject/torspec/blob/master/rend-spec-v3.txt

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

7 participants