This repository has been archived by the owner on Feb 8, 2023. It is now read-only.

PubSub notes from the IPFS Workshop - recursion and corecursion, PubSub API and Self Certified Streams #154

Open
daviddias opened this issue Aug 5, 2016 · 4 comments

@daviddias
Member

daviddias commented Aug 5, 2016

tl;dr: This issue captures the ideas and observations we made and discussed during the IPFS Workshop with regard to PubSub. The goal was to first nail down the interface that apps might expect (focus on the user) and then create the pubsub stack to support that.

note 1: The names of things might change.
note 2: This is based on the notes I took from all the discussions. My apologies in advance if I got anything wrong; I'll update if that happens. Everyone who was present is welcome to edit, thank you.

PubSub API - Self Certified Streams

A Self Certified Stream (SCS) is an authenticated and certified data structure that uses Self Certified Links (SCL) to create a chain of blocks.

Another type of a Self Certified Link is IPNS (also known as Self Certified Names), described by the IPRS spec.

In contrast with IPLD and its Selectors, SCS use coSelectors to query the IPFS network for the next blocks in the chain.

image

While selectors offer a way to do recursion (traversal) over any IPLD graph, coSelectors are a way to express interest in the blocks that will follow (e.g. append-only logs, blockchains).

This way, we get a coBitswap that, instead of sending a wantlist for specific blocks, sends a list of blocks it is interested in seeing successors of, i.e. it asks for blocks that point to the ones in the list. (coBitswap is one way to describe it; in the end, it might all just be part of bitswap.)
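
To make the wantlist vs. interest-list contrast concrete, here is a minimal Python sketch. Everything in it (`BlockStore`, `want`, `co_want`, the `prev` field, JSON-encoded blocks) is made up for illustration and is not the bitswap API. It only shows that a wantlist asks for blocks by their own hash, while an interest list asks for blocks that link to a hash we already know.

```python
import hashlib
import json
from typing import Dict, Iterable, Optional


def block_id(block: dict) -> str:
    """Content address of a block: SHA-256 over its canonical JSON encoding."""
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def make_block(payload: str, prev: Optional[str]) -> dict:
    """A block that links back to its predecessor by hash (hypothetical layout)."""
    return {"payload": payload, "prev": prev}


class BlockStore:
    """Toy stand-in for a peer's local block storage."""

    def __init__(self) -> None:
        self.blocks: Dict[str, dict] = {}

    def put(self, block: dict) -> str:
        cid = block_id(block)
        self.blocks[cid] = block
        return cid

    def want(self, wantlist: Iterable[str]) -> Dict[str, dict]:
        """Bitswap-like: return the blocks whose own hash is in the wantlist."""
        return {cid: self.blocks[cid] for cid in wantlist if cid in self.blocks}

    def co_want(self, interest: Iterable[str]) -> Dict[str, dict]:
        """coBitswap-like: return the blocks whose `prev` link is in the interest list."""
        interest = set(interest)
        return {cid: b for cid, b in self.blocks.items() if b["prev"] in interest}


# Build a three-block chain: first <- second <- third.
store = BlockStore()
first = store.put(make_block("first", None))
second = store.put(make_block("second", first))
third = store.put(make_block("third", second))

successors_of_first = store.co_want([first])  # contains only the "second" block
```

A subscriber that already holds `first` can keep calling `co_want` with the newest hash it has, which is how it would follow an append-only log forward in this toy model.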

Participants Discovery (Peer Discovery)

To find the peers participating in the feed, namely publishers and subscribers, we can use:

  • DHT - Storing records in the DHT that tell us which peers are the publishers and which peers subscribed, so that we can request the blocks that compose the Self Certified Stream. (Structured)
    • pros:
      • Deterministic - You know you will find the publisher
    • cons:
      • Slower, as it requires traversing the DHT each time we need to update who is publishing/subscribing
  • Probing - Send probes through our peers, carrying a matching function, to tell the network what our peer is interested in (Unstructured)
    • pros:
      • Can find peers that are close with the information that we need quicker
      • Leaves more room to make more interesting matching functions (that can be run on the receiver nodes, a la multicast matching)
    • cons:
      • This is also known as flooding: unless TTLs are kept low, it has a huge bandwidth cost, and small TTLs lead to false negatives.
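
The TTL trade-off of probing can be sketched with a toy flood over an in-memory peer graph. This is only an illustration under assumptions: `probe`, the adjacency-dict graph and the `matches` callback are hypothetical names, and no real libp2p discovery mechanism is being modelled.

```python
from collections import deque
from typing import Callable, Dict, List, Set, Tuple


def probe(
    graph: Dict[str, List[str]],
    start: str,
    matches: Callable[[str], bool],
    ttl: int,
) -> Tuple[Set[str], int]:
    """Flood a probe from `start` for at most `ttl` hops.

    Returns the set of matching peers reached and the number of
    messages sent (duplicates included, since they still cost bandwidth).
    """
    seen = {start}
    frontier = deque([(start, ttl)])
    found: Set[str] = set()
    messages = 0
    while frontier:
        peer, remaining = frontier.popleft()
        if remaining == 0:
            continue  # TTL exhausted, this peer does not forward the probe
        for neighbour in graph[peer]:
            messages += 1
            if neighbour in seen:
                continue  # already probed, drop the duplicate
            seen.add(neighbour)
            if matches(neighbour):
                found.add(neighbour)
            frontier.append((neighbour, remaining - 1))
    return found, messages


# A line of peers a - b - c - d, where only d is a publisher.
peers = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
is_publisher = lambda peer: peer == "d"

found_small, cost_small = probe(peers, "a", is_publisher, ttl=2)
found_large, cost_large = probe(peers, "a", is_publisher, ttl=3)
```

With `ttl=2` the probe never reaches `d`, so the publisher is missed (a false negative); with `ttl=3` it is found, at the price of more messages.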

image

There isn’t a one-size-fits-all solution. Most probably the best option is a hybrid, or letting the application developer select what makes the most sense for their app's requirements. libp2p already supports multiple Peer Discovery mechanisms running at the same time.

Subscription Models

There are 4 categories of subscription models in the pubsub literature:

  • Topic - Each message has one topic
  • Content - Subscribers can show their interest by the content of the message instead of its topic
  • Type - Multiple topics, also known as tags
  • Concept - Semantic subscriptions (e.g. interested in A if B happens)
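
As a rough illustration of how the first three models differ, each subscription can be seen as a predicate over messages. The `Message` shape and the helper names below are assumptions made for this sketch, not part of any proposed API, and concept-based subscriptions are left out because they need state across messages.

```python
from dataclasses import dataclass, field
from typing import Callable, FrozenSet


@dataclass
class Message:
    """Hypothetical message shape: one topic, a set of tags, a free-form body."""
    topic: str
    tags: FrozenSet[str] = frozenset()
    body: dict = field(default_factory=dict)


Subscription = Callable[[Message], bool]


def topic_sub(topic: str) -> Subscription:
    """Topic model: match on the single topic of the message."""
    return lambda m: m.topic == topic


def tag_sub(tag: str) -> Subscription:
    """Type model (as described above): match if the message carries the tag."""
    return lambda m: tag in m.tags


def content_sub(predicate: Callable[[dict], bool]) -> Subscription:
    """Content model: match on what the message says, not how it is labelled."""
    return lambda m: predicate(m.body)


msg = Message(topic="weather", tags=frozenset({"lisbon", "rain"}), body={"mm": 12})

wants_weather = topic_sub("weather")
wants_rain = tag_sub("rain")
wants_heavy_rain = content_sub(lambda body: body.get("mm", 0) > 10)
```

All three subscriptions above accept `msg`, but for different reasons: its topic, one of its tags, and a value inside its body.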

We can achieve Type based subscriptions, the most requested ones, with CoSelectors, where a CoSelector can represent a whole chain of events that follow the same tag:

image

Reliability

One of the cruxes of achieving scalable pubsub is how acknowledgements are performed.
With Self Certified Streams, the receiver can infer whether it has received all the packets in a chain by validating its ancestry, requesting again only the ones it missed. In some sense this is the NACK strategy, but here the packet hashes are the NACK, and because they are authenticated, the receiver can ask any peer in the network for the missing packets.
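
The "hash as NACK" idea can be sketched as follows. This is a minimal illustration assuming JSON-encoded blocks with a `prev` hash link; `sync`, `fetch` and the block layout are hypothetical and not taken from any IPFS implementation.

```python
import hashlib
import json
from typing import Callable, Dict, List, Optional


def block_id(block: dict) -> str:
    """Content address of a block: SHA-256 over its canonical JSON encoding."""
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def sync(head: str, local: Dict[str, dict], fetch: Callable[[str], dict]) -> List[str]:
    """Walk the chain backwards from `head`, fetching whatever is missing.

    `fetch` may be served by any peer: the reply is only accepted if it
    hashes to the id we asked for. Returns the ids that had to be requested.
    """
    requested: List[str] = []
    cursor: Optional[str] = head
    while cursor is not None:
        block = local.get(cursor)
        if block is None:
            block = fetch(cursor)  # the missing hash itself acts as the NACK
            if block_id(block) != cursor:
                raise ValueError("reply does not match the requested hash")
            local[cursor] = block
            requested.append(cursor)
        cursor = block["prev"]
    return requested


# Publisher side: a chain of four blocks.
chain: Dict[str, dict] = {}
ids: List[str] = []
prev: Optional[str] = None
for n in range(4):
    blk = {"payload": f"event-{n}", "prev": prev}
    prev = block_id(blk)
    chain[prev] = blk
    ids.append(prev)

# Receiver side: got everything except the third block.
received = {cid: chain[cid] for cid in ids if cid != ids[2]}
re_requested = sync(ids[3], received, fetch=chain.__getitem__)
```

Only the one missing block is requested again, and a peer that answers with a different block is rejected by the hash check.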

Tree forming

A sound tree-forming algorithm would give IPFS Self Certified Streams a way to scale up without a large increase in the network traffic generated. Our plan is to make the API solid first and then work on the networking and tree-forming parts of the implementation, so that prototypes that need PubSub today can start testing it.

Other notes

Authenticated vs Certified

An authenticated data structure is something like a hash chain (merkledag). For pubsub, we need our data structure to be Certified (cryptographically signed); otherwise it would be very easy to spam the network with unwanted events, since any node could publish to that stream.

——

Previous pubsub discussions

@jbenet
Member

jbenet commented Aug 6, 2016

Very nice post!

Good summary of many discussions.

Some nits here.

  • Your tags diagram shows a "topic path", a hierarchy of topics. when we discussed, we agreed to treat "tags" as different values, such as /a, /b/, /c, /d, and "paths" as a hierarchy of topics (not necessarily tags) (/earth/portugal/lisbon).
  • Your "coselectors" are just paths. just call that a path. IIRC "coselector" as described by mikola/nicola was meant to be something different, an expression (dual to an IPLD Selector) that can express a filter that defines what events to listen to (similar to a subscription).
  • Note: an IPLD Selector includes a path, but the point is to be much more than a path. specifically, it is supposed to select an authenticated subgraph from a given authenticated dag root. This can be a path, but it's most useful to discuss in terms of non-chain subgraphs.
  • I think there are several superfluous abstractions here, that define pieces that don't need to be defined (i.e. are not actually different).
  • For example, I disagree that "Self-certified streams" means anything special, i think you just mean "certified streams" (which just means signed streams) using a "self-certified name" (the link to the value which happens to be a stream). You can call that a "self-certified stream", but i think "authenticated stream" covers it much nicer.
  • Self-certified means "self-signed", meaning something similar to "self-signed certificates". It's useful in "Self-certified names" because we give the user authenticated data for a given name -- the "specialness" comes from leveraging given trust in the name to also mean trust in the value. I don't think "Self-certified streams" is a useful abstraction beyond just saying they're "authenticated streams", some of which are signed, and some of which are just hash-linked.
  • "Self Certified Links (SCL)" this is not different at all from a Self Certified Name. It's not useful to call it Self Certified Link. It's superfluous. (my reaction: "now you're just trolling")
  • Your "Authenticated vs Certified" section is incorrect. "Authenticated" has a formal definition, which includes hashing AND digital signatures. So "authenticated data structure" covers the signed ones too.
  • I'm not sure what the formal definition of "Certified" is, but i also have been using it to just mean signed (i.e."certificates" are signed attestations). It may be broader, not sure.
  • And we do want to be able to create chains that every node can publish to. that's okay, it is useful.

Authenticated: I would put it like this, but i'm sure there are more succinct and correct definitions: A value is said to be authenticated when one can reliably prove it is the intended value (authentic). It is usually used in relation to values or messages created by a sender; the values are authenticated if we can reliably prove whether a value was originally constructed by the sender, or it is a different or modified value, not created by the sender.

  • This can be done with hashes (eg the sender can send a message m in an insecure channel, then give us the hash in a secure side channel. We can then prove whether a given byte sequence is really m).
  • Or it can be done with digital signatures (the sender digitally signs m, producing sm, then sends sm. Assuming we have sender's signing key, we can reliably prove sm is authentic).
  • Or it can be done with proofs. (eg sender agrees to send the sequence of all prime numbers. we can reliably prove -- though expensively -- that the sequence received is the sequence of all primes.)
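
Two of these cases can be sketched with nothing but the Python standard library. The function names are made up for illustration, and the signature case is left out because it needs a third-party crypto library.

```python
import hashlib
import hmac
from typing import Sequence


def digest(message: bytes) -> str:
    """SHA-256 hex digest, standing in for the hash sent over the secure side channel."""
    return hashlib.sha256(message).hexdigest()


def is_authentic(received: bytes, trusted_digest: str) -> bool:
    """Hash case: the bytes are authentic iff they hash to the trusted digest."""
    return hmac.compare_digest(digest(received), trusted_digest)


def is_prime(n: int) -> bool:
    """Trial division; slow but enough for a sketch."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def is_prime_prefix(sequence: Sequence[int]) -> bool:
    """Proof case: check that `sequence` is exactly the first len(sequence) primes."""
    expected = []
    candidate = 2
    while len(expected) < len(sequence):
        if is_prime(candidate):
            expected.append(candidate)
        candidate += 1
    return list(sequence) == expected


m = b"hello pubsub"
trusted = digest(m)  # imagine this arrived over the secure side channel

ok = is_authentic(m, trusted)
tampered = is_authentic(b"hello pubsub!", trusted)
```

In the hash case the receiver needs one trusted value per message; in the proof case it needs no trusted value at all, only the agreed rule, and pays for that with computation.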

@jbenet
Member

jbenet commented Aug 6, 2016

Also-- this is not everything mikolalysenko and nicola talked about. I would really like to see their own original writeups, before they read this.

@daviddias
Member Author

Thank you for reviewing on the spot!

Your "Authenticated vs Certified" section is incorrect. "Authenticated" has a formal definition, which includes hashing AND digital signatures. So "authenticated data structure" covers the signed ones too.

I kind of got this definition from one of our chats (Authenticated -> Merkle links ; Certified -> Mazieres links). Could you double confirm this one? I'm good calling it either way, but better to have this as a 'glossary', so we do not leave exploits in our future meme transfers :). If "Authenticated -> Mazieres links", then what would be Certified for you?

And yes, I would really like to have @mikolalysenko and @nicola drop their notes and thoughts to move forward :) @whyrusleeping I'm sure you will have a say here too.

@jbenet
Member

jbenet commented Aug 7, 2016

No, authenticated means BOTH signed and Merkle-linked. And really more:
anything you can prove is authentic (truly the originally intended thing).

Certified just means signed, I think. I'm not 100% sure.

It's not up to me, it's the terms used in cryptography.
