
Figure out how to do stream positions in the syncserver #211

Closed

erikjohnston opened this issue Sep 6, 2017 · 4 comments

Comments

@erikjohnston
Member

erikjohnston commented Sep 6, 2017

Right now it's a hack, so we should look into getting something proper in place before we build too much on top of it.

Do we want to allow users to switch from one instance of the sync server to another? This would allow us to auto fail over if one of the sync servers died, but would entail having stream positions be globally defined, rather than specific to a particular instance.

We also need to figure out how to do this efficiently with respect to database design.

@NegativeMjark
Contributor

For more context, the current syncserver uses a single auto-incrementing integer to represent the position a client is at in the stream of events. This integer is incremented whenever it receives a message from Kafka.

However, if we ran multiple syncserver instances, they could receive the messages from Kafka in a different order. (Kafka guarantees the order within a partition of a topic, but not the order between partitions.) This would result in the different instances assigning different client API stream positions to the same message.

So it would be impossible for a client to switch which server it was querying, because the new server would not be able to interpret the client's stream position.
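To make the divergence concrete, here is a minimal sketch in Go of the per-instance counter described above (not the actual syncserver code; the naiveSyncServer type and onKafkaMessage function are hypothetical names), showing how two instances that interleave partitions differently assign different positions to the same message:

```go
package main

import "fmt"

// naiveSyncServer models the current scheme: each instance keeps its own
// auto-incrementing counter and assigns the next value to whatever message
// it happens to receive from Kafka.
type naiveSyncServer struct {
	pos int64 // local, per-instance stream position
}

// onKafkaMessage assigns the next local position to the incoming message.
// Two instances consuming the same topics can interleave partitions
// differently, so the same message may get a different position on each.
func (s *naiveSyncServer) onKafkaMessage(msgID string) int64 {
	s.pos++
	fmt.Printf("assigned position %d to %s\n", s.pos, msgID)
	return s.pos
}

func main() {
	a, b := &naiveSyncServer{}, &naiveSyncServer{}
	// Instance A sees partition 0 first; instance B sees partition 1 first.
	a.onKafkaMessage("p0/evt1") // position 1 on A
	a.onKafkaMessage("p1/evt1") // position 2 on A
	b.onKafkaMessage("p1/evt1") // position 1 on B -- same message, different position
	b.onKafkaMessage("p0/evt1") // position 2 on B
}
```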

@erikjohnston
Member Author

> However, if we ran multiple syncserver instances, they could receive the messages from Kafka in a different order.

Presumably this is also applicable if we partitioned the room server?

@kegsay
Member

kegsay commented Aug 26, 2020

As Erik points out, this is not unique to sharded syncservers, but also applies to sharded roomservers. This was considered when device keys were added, whereby I added the format $topic_id-$partition-$offset to sync tokens, e.g. dl-0-1452, with the intention that:

  • Sharded key servers would write to different partitions, resulting in extra positions in the token, e.g. dl-0-1452.dl-1-1322,dl-2-784.
  • The token's IsAfter function returns true if any offset is higher (or if there are additional partitions).

This guarantees that we observe all updates from all sharded upstream components, but does nothing to guarantee ordering. That is a larger piece of work, fundamentally around graph linearisation -- matrix-org/gomatrixserverlib#187. Assuming we do things "properly", we will have a deterministic algorithm to linearise the DAG, meaning the ordering we observe from Kafka is irrelevant from the syncapi's point of view. Using a similar technique for things outside the room DAG (e.g. key updates) would probably work well, but ordering for key updates is less important than just being told about them in the first place.

This does mean that linearisation is not fixed and is instead fluid depending on subsequent messages, making it harder to pre-calculate and optimise.
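For illustration, a rough sketch in Go (with hypothetical type and field names, not Dendrite's real sync token code) of a token carrying one $topic_id-$partition-$offset position per partition, with an IsAfter that behaves as described above:

```go
package main

import "fmt"

// logPosition records how far we have read in one Kafka partition of a topic.
type logPosition struct {
	topic     string
	partition int32
	offset    int64
}

// syncToken carries one position per partition, mirroring the serialised
// form dl-0-1452.dl-1-1322 with one triple per partition.
type syncToken struct {
	positions map[int32]logPosition
}

// IsAfter reports whether t has progressed past other: a higher offset in any
// partition, or a partition present in t that other has not seen, counts.
func (t syncToken) IsAfter(other syncToken) bool {
	for part, pos := range t.positions {
		prev, seen := other.positions[part]
		if !seen || pos.offset > prev.offset {
			return true
		}
	}
	return false
}

func main() {
	old := syncToken{positions: map[int32]logPosition{
		0: {topic: "dl", partition: 0, offset: 1452},
	}}
	// After a sharded key server has also written to a second partition.
	cur := syncToken{positions: map[int32]logPosition{
		0: {topic: "dl", partition: 0, offset: 1452},
		1: {topic: "dl", partition: 1, offset: 1322},
	}}
	fmt.Println(cur.IsAfter(old)) // true: an extra partition appeared
	fmt.Println(old.IsAfter(cur)) // false: nothing in old is ahead
}
```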

@neilalexander any thoughts?

@kegsay
Member

kegsay commented Dec 5, 2022

Years later, Dendrite is not focused on providing sharded syncapis at present. When we do, we will likely pin users to a specific syncapi instance, similar to Synapse. In a sliding sync future, this becomes much easier as there is no long-term positional information being kept on the client.

@kegsay kegsay closed this as completed Dec 5, 2022