Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MSC3885: Sliding Sync Extension: To-Device messages #3885

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
81 changes: 81 additions & 0 deletions proposals/3885-sliding-sync-to-device.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,81 @@
# MSC3885: Sliding Sync Extension: To-Device

This MSC is an extension to [MSC3575](https://github.com/matrix-org/matrix-spec-proposals/pull/3575)
which includes support for to-device messages in the `/sync` response.

## Proposal

MSC3575 does not include support for to-device events. This extension will add support for
to-device events. There is a **critical** problem in the current to-device implementation, which is
that events are implicitly acknowledged when the user advances the `/sync` token. This causes problems
when clients need to have 2 or more sync streams open at a time, e.g a push notification process _and_
a main process. This can cause the two processes to race to fetch the to-device events, resulting in
the need for complex synchronisation rules to ensure the token is correctly and atomically exchanged
between processes. Sliding Sync's implementation of to-device events removes this requirement by
associating an _explicit_ token **just for to-device events**. This token uses the same terminology
of the current `/sync` implementation: `since` and `next_batch`.

The proposal is to introduce a new extension called `to_device` with the following request parameters:
```js
{
"enabled": true, // sticky
"limit": 100, // sticky, max number of events to return, server can override
"since": "some_token" // optional, can be omitted on initial sync / when extension is enabled
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uh?

Suggested change
"since": "some_token" // optional, can be omitted on initial sync / when extension is enabled
"since": "some_token" // optional, can be omitted on initial sync / when extension is disabled

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could get away with not having a separate token.

For example, in Synapse, the pos sync token is a vector clock and already has the to-device stream position in it. Since the Sliding Sync extension interface allows selectively enabling an extension, you can choose to acknowledge to-device messages by whether you enable the to_device extension ("enabled": true).

}
```
If `enabled` is `true`, then the sliding sync response includes the following response fields in
the `to_device` extension response:
```js
{
"next_batch": "some_token", // REQUIRED: even if no changes
"events": [ // optional, only if there are events since the last `since` value.
{ ... to device event ... }
]
}
```

The semantics of the `events` field is exactly the same as the current `/sync` implementation, as implemented
in v1.3 of the Client-Server Specification. The server MUST NOT send more than `limit` events: it may send
less than `limit` events if the value provided was too large.

The `events` are treated as "acknowledged" when the server receives a new request with the `since`
value set to the previous response's `next_batch` value. When this occurs, acknowledged events are
permanently deleted from the server, and MUST NOT be returned to the client should another request
with an older `since` value be sent.

If a response is lost, the client will retry the request with the same `since` value. When this happens,
the server MAY bundle together more events if they arrived in the interim.

## Potential issues

This proposal introduces a new sync token type for clients to manage. It is done for good reasons but
it does slightly increase the complexity of designing a correct client implementation.

## Alternatives

The alternative is to include to-device events like normal events in a different section, without
making use of dedicated `since` and `next_batch` tokens, instead relying on the `pos` value. This
would revert to-device events to be implicitly acknowledged, which has caused numerous [issues](https://github.com/vector-im/element-ios/issues/3817) in
the past.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand how this setup differs, in a meaningful way, with regards to the linked issue. Not that the issue defines the problem completely. Matter of fact the following definition sounds exactly the same as for sync v2:

The events are treated as "acknowledged" when the server receives a new request with the since value set to the previous response's next_batch value. When this occurs, acknowledged events are permanently deleted from the server, and MUST NOT be returned to the client should another request with an older since value be sent.

We're still going to have two processes that will try to fetch the to-device events. One of them might advance further then the other, resulting in the events being acknowledged and thus removed from the server.

If we have a bunch of tokens denoted by Tₙ. Where n is used to indicate the order in our sync token sequence. Once we use the token Tₙ₊₁ for a sync request, to-device events, that were part of the previous Tₙ sync response, will be deleted from the server.

This means if process A uses token Tₙ₊₁ before process B manages to use token Tₙ, then process B will miss all events that were part of the sync response for which token Tₙ was used by process A. This leads to a discrepancy in state of cryptographic objects of process A and process B. In the linked issue process B overwrites the newer state inserted by process A with an older and incomplete version of it.

Doesn't this scenario happen with this MSC as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes of course, but that has never been the point it has been trying to solve.

This MSC solves the problem by allowing a process to choose to opt in to to-device messages, and explicitly decide when to acknowledge said messages whilst still getting other unrelated events. Processes will need to agree which one will be in charge of processing them. You cannot safely have multiple sync streams at all in Sliding Sync.

I consulted the iOS folks when designing this MSC and this was all they needed from their end when I asked.


## Security considerations

No additional security considerations beyond what the current `/sync` implementation provides.

## Unstable prefix

No unstable prefix as Sliding Sync is still in review. To enable this extension, just add this to
your request JSON:
```js
{
"extensions": {
"to_device": {
"enabled": true
}
}
}
```

## Dependencies

This MSC builds on MSC3575, which at the time of writing has not yet been accepted into the spec.