Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Fix federated joins when the first server in the list is not in the room #15074

Merged
merged 1 commit into from
Feb 15, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions changelog.d/15074.bugfix
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Fix a long-standing bug where federated joins would fail if the first server in the list of servers to try is not in the room.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this happen if you follow an outdated link with (now) incorrect via declared? I'm struggling to figure out the user impact.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One might also have a valid via, but the first server in the list is temporarily offline.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One might also have a valid via, but the first server in the list is temporarily offline.

Oh, but that would result in a timeout and be handled differently by the joining server. I.e. I don't think this comment is relevant.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this happen if you follow an outdated link with (now) incorrect via declared? I'm struggling to figure out the user impact.

That's one way you can end up with a stale via list. Another way might be through space listings? I seem to remember a list of vias being stored in the space's state.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes the children state events store vias in them that could get out of date.

11 changes: 5 additions & 6 deletions synapse/federation/federation_client.py
Original file line number Diff line number Diff line change
Expand Up @@ -884,7 +884,7 @@ async def _try_destination_list(
if 500 <= e.code < 600:
failover = True

elif e.code == 400 and synapse_error.errcode in failover_errcodes:
elif 400 <= e.code < 500 and synapse_error.errcode in failover_errcodes:
failover = True

elif failover_on_unknown_endpoint and self._is_unknown_endpoint(
Expand Down Expand Up @@ -999,14 +999,13 @@ async def send_request(destination: str) -> Tuple[str, EventBase, RoomVersion]:

return destination, ev, room_version

failover_errcodes = {Codes.NOT_FOUND}
# MSC3083 defines additional error codes for room joins. Unfortunately
# we do not yet know the room version, assume these will only be returned
# by valid room versions.
failover_errcodes = (
(Codes.UNABLE_AUTHORISE_JOIN, Codes.UNABLE_TO_GRANT_JOIN)
if membership == Membership.JOIN
else None
)
if membership == Membership.JOIN:
failover_errcodes.add(Codes.UNABLE_AUTHORISE_JOIN)
failover_errcodes.add(Codes.UNABLE_TO_GRANT_JOIN)

return await self._try_destination_list(
"make_" + membership,
Expand Down