Allow MQTT QoS 0 subscribers to reconnect (backport #10244) #10252

Merged
merged 2 commits into from
Dec 28, 2023

Conversation

mergify[bot]

@mergify mergify bot commented Dec 28, 2023

This is an automatic backport of pull request #10244 done by Mergify.
Cherry-pick of 78b4fcc has failed:

On branch mergify/bp/v3.12.x/pr-10244
Your branch is up to date with 'origin/v3.12.x'.

You are currently cherry-picking commit 78b4fcc899.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Changes to be committed:
	modified:   deps/rabbit/src/rabbit_amqqueue.erl
	modified:   deps/rabbitmq_mqtt/test/shared_SUITE.erl

Unmerged paths:
  (use "git add <file>..." to mark resolution)
	both modified:   deps/rabbitmq_mqtt/src/rabbit_mqtt_qos0_queue.erl

To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/github/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally



The solution in #10203 has the following issues:
1. Bindings can be left over in the Mnesia table rabbit_durable_queue.
One solution to 1. would be to first delete the old queue via
`rabbit_amqqueue:internal_delete(Q, User, missing_owner)`
and subsequently declare the new queue via
`rabbit_amqqueue:internal_declare(Q, false)`
However, even then, it suffers from:
2. Race conditions between `rabbit_amqqueue:on_node_down/1`
and `rabbit_mqtt_qos0_queue:declare/2`:
`rabbit_amqqueue:on_node_down/1` could first read the queue records that
need to be deleted, thereafter `rabbit_mqtt_qos0_queue:declare/2` could
re-create the queue owned by the new connection PID, and `rabbit_amqqueue:on_node_down/1`
could subsequently delete the re-created queue.

Unfortunately, `rabbit_amqqueue:on_node_down/1` does not delete
transient queues in one isolated transaction. Instead it first reads
queues and subsequently deletes queues in batches, making it prone to
race conditions.
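
The read-then-delete race described above can be illustrated with a toy model. This is a minimal sketch in Python, not RabbitMQ code: the dictionary stands in for the queue metadata store, and `on_node_down_racy`, `redeclare`, the node names and the PIDs are all hypothetical names chosen for illustration.

```python
# Toy model of the race; none of these names are RabbitMQ APIs.

def on_node_down_racy(queues, dead_node, interleave=None):
    """Read the doomed queue names first, then delete them in a later step."""
    # Step 1: snapshot the names of queues owned by the dead node.
    doomed = [name for name, owner in queues.items() if owner[0] == dead_node]
    # Another process may run between the read and the delete.
    if interleave is not None:
        interleave(queues)
    # Step 2: delete by name, without re-checking the current owner.
    for name in doomed:
        queues.pop(name, None)


def redeclare(queues, name, new_owner):
    """A reconnecting client re-creates its queue on a live node."""
    queues[name] = new_owner


queues = {"mqtt-client-1": ("node_a", "pid_1")}
on_node_down_racy(
    queues,
    "node_a",
    interleave=lambda qs: redeclare(qs, "mqtt-client-1", ("node_b", "pid_2")),
)
# The freshly re-created queue owned by node_b has been deleted.
print(queues)
```

Because the delete step acts on a stale snapshot, the queue that the new connection just re-created is removed along with the dead ones.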

Ideally, this commit would delete all rabbit_mqtt_qos0_queue queues of the
node that has crashed, including their bindings.
However, doing so in one transaction is risky as there may be millions
of such queues and the current code path applies the same logic on all
live nodes resulting in conflicting transactions and therefore a long
database operation.

Hence, this commit uses the simplest approach which should still be
safe:
Do not remove rabbit_mqtt_qos0_queue queues if a node crashes.
Other live nodes will continue to route to these dead queues.
That should be okay, given that the rabbit_mqtt_qos0_queue clients auto
confirm.
However, messages routed to these dead queues still count as routed
for the purposes of the AMQP 0.9.1 `mandatory` property.
If an MQTT client re-connects to a live node with the same client ID,
the new node will delete and then re-create the queue.
Once the crashed node comes back online, it will clean up its leftover
queues and bindings.
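
The lifecycle described above can be sketched with the same kind of toy model. Again, this is Python for illustration only, under the assumption that each queue record simply maps a client's queue name to the node hosting its connection; `ToyCluster` and its methods are hypothetical and do not correspond to functions in rabbit_mqtt_qos0_queue or rabbit_amqqueue.

```python
class ToyCluster:
    """Hypothetical in-memory stand-in for the queue metadata store."""

    def __init__(self):
        self.queues = {}  # queue name -> node hosting the owning connection

    def declare(self, name, node):
        # Delete any stale record, then create the queue for the new owner.
        self.queues.pop(name, None)
        self.queues[name] = node

    def on_node_down(self, node):
        # Intentionally a no-op for QoS 0 queues: records of the dead node stay.
        return

    def on_node_restart(self, node):
        # The returning node removes only the records it still owns.
        for name in [n for n, owner in self.queues.items() if owner == node]:
            del self.queues[name]


cluster = ToyCluster()
cluster.declare("client-1", "node_a")
cluster.declare("client-2", "node_a")
cluster.on_node_down("node_a")          # both records are left in place
cluster.declare("client-1", "node_b")   # client-1 reconnects to a live node
cluster.on_node_restart("node_a")       # node_a cleans up client-2's leftover
print(cluster.queues)
```

In this model no other node ever deletes on behalf of the crashed node, so there is no stale snapshot to race against: only the reconnecting client's new node and the restarted node itself touch the records.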

(cherry picked from commit 78b4fcc)

# Conflicts:
#	deps/rabbitmq_mqtt/src/rabbit_mqtt_qos0_queue.erl
@mergify mergify bot added the conflicts label Dec 28, 2023
@mergify mergify bot assigned ansd Dec 28, 2023
@michaelklishin
Member

The conflict is with #10203, #10205

@michaelklishin
Member

The Selenium failure in v3.12.x is known and not related. It was addressed in main but v3.12.x needs some more work (likely because some Selenium suite changes were not previously backported).

@michaelklishin michaelklishin added this to the 3.12.12 milestone Dec 28, 2023
@michaelklishin michaelklishin merged commit 91b2964 into v3.12.x Dec 28, 2023
15 of 16 checks passed
@michaelklishin michaelklishin deleted the mergify/bp/v3.12.x/pr-10244 branch December 28, 2023 13:42