MDEV-20065 parallel replication for galera slave #459

Open
wants to merge 1 commit into
base: 10.11

Conversation


@temeo temeo commented Dec 2, 2024

When replicating transactions from parallel slave replication
processing, Galera must respect the commit order of the parallel
slave replication. In the current implementation this is done by
calling wait_for_prior_commit() before the write set is
replicated and certified in before-prepare processing. This,
however, establishes a critical section which is held over the
whole Galera replication step, so the commit rate is
limited by Galera replication latency.
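
As a rough illustration of that limit: if every transaction holds the critical section for one full replication round trip, at most one transaction can pass per round trip. The helper name and the 2 ms figure below are assumptions chosen for the example, not measurements or code from this PR.

```cpp
#include <cassert>

// Upper bound on commits per second when each transaction holds the
// commit-order critical section for hold_time_us microseconds.
// Hypothetical helper for illustration; not part of the server code.
constexpr long max_commits_per_second(long hold_time_us)
{
  return 1000000L / hold_time_us;
}

// Assumed example: a 2 ms replication round trip caps the rate at 500/s,
// no matter how many parallel slave workers are configured.
static_assert(max_commits_per_second(2000) == 500, "2 ms hold time");
```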

In order to allow concurrency in the Galera replication step, the
critical section must be released at the earliest point where Galera
can guarantee sequential consistency for replicated write sets.
This change passes a callback to the Galera library which releases the
critical section by calling wakeup_subsequent_commits(). The library
calls the callback once the correct replication order can be established.
This functionality will be available from Galera 26.4.22 onwards.

Note that the call to wakeup_subsequent_commits() at this stage is
safe from the group commit point of view, as Galera uses a separate
wait_for_commit context to control commit ordering.
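
The effect of the callback on event ordering can be sketched with a small single-threaded model. Everything here is a stand-in: `SeqCallback` is a hypothetical substitute for `wsrep::provider::seq_cb_t` (whose real layout is not shown in this conversation), and the exact point at which the provider fires the callback is assumed, as is the release point in the old code path.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records the order of events so the two code paths can be compared.
struct Trace { std::vector<std::string> events; };

// Hypothetical stand-in for wsrep::provider::seq_cb_t: an opaque context
// (the THD in the PR) plus a function the provider invokes.
struct SeqCallback {
  void *ctx;
  void (*fn)(void *ctx);
};

// Models wakeup_subsequent_commits(): lets the next worker proceed.
inline void release_commit_order(void *ctx)
{
  static_cast<Trace *>(ctx)->events.push_back("wakeup_subsequent_commits");
}

// Models the provider side. Assumption: ordering is fixed before the slow
// replicate/certify round trip, and the callback fires at that point.
inline int replicate_sketch(Trace &trace, const SeqCallback *cb)
{
  trace.events.push_back("order_established");
  if (cb != nullptr)
    cb->fn(cb->ctx);
  trace.events.push_back("replicate_and_certify");
  return 0;
}

// Models before-prepare processing with and without the callback.
inline std::vector<std::string> before_prepare_sketch(bool with_callback)
{
  Trace trace;
  trace.events.push_back("wait_for_prior_commit");
  SeqCallback cb{&trace, release_commit_order};
  const int ret= replicate_sketch(trace, with_callback ? &cb : nullptr);
  assert(ret == 0);
  if (!with_callback)
    release_commit_order(&trace);  // old path: released after replication
  return trace.events;
}
```

In this model, `before_prepare_sketch(true)` records the wakeup before `replicate_and_certify`, while `before_prepare_sketch(false)` records it afterwards, which is the difference the description above is about.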

Member

@ayurchen ayurchen left a comment


Looks nice

@ayurchen
Member

ayurchen commented Dec 3, 2024

test_pr

@temeo temeo force-pushed the 10.11-galera-as-parallel-slave branch 2 times, most recently from 7f2c170 to 68e2085 Compare December 28, 2024 10:25
@temeo temeo changed the title WIP: 10.11 galera as parallel slave MDEV-20065 parallel replication for galera slave Dec 28, 2024
-  if ((ret= thd->wsrep_cs().before_prepare()) == 0)
+  wsrep::provider::seq_cb_t seq_cb{
+    thd, wsrep_parallel_slave_wakeup_subsequent_commits};
+  if ((ret= thd->wsrep_cs().before_prepare(&seq_cb)) == 0)


@temeo Here we provide a callback to the Galera library, but how is sequential consistency maintained, given that the callback is called out of order? And when is the actual critical section released: is it then in before_commit?

Author


The callback is effective only for parallel slave workers, see https://github.com/codership/mariadb-server/pull/459/files#diff-adadb13c7a06c1a6d6be490308b4cb130848c6e0b2ccca48dde12efeda10a498R3926. The workers call wait_for_prior_commit() from thd->wsrep_parallel_slave_wait_for_prior_commit() a couple of lines above this, which serializes the workers' calls to before_prepare().
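
A minimal threaded sketch of that serialization, assuming a simple ticket-style ordering: `CommitOrder` and `run_workers` are hypothetical names, and the server's real `wait_for_commit` machinery is more involved than this model. Workers are started in reverse order on purpose, yet they enter the stand-in for before_prepare() in commit order.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical model of commit ordering between parallel slave workers.
class CommitOrder {
public:
  // Blocks until every transaction with a lower sequence number has
  // released its successors.
  void wait_for_prior_commit(int seq)
  {
    std::unique_lock<std::mutex> lock(mutex_);
    cond_.wait(lock, [&] { return next_ == seq; });
  }

  // Lets the transaction with the next sequence number proceed.
  void wakeup_subsequent_commits()
  {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      ++next_;
    }
    cond_.notify_all();
  }

private:
  std::mutex mutex_;
  std::condition_variable cond_;
  int next_= 0;
};

// Starts n workers in reverse order and returns the order in which they
// entered the section that stands in for before_prepare().
inline std::vector<int> run_workers(int n)
{
  assert(n >= 0);
  CommitOrder order;
  std::vector<int> entered;  // only touched by the worker whose turn it is
  std::vector<std::thread> workers;
  for (int seq= n - 1; seq >= 0; --seq)
  {
    workers.emplace_back([&order, &entered, seq] {
      order.wait_for_prior_commit(seq);
      entered.push_back(seq);             // stands in for before_prepare()
      order.wakeup_subsequent_commits();  // in the PR: fired via the callback
    });
  }
  for (std::thread &worker : workers)
    worker.join();
  return entered;
}
```

Because only the worker whose turn it is touches `entered`, and the hand-over to the next worker goes through the mutex, the vector needs no extra locking in this sketch.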


@janlindstrom janlindstrom left a comment


There are test failures on debug build.

@temeo
Author

temeo commented Dec 30, 2024

> There are test failures on debug build.

This looks like a regression in 10.11 head, not something specific to this PR. The set of failing tests from the latest 10.11 head is:

Failing test(s): galera.galera_as_master galera.MDEV-20616 galera.galera_FK_duplicate_client_insert galera.galera_bf_kill_debug galera.galera_sequences galera.galera_restart_replica galera_sr.GCF-1018B galera_sr.mysql-wsrep-features#165 wsrep.MDEV-23081 wsrep.wsrep-recover-v25 wsrep.wsrep-recover

@janlindstrom

Based on https://buildbot.mariadb.net/buildbot/grid?category=main&branch=10.11, galera_sequences failed after the merge, but I could not find the others in this list.

@sjaakola

sjaakola commented Jan 8, 2025

Running mtr locally on the current 10.11 HEAD vs. this branch shows that the PR has three additional test failures:

wsrep.MDEV-23081 wsrep.wsrep-recover-v25 wsrep.wsrep-recover

@temeo temeo force-pushed the 10.11-galera-as-parallel-slave branch from 68e2085 to e5f4806 Compare January 8, 2025 11:13
@temeo temeo force-pushed the 10.11-galera-as-parallel-slave branch from e5f4806 to d2fa3d5 Compare February 8, 2025 13:01