
Cherry pick hot standby commits from gpdb #1152

Open

fanfuxiaoran wants to merge 11 commits into main from cherry_pick_hot_standby

Conversation

fanfuxiaoran
Contributor

@fanfuxiaoran fanfuxiaoran commented Jun 10, 2025

Fixes #ISSUE_Number

What does this PR do?

This PR cherry-picks a set of commits from gpdb related to the hot standby feature.
The main changes are:

  • Enable read-only queries on the hot standby
  • Build the correct distributed snapshot on the hot standby
  • Support read-committed/repeatable-read dtx isolation on the hot standby

Also cherry-pick the restore point commit:
"Refactor restore point pausing logic for continuous archive recovery".

Because those commits are not contiguous, there are some conflicts with cbdb.

  • As some Perl refactoring commits are not cherry-picked, fix the TAP tests
    in the current style.
  • Fix the isolation tests in the current style.
  • Fix system_views_gp.in.
    Some pg_ views have been modified by cbdb: the gp_segment_id
    column has been added to them, so they fail to be transformed
    from pg_ views to gp_ views. So just remove them from
    system_views_gp.in for now; it may be better to fix them later.
  • Put XLOG_LATESTCOMPLETED_GXID into the STANDBY rmgr
    in the commit "Add XLOG_LATESTCOMPLETED_GXID".
    Change the rmgr from XLOG to STANDBY as there is no room in the 4
    high bits of xl_info, and it makes sense to put it into the STANDBY rmgr
    since it is used to build the hot standby snapshot.

Type of Change

  • Bug fix (non-breaking change)
  • New feature (non-breaking change)
  • Breaking change (fix or feature with breaking changes)
  • Documentation update

Breaking Changes

Test Plan

  • Unit tests added/updated
  • Integration tests added/updated
  • Passed make installcheck
  • Passed make -C src/test installcheck-cbdb-parallel

Impact

Performance:

User-facing changes:

Dependencies:

Checklist

Additional Context

CI Skip Instructions


@fanfuxiaoran fanfuxiaoran force-pushed the cherry_pick_hot_standby branch from f1d72f8 to 1ee2fcc Compare June 10, 2025 10:09
@fanfuxiaoran fanfuxiaoran changed the title [WIP]:Cherry pick hot standby commits from gpdb Cherry pick hot standby commits from gpdb Jun 11, 2025
@yjhjstz
Member

yjhjstz commented Jun 11, 2025

@fanfuxiaoran
Contributor Author

Revert "Include distributed xid in transaction commit WAL in all cases"

This PR affects TPC-C.

Hmm... Could you explain more about it? @yjhjstz Hot standby needs this commit to build the distributed snapshot.

@fanfuxiaoran fanfuxiaoran requested a review from yjhjstz June 11, 2025 03:54
@yjhjstz
Member

yjhjstz commented Jun 11, 2025

The increased WAL size causes TPC-C performance degradation.

@fanfuxiaoran
Contributor Author

The increased WAL size causes TPC-C performance degradation.

Yes, it will indeed increase the WAL size. Does it impact performance because of the extra I/O from flushing the WAL, or because of contention when writing the WAL?

@my-ship-it
Contributor

my-ship-it commented Jun 11, 2025

The increased WAL size causes TPC-C performance degradation.

Yes, it will indeed increase the WAL size. Does it impact performance because of the extra I/O from flushing the WAL, or because of contention when writing the WAL?

Based on previous tests, adding dxid information has a relatively large impact on TPC-C because of the WAL size. However, to support hot standby we need to add this information, so the two requirements are in tension.

@my-ship-it
Contributor

Based on previous tests, adding dxid information has a relatively large impact on TPC-C because of the WAL size. However, to support hot standby we need to add this information, so the two requirements are in tension.

Maybe it's possible to add a GUC to control the behavior, but that would increase the complexity for users.

@fanfuxiaoran fanfuxiaoran requested a review from weinan003 June 16, 2025 05:59
@yjhjstz
Member

yjhjstz commented Jun 18, 2025

After merging #1078, there are conflicts in src/backend/access/transam/xlog.c.

@fanfuxiaoran
Contributor Author

After merging #1078, there are conflicts in src/backend/access/transam/xlog.c.

Thanks for pointing it out. I will take a look.

bool isDtxPrepared = isPreparedDtxTransaction();
DistributedTransactionId distrib_xid = getDistributedTransactionId();
Member

Can we use EnableHotStandby to control distrib_xid usage?

Contributor Author

Actually no. EnableHotStandby will only be true on the hot standby cluster; on the primary cluster
it's always false. But the distrib_xid is written to the xlog by the primary cluster.

Discussed with Max. We prefer to use a variable, maybe named 'enable_distri_xid', similar to the
variable 'NEXTGXID': it would be stored both in a configuration file and in the xlog, and then
synced to the standby cluster.

The standby cluster will check the variable 'enable_distri_xid'; if it's true, then the cluster
cannot be configured as hot_standby.

What do you think about this idea?

Member

Maybe we can optimize the WAL to counteract this impact.

Member

We should resolve the conflicts and then add an installcheck-hot-standby pipeline job.

Contributor Author

Yes. I'm thinking about adding a job to run installcheck-hot-standby in the pipeline. @edespino Could you help add it?

Member

Add an entry like the one below to .github/workflows/build-cloudberry.yml:

{"test": "ic-isolation2",
 "make_configs": ["src/test/isolation2:installcheck-isolation2"]
},

huansong and others added 11 commits June 20, 2025 10:06
This is the initial commit to support hot standby dispatch in GPDB. In this
commit, hot standby dispatch is enabled when the hot_standby GUC is set to ON,
and the standby coordinator can be connected to and queries can be run on it.
Basic query dispatching and error handling cases are covered; see the
isolation2/hot_standby tests for those cases.

Current limitations that will be addressed in coming works:
* No read-committed isolation for global transactions, so e.g. a SELECT on the
  standby QD could see partial INSERT results from the primary QD.
* No repeatable-read isolation, so e.g. a UDF that runs multiple SELECTs on the
  standby QD could see different results from the SELECTs even if they are the same.
* No transaction block BEGIN ... END, and as a result, no cursor support or
  other things that depend on BEGIN...END.
* Query conflict between primary and standby has not been tested yet. This will
  be done with/after the isolation work.

Co-authored-by: Soumyadeep Chakraborty <soumyadeep2007@gmail.com>
Co-authored-by: Jimmy Yih <jyih@vmware.com>
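As an illustration of what this commit enables, here is a minimal, hedged sketch of a session against the standby coordinator (host, port, and table name are made up for the example):

-- Connect to the standby coordinator, e.g.:
--   psql -h standby-host -p 5432 postgres
SELECT pg_is_in_recovery();        -- returns true on a hot standby
SELECT count(*) FROM t_example;    -- read-only query dispatched to the segments
INSERT INTO t_example VALUES (1);  -- fails: the standby is read-only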
We currently have the GPDB-specific gp_pause_on_restore_point_replay
hidden developer GUC which allows us to pause when replaying a restore
point record. The logic was a bit flawed and needed some refactoring
to accommodate the current hot standby work.

These are the changes that were made:
* The gp_pause_on_restore_point_replay GUC has been changed from a
  boolean type to a string type. This allows us to set exactly which
  restore point to pause on (assuming the restore point names provided
  are unique). The user/application can update the GUC, do a reload, and
  resume WAL replay to advance towards the next restore point to pause
  on (a usage sketch follows this commit message).
* The pausing logic has been moved out of the xlog_redo() function and
  into its own separate function. If WAL replay has reached the
  restore point designated in the gp_pause_on_restore_point_replay
  GUC, it will now pause near the end of the main redo apply
  loop. When resumed (via a `SELECT pg_wal_replay_resume()` call), we
  check if a promotion has been requested. If there is a promotion
  request, the continuous recovery target has been reached, and we then
  stop recovery and go through promotion by piggybacking on the existing
  recovery target logic.

Co-authored-by: Jimmy Yih <jyih@vmware.com>
Co-authored-by: Soumyadeep Chakraborty <soumyadeep2007@gmail.com>
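A short, hedged usage sketch of the workflow described in the commit message above (the restore point name is only an example):

-- On the primary, create a named restore point:
SELECT pg_create_restore_point('before_data_load');

-- On the standby, set gp_pause_on_restore_point_replay = 'before_data_load'
-- (e.g. in postgresql.conf) and reload; replay pauses when that restore point
-- record is reached:
SELECT pg_reload_conf();

-- Resume replay towards the next restore point to pause on, or go through
-- promotion if a promotion request is pending:
SELECT pg_wal_replay_resume();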
To support hot standby, we need to reconstruct the state of running dtx at the
time of checkpoint on the standby. It is key to the correctness of the distributed
snapshots the standby will use. One key piece of information is
latestCompletedGxid - it provides the xmax of the snapshot.

But unlike the primary, which just sets latestCompletedGxid = nextGxid - 1, the standby
cannot use nextGxid. This is because nextGxid was bumped in the checkpoint and
cannot represent the xmax of the running dtx (see CreateCheckPoint). This is OK for
the primary since it does not need to reconstruct the running dtx.

So now we introduce a new XLOG record type, XLOG_LATESTCOMPLETED_GXID, which directly
writes the latestCompletedGxid at checkpoint time. It is only written on the
QD, and only when hot standby is active.

P.S. The alternative is to bump nextGxid at startup instead of at checkpoint, so
its value can be used by the standby to initialize latestCompletedGxid. But
for the primary, it would be impossible to know the correct number of gxids to
bump, since gp_gxid_prefetch_num can change before a restart.

CBDB: Change the rmgr from XLOG to STANDBY, as there is no room in the 4
high bits of xl_info. It also makes sense to put it into the STANDBY rmgr
since it is used to build the hot standby snapshot.
The previous few commits have removed some roadblocks to supporting it. This
commit mainly deals with two more aspects related to distributed transactions:

* Initialize latestCompletedGxid during StartupXLOG, and update it while the
  standby replays new transactions.
* Construct an in-progress dtx array when creating distributed snapshot
  according to the shmCommittedGxidArray[] we already keep in the standby.

It was pondered whether or not to add a new WAL type XLOG_RUNNING_DISTRIBUTED_XACTS
similar to XLOG_RUNNING_XACTS. But it seems unnecessary at the moment: we already
have the running dtx information in the checkpoint record. The other information
in the XLOG_RUNNING_XACTS record does not seem to be needed to support
read-committed isolation. There are a few other callers of ProcArrayApplyRecoveryInfo()
that rely on XLOG_RUNNING_XACTS, but we don't seem to have a need to
emulate them for dtx.
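A hedged illustration of the read-committed behavior this provides (t_example is a hypothetical table): each statement on the standby QD now uses a distributed snapshot, so it sees a concurrent distributed transaction from the primary either entirely or not at all.

-- On the primary QD:
INSERT INTO t_example SELECT generate_series(1, 1000);

-- On the standby QD, a single SELECT sees either all 1000 new rows or none of
-- them, never a partial, per-segment subset:
SELECT count(*) FROM t_example;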
In previous commits we've supported hot standby dispatch and read-committed
isolation. In order to support repeatable-read isolation, the only real
complication is supporting the BEGIN...END block. The snapshot selection
and usage for repeatable-read on a hot standby is exactly the same as on a primary.

And, the main difference between a single-statement transaction and a
BEGIN...END block is just the DTX context of the QEs: in the former case the
QEs are DTX_CONTEXT_QE_AUTO_COMMIT_IMPLICIT, but in the latter case they
are DTX_CONTEXT_QE_TWO_PHASE_EXPLICIT_WRITER (see setupQEDtxContext()).

We had Asserts/ERRORs in the code assuming that for EXPLICIT_WRITER there is
always a valid distributed xid for the transaction. However, that is not the
case for hot standby: a standby never allocates an xid and there is definitely
no use for an xid in its BEGIN...END block. Therefore, all we need to do is
make sure not to apply this assumption to hot standby. After that, supporting
repeatable-read is a no-op.

Another small change is to rename IS_STANDBY_QE to IS_HOT_STANDBY_QE to better
correspond to IS_HOT_STANDBY_QD.
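A minimal, hedged sketch of the repeatable-read BEGIN...END block this commit allows on the standby coordinator (t_example is a hypothetical table):

BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT count(*) FROM t_example;  -- takes the snapshot used for the rest of the block
SELECT count(*) FROM t_example;  -- returns the same result as the first SELECT
END;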
Fixed result differences and non-runnable tests. The notable ones are:
1. Backward cursor fetch is not supported in GPDB. Move to the "_disallowed" test.
2. Some ERROR messages are replaced by "not supported" ones in GPDB which
   should be fine.
3. "cannot execute SELECT FOR SHARE in a read-only transaction" is replaced by
   "cannot acquire lock mode ExclusiveLock ... during recovery". The reason is
   that the QD needs to acquire the lock if GDD isn't enabled. If later we
   found it needed we may try to change the error message for standby just to
   be a little more informative.
4. The "test_setup" was added to the standby schedule by mistake. Removing it.

With that, we can add this test schedule to the hot standby pipeline job.
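To illustrate item 3 above, a hedged sketch of the error-message difference on the standby QD (t_example is a hypothetical table; the elided part of the message depends on the locked object):

SELECT * FROM t_example FOR SHARE;
-- upstream hot standby:           ERROR:  cannot execute SELECT FOR SHARE in a read-only transaction
-- GPDB standby QD (GDD disabled): ERROR:  cannot acquire lock mode ExclusiveLock ... during recovery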
For the most part, query conflict on standby works without any changes in GPDB.
Add tests for the expected behavior.

One notable issue is that we are not considering the distributed snapshot in
snapshot conflict detection at this point. We added a test for that behavior too.

Add these tests:
1. All the query conflict types mentioned in
   https://www.postgresql.org/docs/12/hot-standby.html#HOT-STANDBY-CONFLICT.
   There is actually one not mentioned there, which is deadlock conflict; a test
   for that still needs to be produced.
2. GUCs hot_standby_feedback and vacuum_defer_cleanup_age.
3. System view gp_stat_database_conflicts, which is a cluster-wide view of
   pg_stat_database_conflicts. Note that in the test we need to take the max of
   the conflict counts among all segments to avoid flakiness (see the sketch
   after this list). Ideally we should just have something like
   gp_stat_database_conflicts_summary to print the max counts, but we are not
   allowed to change the catalog now. So leaving that as a FIXME item.
4. A test case showing distributed snapshot isn't taken into account when
   detecting snapshot conflict. This is a limitation that we'll address with
   a restore-point based dtx snapshot creation approach later.
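A hedged sketch of the kind of check item 3 describes, taking the max of a conflict counter across segments to avoid flakiness (confl_snapshot is one of the counters inherited from pg_stat_database_conflicts):

SELECT max(confl_snapshot) AS max_snapshot_conflicts
FROM gp_stat_database_conflicts
WHERE datname = current_database();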
For a selected list of PG system views (those with the 'pg_' prefix), we
will create a corresponding 'gp_' view for each one in the list.
Each 'gp_' view is basically a UNION ALL of the results of running the
corresponding 'pg_' view on all segments (including the coordinator).

Note that these views do not aggregate the results. The aggregated
version of the views will be named with a '_summary' suffix (such
as 'gp_stat_all_tables_summary').

To add a new 'pg_' view to this list, simply put the name in file
'src/backend/catalog/system_views_gp.in'. This commit adds an initial
list of views that we think make sense to have 'gp_' views.

With this change, we also remove the existing definition of the
gp_stat_archiver view and let it be generated automatically.
We also have gp_stat_replication, but it carries additional columns compared to
pg_stat_replication, so it cannot use the automatic approach.
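For illustration, a hedged sketch of the general shape of a generated 'gp_' view; this is not the exact generated definition, and it assumes the existing GPDB helpers gp_execution_segment() and gp_dist_random():

CREATE VIEW gp_stat_archiver AS
    SELECT -1 AS gp_segment_id, * FROM pg_stat_archiver              -- coordinator
    UNION ALL
    SELECT gp_execution_segment() AS gp_segment_id, *
    FROM gp_dist_random('pg_stat_archiver');                          -- all segments

This may also help explain the cbdb conflict described in the next commit message: a pg_ view that already carries a gp_segment_id column cannot be transformed this way without producing a duplicate column.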
Some pg_ views have been modified by cbdb: the gp_segment_id
column has been added to them, so they fail to be transformed
from pg_ views to gp_ views (see commit
5028222620d410fe3d4c60f732a599e269006968).
So just remove them from system_views_gp.in for now; it may be better to fix
them later.
Member

@yjhjstz yjhjstz left a comment

LGTM
