Cherry pick hot standby commits from gpdb #1152
Conversation
Force-pushed from f1d72f8 to 1ee2fcc.
Revert "Include distributed xid in transaction commit WAL in all cases" this pr affact tpcc . |
Hmm... could you explain more about it, @yjhjstz? The hot standby needs this commit to build the distributed snapshot.
Due to the increased WAL size, TPC-C performance degrades.
Yes, it will indeed increase the size of the WAL. Does it impact performance by increasing the I/O of flushing the WAL, or through contention when writing the WAL?
Based on previous tests, adding dxid information has a relatively large impact on TPC-C because of the increased WAL size. However, to support hot standby we need to add this information, so there is a tradeoff.
Maybe it's possible to add a GUC to control the behavior, but that increases the complexity of understanding the system.
After merging #1078, there are conflicts at
Thanks for pointing it out. I will take a look. |
bool isDtxPrepared = isPreparedDtxTransaction();
DistributedTransactionId distrib_xid = getDistributedTransactionId();
Can we use EnableHotStandby to control distrib_xid usage?
Actually, no. EnableHotStandby will only be true on the hot-standby cluster; on the primary cluster it is always false, but distrib_xid is written to the xlog by the primary cluster.
Discussed with Max. We prefer to use a variable, maybe named 'enable_distri_xid', similar to 'NEXTGXID': it would be stored both in a configuration file and in the xlog, and then synced to the standby cluster.
The standby cluster would check the variable 'enable_distri_xid'; if it is true, the cluster cannot be configured as hot_standby.
What do you think about this idea?
Maybe we can optimize the WAL to counteract this impact.
We should resolve the conflicts and then add an installcheck-hot-standby pipeline.
Yes, I'm thinking about adding a job to run installcheck-hot-standby in the pipeline. @edespino Could you help add it?
Add an entry like the one below in .github/workflows/build-cloudberry.yml:
{"test":"ic-isolation2",
"make_configs":["src/test/isolation2:installcheck-isolation2"]
},
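A corresponding hot-standby entry might look like the one below; the test name and make target are assumptions based on the installcheck-hot-standby target mentioned above:
{"test":"ic-hot-standby",
"make_configs":["src/test/isolation2:installcheck-hot-standby"]
},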
This is the initial commit to support hot standby dispatch in GPDB. In this commit, hot standby dispatch is enabled when the hot_standby GUC is set to ON, and the standby coordinator can accept connections and run queries. Basic query dispatching and error handling cases are covered; see the isolation2/hot_standby tests for those cases.

Current limitations that will be addressed in coming work:
* No read-committed isolation from the global transaction, so e.g. a SELECT on the standby QD could see partial INSERT results from the primary QD.
* No repeatable-read isolation, so e.g. a UDF that runs multiple SELECTs on the standby QD could see different results from the SELECTs even though they are identical.
* No transaction block BEGIN ... END, and as a result no cursor support or other things that depend on BEGIN ... END.
* Query conflict between primary and standby has not been tested yet. This will be done with/after the isolation work.

Co-authored-by: Soumyadeep Chakraborty <soumyadeep2007@gmail.com>
Co-authored-by: Jimmy Yih <jyih@vmware.com>
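As an illustration of the dispatch behavior and the BEGIN...END limitation above, a session against the standby coordinator might look like this (a sketch; the table name t_orders is hypothetical):

-- connected to the standby coordinator, hot_standby = on
SELECT count(*) FROM t_orders;  -- basic dispatch to segments works
BEGIN;                          -- transaction blocks are not supported yet, so
                                -- cursors and anything else depending on
                                -- BEGIN ... END are unavailable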
…all cases"" This reverts commit e1c99e4.
We currently have the GPDB-specific gp_pause_on_restore_point_replay hidden developer GUC, which allows us to pause when replaying a restore point record. The logic was a bit flawed and needed some refactoring to accommodate the current hot standby work. These are the changes that were made:
* The gp_pause_on_restore_point_replay GUC has been changed from a boolean type to a string type. This allows us to set exactly which restore point to pause on (assuming the restore points provided are unique). The user/application can update the GUC, do a reload, and resume WAL replay to advance towards the next restore point to pause on.
* The pausing logic has been moved out of the xlog_redo() function and into its own separate function. If WAL replay has reached the restore point designated in the gp_pause_on_restore_point_replay GUC, it will now pause near the end of the main redo apply loop. When resumed (via a `SELECT pg_wal_replay_resume()` call), we check whether a promotion has been requested. If there is a promotion request, the continuous recovery target has been reached, and we stop recovery and go through promotion by piggybacking on the existing recovery target logic.

Co-authored-by: Jimmy Yih <jyih@vmware.com>
Co-authored-by: Soumyadeep Chakraborty <soumyadeep2007@gmail.com>
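As an illustration of the workflow described above (the GUC and function names come from the commit message; the restore point name and the exact operational steps are assumptions):

-- (1) On the standby, set the restore point to pause on in postgresql.conf:
--       gp_pause_on_restore_point_replay = 'rp_batch_1'
-- (2) Reload so replay pauses when that restore point record is applied:
SELECT pg_reload_conf();
-- (3) Once done with this pause, update the GUC to the next restore point,
--     reload again, and resume replay; if a promotion was requested before
--     resuming, recovery stops here and promotion proceeds:
SELECT pg_wal_replay_resume();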
To support hot standby, we need to reconstruct the state of running dtx at checkpoint time on the standby. This is key to the correctness of the distributed snapshot the standby will use. One key piece of information is latestCompletedGxid - it provides the xmax of the snapshot. But unlike the primary, which just sets latestCompletedGxid = nextGxid - 1, the standby cannot use nextGxid. This is because nextGxid was bumped in the checkpoint and cannot represent the xmax of running dtx (see CreateCheckPoint). That is OK for the primary, since it does not need to reconstruct the running dtx.

So now we introduce a new XLOG type XLOG_LATESTCOMPLETED_GXID, which directly writes the latestCompletedGxid at checkpoint time. It is only written on the QD and when hot standby is active.

P.S. The alternative is to bump nextGxid at startup instead of at checkpoint, so its value can be used by the standby to initialize latestCompletedGxid. But for the primary it would be impossible to know the correct number of gxids to bump, since gp_gxid_prefetch_num can change before restart.

CBDB: Change the rmgr from XLOG to STANDBY, as there is no room in the 4 high bits of xl_info. And it makes sense to put it into the STANDBY rmgr since it is used to build the hot_standby snapshot.
The previous few commits have removed some roadblocks for supporting it. This commit mainly deals with two more aspects related to distributed transactions:
* Initialize latestCompletedGxid during StartupXLOG, and update it while the standby replays new transactions.
* Construct an in-progress dtx array when creating a distributed snapshot, according to the shmCommittedGxidArray[] we already keep on the standby.

It was pondered whether or not to add a new WAL type XLOG_RUNNING_DISTRIBUTED_XACTS similar to XLOG_RUNNING_XACTS. But it seems unnecessary at the moment: we already have the running dtx information in the checkpoint record. The other information in the XLOG_RUNNING_XACTS record does not seem to be needed to support read-committed isolation. There are a few other callers of ProcArrayApplyRecoveryInfo() that rely on XLOG_RUNNING_XACTS, but it doesn't seem we have a need to emulate them for dtx.
In previous commits we've supported hot standby dispatch and read-committed isolation. In order to support repeatable-read isolation, the only real complication is to support the BEGIN...END block. Snapshot selection and usage for repeatable-read on a hot standby are exactly the same as on a primary. And the main difference between a single-statement transaction and a BEGIN...END block is just the DTX context of the QEs: in the former case the QEs are DTX_CONTEXT_QE_AUTO_COMMIT_IMPLICIT, but in the latter case they are DTX_CONTEXT_QE_TWO_PHASE_EXPLICIT_WRITER (see setupQEDtxContext()).

We had Assert/ERROR code assuming that for EXPLICIT_WRITER there is always a valid distributed xid for that transaction. However, that is not the case for hot standby: a standby never allocates an xid, and there is definitely no use of an xid in its BEGIN...END block. Therefore, all we need to do is make sure not to apply this assumption to hot standby. After that, supporting repeatable-read is a no-op.

Another small change is to rename IS_STANDBY_QE to IS_HOT_STANDBY_QE to better correspond to IS_HOT_STANDBY_QD.
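To illustrate what this enables, a repeatable-read block on the standby QD now behaves as it would on a primary (a sketch; the table name t_orders is hypothetical):

BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT count(*) FROM t_orders;  -- snapshot taken at the first query
SELECT count(*) FROM t_orders;  -- same result, even if the primary commits
                                -- new rows in between
COMMIT;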
Fixed result differences and non-runnable tests. The notable ones are:
1. Backward cursor fetch is not supported in GPDB. Moved to the "_disallowed" test.
2. Some ERROR messages are replaced by "not supported" ones in GPDB, which should be fine.
3. "cannot execute SELECT FOR SHARE in a read-only transaction" is replaced by "cannot acquire lock mode ExclusiveLock ... during recovery". The reason is that the QD needs to acquire the lock if GDD isn't enabled. If we later find it necessary, we may try to change the error message for the standby to be a little more informative.
4. The "test_setup" was added to the standby schedule by mistake. Removing it.

With that, we can add this test schedule to the hot standby pipeline job.
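For item 3 above, the standby-side behavior looks roughly like this (a sketch; the table name t_orders is hypothetical, and the error text is abbreviated as in the message above):

-- on the standby QD, with GDD disabled
SELECT * FROM t_orders FOR SHARE;
-- ERROR:  cannot acquire lock mode ExclusiveLock ... during recovery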
For the most part, query conflict on standby works without any changes in GPDB. Add tests for the expected behavior. One notable issue is that we are not considering the distributed snapshot in snapshot conflict detection at this point. We added a test for that behavior too.

Add these tests:
1. All the query conflict types mentioned in https://www.postgresql.org/docs/12/hot-standby.html#HOT-STANDBY-CONFLICT. There is actually one that's not mentioned there, which is deadlock conflict. Still yet to produce a test for that.
2. GUCs hot_standby_feedback and vacuum_defer_cleanup_age.
3. System view gp_stat_database_conflicts, which is a cluster-wide view of pg_stat_database_conflicts. Note that in the test we need to take the max of the conflict count among all segments to avoid flakiness. Ideally we should have something like gp_stat_database_conflicts_summary to print the max counts, but we are not allowed to change the catalog now, so leaving that as a FIXME item.
4. A test case showing the distributed snapshot isn't taken into account when detecting snapshot conflicts. This is a limitation that we'll address with a restore-point based dtx snapshot creation approach later.
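The max-across-segments workaround from item 3 can be expressed as a query like the following (a sketch; the column names follow pg_stat_database_conflicts):

SELECT datname, max(confl_snapshot) AS confl_snapshot
FROM gp_stat_database_conflicts
GROUP BY datname;  -- take the max over segments to avoid per-segment flakiness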
For a selected list of PG system views (with the 'pg_' prefix), we will create a corresponding 'gp_' view for each one in the list. Each 'gp_' view is basically a UNION ALL of the results of running the corresponding 'pg_' view on all segments (including the coordinator). Note that these views do not aggregate the results. The aggregating versions of the views will be named with a '_summary' suffix (such as 'gp_stat_all_tables_summary').

To add a new 'pg_' view to this list, simply put the name in the file 'src/backend/catalog/system_views_gp.in'. This commit adds an initial list of views that we think make sense to have 'gp_' views.

With this change, we also remove the existing definition of the gp_stat_archiver view and let it be generated automatically. We also had gp_stat_replication, but it carries an additional column compared to pg_stat_replication, so it cannot use the automatic mechanism.
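Example usage of the generated views (a sketch: the table name my_table is hypothetical, and the '_summary' view is assumed to exist per the naming convention above):

SELECT * FROM gp_stat_all_tables WHERE relname = 'my_table';           -- one row per segment
SELECT * FROM gp_stat_all_tables_summary WHERE relname = 'my_table';   -- aggregated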
Some pg_ views have been modified by cbdb: the gp_segment_id column has been added to them, so they fail to be transformed from pg_ views to gp_ views (see commit 5028222620d410fe3d4c60f732a599e269006968). So just remove them from system_views_gp.in. It may be better to fix them later.
Force-pushed from 1ee2fcc to 9fc0c93.
LGTM
Fixes #ISSUE_Number
What does this PR do?
This PR cherry-picks a number of commits from gpdb related to the hot standby feature.
The main changes are:
* Also cherry-picked the commit about restore points: "Refactor restore point pausing logic for continuous archive recovery".
* Because those commits are not contiguous, there are some conflicts with cbdb; they were resolved following the current style.
* Some pg_ views have been modified by cbdb: the gp_segment_id column has been added to them, so they fail to be transformed from pg_ views to gp_ views. So just remove them from system_views_gp.in. It may be better to fix them later.
* In commit "Add XLOG_LATESTCOMPLETED_GXID": change the rmgr from XLOG to STANDBY, as there is no room in the 4 high bits of xl_info. And it makes sense to put it into the STANDBY rmgr since it is used to build the hot_standby snapshot.
Type of Change
Breaking Changes
Test Plan
make installcheck
make -C src/test installcheck-cbdb-parallel
Impact
Performance:
User-facing changes:
Dependencies:
Checklist
Additional Context
CI Skip Instructions