Use consistent snapshots when scanning metadata #5428
Merged: erimatnor merged 1 commit into timescale:main from erimatnor:invalidate-catalog-snapshot-in-scanner on Mar 21, 2023
erimatnor force-pushed the invalidate-catalog-snapshot-in-scanner branch from af95d2e to 413d12f (March 10, 2023 14:36)
Codecov Report
@@            Coverage Diff             @@
##             main    #5428      +/-   ##
==========================================
- Coverage   90.72%   90.68%   -0.05%
==========================================
  Files         228      228
  Lines       53096    53092       -4
==========================================
- Hits        48172    48146      -26
- Misses       4924     4946      +22
... and 5 files with indirect coverage changes
I think this might also be the real reason for #4900, which still occasionally happens in CI.
svenklemm approved these changes on Mar 11, 2023
erimatnor force-pushed the invalidate-catalog-snapshot-in-scanner branch from 413d12f to 6bba6c2 (March 17, 2023 08:37)
jnidzwetzki approved these changes on Mar 17, 2023
fabriziomello approved these changes on Mar 18, 2023
erimatnor force-pushed the invalidate-catalog-snapshot-in-scanner branch 2 times, most recently from bc1ca48 to d63ad8d (March 20, 2023 15:44)
Invalidate the catalog snapshot in the scanner to ensure that any lookups into `pg_catalog` use a snapshot that is consistent with the snapshot used to scan TimescaleDB metadata.

This fixes an issue where a chunk could be looked up without having a proper relid filled in, causing an assertion failure (`ASSERT_IS_VALID_CHUNK`). When a chunk is scanned and found (in `chunk_tuple_found()`), the Oid of the chunk table is filled in using `get_relname_relid()`, which could return `InvalidOid` because a different snapshot was used when scanning `pg_class`. Calling `InvalidateCatalogSnapshot()` before starting the metadata scan in `Scanner` ensures that the `pg_catalog` snapshot is refreshed.

Due to the difficulty of reproducing this MVCC issue, no regression or isolation test is provided, but the bug is easy to hit when doing highly concurrent COPYs into a distributed hypertable.
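The race described above can be illustrated with a small self-contained toy model. This is only a sketch under stated assumptions, not the TimescaleDB or PostgreSQL implementation: the `Row` and `Snapshot` types, the commit counter, and every function name below are hypothetical stand-ins. `lookup_relid()` plays the role of `get_relname_relid()`, `invalidate_catalog_snapshot()` plays the role of `InvalidateCatalogSnapshot()`, and visibility is reduced to "the row was committed at or before the snapshot's horizon".

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_ROWS 8
#define INVALID_OID 0u

/* Toy MVCC row: visible to a snapshot if created at or before its horizon. */
typedef struct
{
	const char *name;
	unsigned int oid;
	unsigned long created_at; /* commit number that inserted the row */
} Row;

typedef struct
{
	unsigned long horizon; /* latest commit this snapshot can see */
	bool valid;
} Snapshot;

static unsigned long commit_counter = 0;
static Row pg_class_rows[MAX_ROWS]; /* stand-in for pg_class */
static Row chunk_rows[MAX_ROWS];    /* stand-in for the chunk metadata table */
static int n_pg_class = 0;
static int n_chunks = 0;
static Snapshot cached_catalog_snapshot; /* zero-initialized, so not valid */

static Snapshot
take_snapshot(void)
{
	Snapshot s = { commit_counter, true };
	return s;
}

/* Hypothetical analogue of InvalidateCatalogSnapshot(). */
static void
invalidate_catalog_snapshot(void)
{
	cached_catalog_snapshot.valid = false;
}

/* Reuses the cached snapshot until it is invalidated. */
static Snapshot
get_catalog_snapshot(void)
{
	if (!cached_catalog_snapshot.valid)
		cached_catalog_snapshot = take_snapshot();
	return cached_catalog_snapshot;
}

static bool
row_visible(const Row *row, Snapshot s)
{
	return row->created_at <= s.horizon;
}

/* Hypothetical analogue of get_relname_relid(): uses the catalog snapshot. */
static unsigned int
lookup_relid(const char *name)
{
	Snapshot s = get_catalog_snapshot();

	for (int i = 0; i < n_pg_class; i++)
		if (row_visible(&pg_class_rows[i], s) && strcmp(pg_class_rows[i].name, name) == 0)
			return pg_class_rows[i].oid;
	return INVALID_OID;
}

/* Another session commits a new chunk: both tables gain a row in one commit. */
static void
concurrent_create_chunk(const char *name, unsigned int oid)
{
	assert(n_pg_class < MAX_ROWS && n_chunks < MAX_ROWS);
	commit_counter++;
	pg_class_rows[n_pg_class++] = (Row){ name, oid, commit_counter };
	chunk_rows[n_chunks++] = (Row){ name, oid, commit_counter };
}

/* Scan chunk metadata with a fresh snapshot, then resolve the chunk's relid. */
static unsigned int
scan_chunk(const char *name, bool invalidate_first)
{
	if (invalidate_first)
		invalidate_catalog_snapshot();

	Snapshot scan = take_snapshot();

	for (int i = 0; i < n_chunks; i++)
		if (row_visible(&chunk_rows[i], scan) && strcmp(chunk_rows[i].name, name) == 0)
			return lookup_relid(name);
	return INVALID_OID;
}
```

In this model, if `get_catalog_snapshot()` is called before `concurrent_create_chunk()` runs, a later `scan_chunk(name, false)` finds the chunk metadata row but resolves its relid through the stale cached snapshot and returns `INVALID_OID`. Calling `scan_chunk(name, true)` drops the cached snapshot first, so both lookups see the same committed state and the relid is found.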
erimatnor force-pushed the invalidate-catalog-snapshot-in-scanner branch from d63ad8d to 57dc6bd (March 21, 2023 07:53)
akuzm added a commit to akuzm/timescaledb that referenced this pull request on Apr 19, 2023

## 2.10.2 (2023-04-20)

**Bugfixes**
* timescale#5410 Fix file trailer handling in the COPY fetcher
* timescale#5446 Add checks for malloc failure in libpq calls
* timescale#5233 Out of on_proc_exit slots on guc license change
* timescale#5428 Use consistent snapshots when scanning metadata
* timescale#5499 Do not segfault on large histogram() parameters
* timescale#5470 Ensure superuser perms during copy/move chunk
* timescale#5500 Fix when no FROM clause in continuous aggregate definition
* timescale#5433 Fix join rte in CAggs with joins
* timescale#5556 Fix duplicated entries on timescaledb_experimental.policies view
* timescale#5462 Fix segfault after column drop on compressed table
* timescale#5543 Copy scheduled_jobs list before sorting it
* timescale#5497 Allow named time_bucket arguments in Cagg definition
* timescale#5544 Fix refresh from beginning of Continuous Aggregate with variable time bucket
* timescale#5558 Use regrole for job owner
* timescale#5542 Enable indexscan on uncompressed part of partially compressed chunks

**Thanks**
* @nikolaps for reporting an issue with the COPY fetcher
* @S-imo-n for reporting the issue on Background Worker Scheduler crash
* @geezhu for reporting issue on segfault in histogram()
* @mwahlhuetter for reporting the issue with joins in CAggs
* @mwahlhuetter for reporting issue with duplicated entries on timescaledb_experimental.policies view
* @H25E for reporting error refreshing from beginning of a Continuous Aggregate with variable time bucket