Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refresh correct partial during refresh on drop #2613

Merged
merged 1 commit into from
Nov 6, 2020

Conversation

k-rus
Copy link
Contributor

@k-rus k-rus commented Nov 3, 2020

When drop_chunks is called on a hypertable manually or through
a retention policy, it initiates refresh in hypertable's continuous
aggregates. The refresh includes all buckets of each dropped chunk.
Some buckets are inside a dropped chunk only partially. In such case,
this commit fixes to refresh only the correct partials of the buckets.

The fix passes chunk id to refresh on drop of a chunk. The chunk id is
used to identify and update correct partials. This is a workaround
that the invalidation outside of the refreshed chunk doesn't update
the other partial, which is outside the refreshed chunk.

The commit also adds the test demonstrating correct refresh.

Fixes #2592

@k-rus k-rus requested a review from a team as a code owner November 3, 2020 08:38
@k-rus k-rus requested review from mkindahl, svenklemm, gayyappan, erimatnor, pmwkaa, WireBaron and cevian and removed request for a team November 3, 2020 08:38
@k-rus k-rus added this to the 2.0.0 milestone Nov 3, 2020
tsl/test/expected/continuous_aggs_drop_chunks.out Outdated Show resolved Hide resolved
tsl/test/expected/continuous_aggs_drop_chunks.out Outdated Show resolved Hide resolved
tsl/test/expected/continuous_aggs_drop_chunks.out Outdated Show resolved Hide resolved
tsl/test/expected/continuous_aggs_drop_chunks.out Outdated Show resolved Hide resolved
tsl/test/expected/continuous_aggs_drop_chunks.out Outdated Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Nov 3, 2020

Codecov Report

Merging #2613 into master will increase coverage by 0.12%.
The diff coverage is 82.14%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2613      +/-   ##
==========================================
+ Coverage   90.02%   90.14%   +0.12%     
==========================================
  Files         212      212              
  Lines       34291    34259      -32     
==========================================
+ Hits        30869    30883      +14     
+ Misses       3422     3376      -46     
Impacted Files Coverage Δ
src/cross_module_fn.c 68.29% <0.00%> (-1.71%) ⬇️
tsl/src/data_node_dispatch.c 97.02% <ø> (-0.25%) ⬇️
tsl/src/nodes/decompress_chunk/planner.c 97.03% <ø> (ø)
tsl/src/data_node.c 94.90% <90.90%> (-0.11%) ⬇️
src/chunk.c 94.65% <100.00%> (+0.08%) ⬆️
tsl/src/continuous_aggs/materialize.c 67.61% <100.00%> (+1.61%) ⬆️
tsl/src/continuous_aggs/refresh.c 99.38% <100.00%> (ø)
tsl/src/dist_util.c 93.28% <100.00%> (+0.04%) ⬆️
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f8f53aa...a2aebad. Read the comment docs.

SELECT time_bucket(7, time_int) as bucket,
SUM(value), COUNT(value)
FROM conditions GROUP BY bucket WITH DATA;
SELECT materialization_hypertable_schema||'.'||materialization_hypertable_name AS mat_hyper
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think mat_hyper is ever used. Is this query used at all?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! It is left over after I queries the materialized table in the view, but it was redundant after I improved tests to cover corner cases better. I will remove this.

(1 row)

INSERT INTO conditions
SELECT time_val, time_val % 4, 3.14 FROM generate_series(0,100,1) AS time_val;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to use the new chunks view to show the ranges of the chunks before dropping them so that one can easily compare the ranges to the buckets in the cagg.

-- This is the simplest case, when the updated bucket is inside a chunk, so it is expected
-- that the update is always reflected after the refresh on drop.
UPDATE conditions SET value = 4.00 WHERE time_int = 2;
-- This case updates values in the bucket, which affects two different partials in
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test case I was referring to previously was to drop these chunks separately, to see that the same bucket is updated twice; first when dropping the first chunk, then updated again when dropping the second chunk.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, your suggestion was to drop from the chunks, which are dropped together. See your comment with your example: #2613 (comment)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My suggestion was to change the test so that the chunks are not dropped together. But I see that the requested case is now covered in a different way

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was mislead by your example, which was in the bucket of the chunks dropped together. Yes, the other test was also added.

bucket | sum | count
--------+-------+-------
0 | 22.84 | 7
7 | 23.7 | 7
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you drop the two first chunks separately, I would expect bucket 7 to first refresh to 22.84 and then when the second chunk is dropped it would be 23.7. Why not test that case as I suggested?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohh, I think this is tested below... sorry

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your case is already tested and I commented two times that I am adding it. Please, check the code

Copy link
Contributor

@gayyappan gayyappan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments requested.


void
continuous_agg_update_materialization(SchemaAndName partial_view,
SchemaAndName materialization_table, Name time_column_name,
InternalTimeRange new_materialization_range,
InternalTimeRange invalidation_range, int64 bucket_width)
InternalTimeRange invalidation_range, int64 bucket_width,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to follow the code path to figure out what it means to pass INVALID_CHUNK_ID as a parameter. Could you please add a comment stating the difference in behavior for valid/invalid chunk ids?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment

When drop_chunks is called on a hypertable manually or through
a retention policy, it initiates refresh in hypertable's continuous
aggregates. The refresh includes all buckets of each dropped chunk.
Some buckets are inside a dropped chunk only partially. In such case,
this commit fixes to refresh only the correct partials of the buckets.

The fix passes chunk id to refresh on drop of a chunk. The chunk id is
used to identify and update correct partials. This is a workaround
that the invalidation outside of the refreshed chunk doesn't update
the other partial, which is outside the refreshed chunk.

The commit also adds the test demonstrating correct refresh.

Fixes timescale#2592
@k-rus k-rus merged commit 307ac6a into timescale:master Nov 6, 2020
@k-rus k-rus deleted the cagg-dropchunks branch November 6, 2020 13:12
erimatnor added a commit to erimatnor/timescaledb that referenced this pull request Nov 11, 2020
**Minor Features**
* timescale#2627 Add optional user mappings support

**Bugfixes**
* timescale#2593 Set explicitly all lock parameters in alter_job
* timescale#2604 Fix chunk creation on hypertables with foreign key constraints
* timescale#2612 Optimize internal cagg_watermark function
* timescale#2613 Refresh correct partial during refresh on drop
* timescale#2617 Fix validation of available extensions on data node
* timescale#2619 Fix segfault in decompress_chunk for chunks with dropped columns
* timescale#2620 Fix DROP CASCADE for continuous aggregate
* timescale#2625 Fix subquery errors when using AsyncAppend
* timescale#2626 Fix incorrect total_table_pages setting for compressed scan
* timescale#2628 Stop recursion in cache invalidation
* timescale#2631 Fix segfault in decompress_chunk for chunks with dropped columns
erimatnor added a commit to erimatnor/timescaledb that referenced this pull request Nov 11, 2020
**Minor Features**
* timescale#2627 Add optional user mappings support
* timescale#2635 Add API to refresh continuous aggregate on chunk

**Bugfixes**
* timescale#2593 Set explicitly all lock parameters in alter_job
* timescale#2604 Fix chunk creation on hypertables with foreign key constraints
* timescale#2612 Optimize internal cagg_watermark function
* timescale#2613 Refresh correct partial during refresh on drop
* timescale#2617 Fix validation of available extensions on data node
* timescale#2619 Fix segfault in decompress_chunk for chunks with dropped columns
* timescale#2620 Fix DROP CASCADE for continuous aggregate
* timescale#2625 Fix subquery errors when using AsyncAppend
* timescale#2626 Fix incorrect total_table_pages setting for compressed scan
* timescale#2628 Stop recursion in cache invalidation
* timescale#2631 Fix segfault in decompress_chunk for chunks with dropped columns
erimatnor added a commit to erimatnor/timescaledb that referenced this pull request Nov 11, 2020
This release candidate contains bugfixes since the previous release
candidate.

**Minor Features**
* timescale#2627 Add optional user mappings support
* timescale#2635 Add API to refresh continuous aggregate on chunk

**Bugfixes**
* timescale#2593 Set explicitly all lock parameters in alter_job
* timescale#2604 Fix chunk creation on hypertables with foreign key constraints
* timescale#2612 Optimize internal cagg_watermark function
* timescale#2613 Refresh correct partial during refresh on drop
* timescale#2617 Fix validation of available extensions on data node
* timescale#2619 Fix segfault in decompress_chunk for chunks with dropped columns
* timescale#2620 Fix DROP CASCADE for continuous aggregate
* timescale#2625 Fix subquery errors when using AsyncAppend
* timescale#2626 Fix incorrect total_table_pages setting for compressed scan
* timescale#2628 Stop recursion in cache invalidation
* timescale#2631 Fix segfault in decompress_chunk for chunks with dropped columns
@erimatnor erimatnor mentioned this pull request Nov 11, 2020
erimatnor added a commit to erimatnor/timescaledb that referenced this pull request Nov 11, 2020
This release candidate contains bugfixes since the previous release
candidate, as well as additional minor features including support for
"user-mapping" authentication between access/data nodes and an
experimental API for refreshing continuous aggregates on individual
chunks.

**Minor Features**
* timescale#2627 Add optional user mappings support
* timescale#2635 Add API to refresh continuous aggregate on chunk

**Bugfixes**
* timescale#2593 Set explicitly all lock parameters in alter_job
* timescale#2604 Fix chunk creation on hypertables with foreign key constraints
* timescale#2612 Optimize internal cagg_watermark function
* timescale#2613 Refresh correct partial during refresh on drop
* timescale#2617 Fix validation of available extensions on data node
* timescale#2619 Fix segfault in decompress_chunk for chunks with dropped columns
* timescale#2620 Fix DROP CASCADE for continuous aggregate
* timescale#2625 Fix subquery errors when using AsyncAppend
* timescale#2626 Fix incorrect total_table_pages setting for compressed scan
* timescale#2628 Stop recursion in cache invalidation
* timescale#2631 Fix segfault in decompress_chunk for chunks with dropped columns
erimatnor added a commit to erimatnor/timescaledb that referenced this pull request Nov 11, 2020
This release candidate contains bugfixes since the previous release
candidate, as well as additional minor features including support for
"user-mapping" authentication between access/data nodes and an
experimental API for refreshing continuous aggregates on individual
chunks.

**Minor Features**
* timescale#2627 Add optional user mappings support
* timescale#2635 Add API to refresh continuous aggregate on chunk

**Bugfixes**
* timescale#2593 Set explicitly all lock parameters in alter_job
* timescale#2604 Fix chunk creation on hypertables with foreign key constraints
* timescale#2612 Optimize internal cagg_watermark function
* timescale#2613 Refresh correct partial during refresh on drop
* timescale#2617 Fix validation of available extensions on data node
* timescale#2619 Fix segfault in decompress_chunk for chunks with dropped columns
* timescale#2620 Fix DROP CASCADE for continuous aggregate
* timescale#2625 Fix subquery errors when using AsyncAppend
* timescale#2626 Fix incorrect total_table_pages setting for compressed scan
* timescale#2628 Stop recursion in cache invalidation
* timescale#2631 Fix segfault in decompress_chunk for chunks with dropped columns
erimatnor added a commit that referenced this pull request Nov 12, 2020
This release candidate contains bugfixes since the previous release
candidate, as well as additional minor features including support for
"user-mapping" authentication between access/data nodes and an
experimental API for refreshing continuous aggregates on individual
chunks.

**Minor Features**
* #2627 Add optional user mappings support
* #2635 Add API to refresh continuous aggregate on chunk

**Bugfixes**
* #2593 Set explicitly all lock parameters in alter_job
* #2604 Fix chunk creation on hypertables with foreign key constraints
* #2612 Optimize internal cagg_watermark function
* #2613 Refresh correct partial during refresh on drop
* #2617 Fix validation of available extensions on data node
* #2619 Fix segfault in decompress_chunk for chunks with dropped columns
* #2620 Fix DROP CASCADE for continuous aggregate
* #2625 Fix subquery errors when using AsyncAppend
* #2626 Fix incorrect total_table_pages setting for compressed scan
* #2628 Stop recursion in cache invalidation
erimatnor added a commit to erimatnor/timescaledb that referenced this pull request Nov 12, 2020
This release candidate contains bugfixes since the previous release
candidate, as well as additional minor features including support for
"user-mapping" authentication between access/data nodes and an
experimental API for refreshing continuous aggregates on individual
chunks.

**Minor Features**
* timescale#2627 Add optional user mappings support
* timescale#2635 Add API to refresh continuous aggregate on chunk

**Bugfixes**
* timescale#2593 Set explicitly all lock parameters in alter_job
* timescale#2604 Fix chunk creation on hypertables with foreign key constraints
* timescale#2610 Support analyze of internal compression table
* timescale#2612 Optimize internal cagg_watermark function
* timescale#2613 Refresh correct partial during refresh on drop
* timescale#2617 Fix validation of available extensions on data node
* timescale#2619 Fix segfault in decompress_chunk for chunks with dropped columns
* timescale#2620 Fix DROP CASCADE for continuous aggregate
* timescale#2625 Fix subquery errors when using AsyncAppend
* timescale#2626 Fix incorrect total_table_pages setting for compressed scan
* timescale#2628 Stop recursion in cache invalidation
erimatnor added a commit to erimatnor/timescaledb that referenced this pull request Nov 12, 2020
This release candidate contains bugfixes since the previous release
candidate, as well as additional minor features including support for
"user-mapping" authentication between access/data nodes and an
experimental API for refreshing continuous aggregates on individual
chunks.

**Minor Features**
* timescale#2627 Add optional user mappings support
* timescale#2635 Add API to refresh continuous aggregate on chunk

**Bugfixes**
* timescale#2560 Fix SCHEMA DROP CASCADE with continuous aggregates
* timescale#2593 Set explicitly all lock parameters in alter_job
* timescale#2604 Fix chunk creation on hypertables with foreign key constraints
* timescale#2610 Support analyze of internal compression table
* timescale#2612 Optimize internal cagg_watermark function
* timescale#2613 Refresh correct partial during refresh on drop
* timescale#2617 Fix validation of available extensions on data node
* timescale#2619 Fix segfault in decompress_chunk for chunks with dropped columns
* timescale#2620 Fix DROP CASCADE for continuous aggregate
* timescale#2625 Fix subquery errors when using AsyncAppend
* timescale#2626 Fix incorrect total_table_pages setting for compressed scan
* timescale#2628 Stop recursion in cache invalidation
erimatnor added a commit that referenced this pull request Nov 12, 2020
This release candidate contains bugfixes since the previous release
candidate, as well as additional minor features including support for
"user-mapping" authentication between access/data nodes and an
experimental API for refreshing continuous aggregates on individual
chunks.

**Minor Features**
* #2627 Add optional user mappings support
* #2635 Add API to refresh continuous aggregate on chunk

**Bugfixes**
* #2560 Fix SCHEMA DROP CASCADE with continuous aggregates
* #2593 Set explicitly all lock parameters in alter_job
* #2604 Fix chunk creation on hypertables with foreign key constraints
* #2610 Support analyze of internal compression table
* #2612 Optimize internal cagg_watermark function
* #2613 Refresh correct partial during refresh on drop
* #2617 Fix validation of available extensions on data node
* #2619 Fix segfault in decompress_chunk for chunks with dropped columns
* #2620 Fix DROP CASCADE for continuous aggregate
* #2625 Fix subquery errors when using AsyncAppend
* #2626 Fix incorrect total_table_pages setting for compressed scan
* #2628 Stop recursion in cache invalidation
erimatnor added a commit that referenced this pull request Nov 12, 2020
This release candidate contains bugfixes since the previous release
candidate, as well as additional minor features including support for
"user-mapping" authentication between access/data nodes and an
experimental API for refreshing continuous aggregates on individual
chunks.

**Minor Features**
* #2627 Add optional user mappings support
* #2635 Add API to refresh continuous aggregate on chunk

**Bugfixes**
* #2560 Fix SCHEMA DROP CASCADE with continuous aggregates
* #2593 Set explicitly all lock parameters in alter_job
* #2604 Fix chunk creation on hypertables with foreign key constraints
* #2610 Support analyze of internal compression table
* #2612 Optimize internal cagg_watermark function
* #2613 Refresh correct partial during refresh on drop
* #2617 Fix validation of available extensions on data node
* #2619 Fix segfault in decompress_chunk for chunks with dropped columns
* #2620 Fix DROP CASCADE for continuous aggregate
* #2625 Fix subquery errors when using AsyncAppend
* #2626 Fix incorrect total_table_pages setting for compressed scan
* #2628 Stop recursion in cache invalidation
erimatnor added a commit that referenced this pull request Nov 12, 2020
This release candidate contains bugfixes since the previous release
candidate, as well as additional minor features including support for
"user-mapping" authentication between access/data nodes and an
experimental API for refreshing continuous aggregates on individual
chunks.

**Minor Features**
* #2627 Add optional user mappings support
* #2635 Add API to refresh continuous aggregate on chunk

**Bugfixes**
* #2560 Fix SCHEMA DROP CASCADE with continuous aggregates
* #2593 Set explicitly all lock parameters in alter_job
* #2604 Fix chunk creation on hypertables with foreign key constraints
* #2610 Support analyze of internal compression table
* #2612 Optimize internal cagg_watermark function
* #2613 Refresh correct partial during refresh on drop
* #2617 Fix validation of available extensions on data node
* #2619 Fix segfault in decompress_chunk for chunks with dropped columns
* #2620 Fix DROP CASCADE for continuous aggregate
* #2625 Fix subquery errors when using AsyncAppend
* #2626 Fix incorrect total_table_pages setting for compressed scan
* #2628 Stop recursion in cache invalidation
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Refresh on drop can create outliers in continuous aggregates
3 participants