
Allow up to INT16_MAX rows per batch #6734

Merged · 16 commits merged into timescale:main on Mar 14, 2024
Conversation

@akuzm (Member) commented Mar 5, 2024

We're getting requests to make the number of rows per batch configurable, so this PR does some refactoring to make that easier. For example, it removes some large stack allocations and changes the width of some variables to accommodate larger batches. It doesn't actually add an option to change the compressed batch size yet.

The new potential upper limit is INT16_MAX rows per batch, because dictionary compression uses signed int16 indexes. The actual target compressed batch size is kept hardcoded at 1000 rows for now.
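A minimal sketch of the constraint, assuming the batch stores one signed 16-bit dictionary index per row (all names below are invented for illustration, not taken from the source):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative only. Dictionary compression stores a signed int16 index
 * per row; in the worst case every row has a distinct value, so a batch
 * cannot address more than INT16_MAX rows.
 */
#define MAX_ROWS_PER_BATCH INT16_MAX /* new upper limit: 32767 */
#define TARGET_BATCH_SIZE 1000       /* still the hardcoded target */

typedef struct DictionaryBatchSketch
{
	int16_t *indexes;  /* one signed int16 dictionary index per row */
	uint32_t num_rows; /* must stay <= MAX_ROWS_PER_BATCH */
} DictionaryBatchSketch;

static inline void
check_batch_row_count(const DictionaryBatchSketch *batch)
{
	assert(batch->num_rows <= MAX_ROWS_PER_BATCH);
}
```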

I also had to make some tweaks to memory management to avoid performance regressions, but after that there are no changes in tsbench that I could reproduce: https://grafana.ops.savannah-dev.timescale.com/d/fasYic_4z/compare-akuzm?orgId=1&var-branch=All&var-run1=3331&var-run2=3341&var-threshold=0&var-use_historical_thresholds=true&var-threshold_expression=2.5%20%2A%20percentile_cont%280.90%29&var-exact_suite_version=false

Disable-check: force-changelog-file


codecov bot commented Mar 5, 2024

Codecov Report

Attention: Patch coverage is 94.44444%, with 6 lines in your changes missing coverage. Please review.

Project coverage is 80.97%. Comparing base (59f50f2) to head (b56e88d).
Report is 70 commits behind head on main.

Files Patch % Lines
tsl/src/compression/simple8b_rle_bitmap.h 94.11% 0 Missing and 2 partials ⚠️
tsl/src/compression/simple8b_rle_decompress_all.h 80.00% 0 Missing and 2 partials ⚠️
tsl/src/compression/gorilla_impl.c 85.71% 0 Missing and 1 partial ⚠️
tsl/src/nodes/decompress_chunk/exec.c 75.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6734      +/-   ##
==========================================
+ Coverage   80.06%   80.97%   +0.90%     
==========================================
  Files         190      191       +1     
  Lines       37181    36383     -798     
  Branches     9450     9447       -3     
==========================================
- Hits        29770    29462     -308     
- Misses       2997     3162     +165     
+ Partials     4414     3759     -655     


@akuzm marked this pull request as draft March 5, 2024 15:07
@akuzm marked this pull request as ready for review March 7, 2024 08:57
@akuzm self-assigned this Mar 8, 2024
Comment on lines +98 to +99
int ts_guc_debug_toast_tuple_target = 128;

Member
Do we really need a GUC for this just for testing? We could set the corresponding option directly in tests.

Member Author

That worked when we had inheritance, but now the compressed chunk is re-created at compression time without inheritance, so I'm not sure it's even possible.

Member Author

The GUC is debug-only; in release builds it's just a static variable that is never changed.
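A minimal sketch of the debug-only GUC pattern described above, assuming a TS_DEBUG-style guard and an invented GUC name; the actual registration code in the source may differ:

```c
#include "postgres.h"
#include "utils/guc.h"

/* In release builds this stays a plain variable that is never changed. */
int ts_guc_debug_toast_tuple_target = 128;

#ifdef TS_DEBUG
/* Only debug builds register it as a settable GUC. */
static void
register_debug_toast_tuple_target(void)
{
	DefineCustomIntVariable("timescaledb.debug_toast_tuple_target", /* invented name */
							"Target toast tuple size for compressed chunks (debug only)",
							NULL,
							&ts_guc_debug_toast_tuple_target,
							128,   /* bootValue */
							1,     /* minValue */
							65535, /* maxValue */
							PGC_USERSET,
							0,     /* flags */
							NULL,  /* check_hook */
							NULL,  /* assign_hook */
							NULL); /* show_hook */
}
#endif
```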

 * We use this limit for sanity checks in case the compressed data is corrupt.
 */
-#define GLOBAL_MAX_ROWS_PER_COMPRESSION 1015
+#define GLOBAL_MAX_ROWS_PER_COMPRESSION INT16_MAX
Contributor

I am wondering whether these large batches would be compatible with the indexes in the TAM implementation. As far as I remember, we only have a limited number of bits to address the tuples.

Member Author

Hard to tell until we have a design doc at least. This refactoring doesn't really enforce a lower limit on batch sizes, so TAM tables can use a low limit if they want to.

Contributor

I am a bit worried about the migration path. If we have larger batches, the compressed table might not be processable using TAM (or we could not build an index). So, it might be good to know more about the requirements. What do you think @mkindahl / @erimatnor?

Contributor

For TAM indexing we need to encode a value's index in a compressed batch into the TID, which has a limited number of bits (and we also need the block and offset in there). So, yes, there's a limit for sure. However, we could always enforce a lower batch size in TAM if we rewrite the table.

It would be interesting to know the real-world benefits of larger batch sizes.
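For illustration, encoding a row's position within a compressed batch into the 16-bit offset part of a TID might look roughly like the sketch below. This is a hypothetical encoding with made-up bit widths, not the actual TAM design:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical split of the 16 offset bits of a TID: the low bits hold the
 * row's position inside a compressed batch, the high bits are left for other
 * bookkeeping. With 10 bits, a batch can hold at most 1024 rows.
 */
#define ROW_IN_BATCH_BITS 10
#define ROW_IN_BATCH_MASK ((1u << ROW_IN_BATCH_BITS) - 1)

static inline uint16_t
encode_tid_offset(uint16_t batch_slot, uint16_t row_in_batch)
{
	assert(row_in_batch <= ROW_IN_BATCH_MASK);
	assert(batch_slot < (1u << (16 - ROW_IN_BATCH_BITS)));
	return (uint16_t) ((batch_slot << ROW_IN_BATCH_BITS) | row_in_batch);
}

static inline uint16_t
decode_row_in_batch(uint16_t tid_offset)
{
	return (uint16_t) (tid_offset & ROW_IN_BATCH_MASK);
}
```

Raising the per-batch row count means spending more of these bits on the row index, which is the tension with an INT16_MAX upper limit.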

@jnidzwetzki (Contributor)

In build_decompressor we do .decompressed_slots = palloc0(sizeof(void *) * GLOBAL_MAX_ROWS_PER_COMPRESSION). I am wondering if this should be made more dynamic to avoid the large memory allocation for smaller batches.

@jnidzwetzki (Contributor)

Since the batches can become larger and consume more space in memory, does the cost model for the sorted merge optimization need to be adjusted as well? What about batch decompression and memory limits?

@akuzm (Member Author) commented Mar 11, 2024

> Since the batches can become larger and consume more space in memory, does the cost model for the sorted merge optimization need to be adjusted as well? What about batch decompression and memory limits?

This PR doesn't even allow bigger batches yet, so this is mostly for future consideration. But yes, I think when we make the size configurable, we'll have to look at the batch size configured for a given table instead of the upper limit. For now, I'll double-check the costs; I think I forgot to replace the max size with the current target size there.

> In build_decompressor we do .decompressed_slots = palloc0(sizeof(void *) * GLOBAL_MAX_ROWS_PER_COMPRESSION). I am wondering if this should be made more dynamic to avoid the large memory allocation for smaller batches.

Good point, I'll change it to use the target compressed batch size instead of the max allowed size.
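For context, with the limit raised to INT16_MAX the slots array alone would be 32767 * sizeof(void *) ≈ 256 KB per decompressor on 64-bit builds. A minimal sketch of allocating for the target size instead, where TARGET_COMPRESSED_BATCH_SIZE is an assumed name rather than the actual constant:

```c
#include "postgres.h"

/* Assumed name for the hardcoded 1000-row target batch size. */
#define TARGET_COMPRESSED_BATCH_SIZE 1000

static void **
allocate_decompressed_slots(void)
{
	/*
	 * Size the array for the batches we actually produce today rather than
	 * for GLOBAL_MAX_ROWS_PER_COMPRESSION, which is now INT16_MAX.
	 */
	return palloc0(sizeof(void *) * TARGET_COMPRESSED_BATCH_SIZE);
}
```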

@akuzm enabled auto-merge (squash) March 12, 2024 10:57
@akuzm disabled auto-merge March 12, 2024 10:57
@jnidzwetzki (Contributor)

Rest LGTM, will approve as soon as the question regarding TAM has been solved.

@akuzm (Member Author) commented Mar 12, 2024

> Rest LGTM, will approve as soon as the question regarding TAM has been solved.

The main question with TAM is whether it can provide a meaningful improvement for customer use cases at a reasonable engineering cost, compared e.g. to the existing sparse index implementation. This doesn't seem certain at the moment. We should demonstrate this, publish a design for index support, and so on. Until that is done (at the moment we don't even have a finalized design for TAM index support), I don't think we should block other work based on the remote possibility that it might conflict with a prospective TAM index implementation.

> I am a bit worried about the migration path. If we have larger batches, the compressed table might not be processable using TAM (or we could not build an index).

If we make the batch size configurable, the migration path would be to recompress all chunks with the smaller batch size. Probably we're going to apply TAM per-chunk, for the same reason we made the settings per-chunk. So I think it shouldn't be a big problem.

> It would be interesting to know the real-world benefits of larger batch sizes.

There is some benefit, e.g. vectorized aggregation can be 2x faster with 4k batches. This needs more careful benchmarking if we decide to increase the default batch size, because it can actually slow down small LIMIT queries. We might need a cost-based choice between iterator-based and bulk decompression, which is not very easy to implement correctly. As for other products, IIRC DuckDB uses 2k batches and ClickHouse uses 65k.

Here's some old benchmark I did: https://grafana.ops.savannah-dev.timescale.com/d/fasYic_4z/compare-akuzm?orgId=1&var-branch=All&var-run1=3263&var-run2=3269&var-threshold=0.02&var-use_historical_thresholds=true&var-threshold_expression=2.5%20%2A%20percentile_cont%280.90%29&var-exact_suite_version=true

I did not measure the compression ratios; they probably also improve with larger batches, especially for JSON or dictionary-compressed text.

But mostly I was thinking about smaller sizes that are multiples of an hour or a minute, e.g. 720, 1440, or 3600. I remember Rob asked for this on Slack, but I can't find it now.

@akuzm enabled auto-merge (squash) March 14, 2024 09:46
@akuzm merged commit ae9b83b into timescale:main Mar 14, 2024
53 of 55 checks passed
@akuzm deleted the rows branch March 14, 2024 09:47