Allow up to INT16_MAX rows per batch #6734
Conversation
Codecov Report. Attention: Patch coverage is
Additional details and impacted files

@@ Coverage Diff @@
##             main    #6734      +/-   ##
==========================================
+ Coverage   80.06%   80.97%   +0.90%
==========================================
  Files         190      191       +1
  Lines       37181    36383     -798
  Branches     9450     9447       -3
==========================================
- Hits        29770    29462     -308
- Misses       2997     3162     +165
+ Partials     4414     3759     -655
int ts_guc_debug_toast_tuple_target = 128;
Do we really need a GUC for this? For testing, we could just set the corresponding option directly in the tests.
That worked when we had inheritance, but now the compressed chunk is re-created at compression without inheritance, so I'm not sure it's even possible.
The GUC is debug-only; in release builds it's just a static variable that is never changed.
 * We use this limit for sanity checks in case the compressed data is corrupt.
 */
-#define GLOBAL_MAX_ROWS_PER_COMPRESSION 1015
+#define GLOBAL_MAX_ROWS_PER_COMPRESSION INT16_MAX
I am wondering if these large batches would be compatible with the TAM implementation's indexes. As far as I remember, we only have a limited number of bits to address the tuples.
Hard to tell until we have at least a design doc. This refactoring doesn't really enforce a lower limit on batch sizes, so TAM tables can use a low limit if they want to.
I am a bit worried about the migration path. If we have larger batches, the compressed table might not be processable using TAM (or we could not build an index on it). So it might be good to know more about the requirements. What do you think @mkindahl / @erimatnor?
For TAM indexing we need to encode a value's index in a compressed batch into the TID, which has a limited number of bits (and we also need the block and offset in there). So yes, there's a limit for sure. However, we could always enforce a lower batch size in TAM if we rewrite the table.
It would be interesting to know the real-world benefits of larger batch sizes.
Since the batches can become larger and consume more memory, does the cost model for the sorted merge optimization need to be adjusted as well? What about batch decompression and memory limits?
This PR doesn't actually allow bigger batches yet, so this is mostly for future consideration. But yes, I think when we make the size configurable, we'll have to look at the batch size configured for a given table instead of the upper limit. For now, I'll double-check the costs; I think I forgot to update the max size with the current target size there.
Good point, I'll change it to use the target compressed batch size instead of the maximum allowed size.
Rest LGTM, will approve as soon as the question regarding TAM has been resolved.
The main question with TAM is whether it can provide meaningful improvement for customer use cases at a reasonable engineering cost, compared, e.g., to the existing sparse index implementation. This doesn't seem certain at the moment. We should demonstrate this, publish a design for index support, and so on. Until this is done, and at the moment we don't even have a finalized design for TAM index support, I don't think we should be blocking other work based on the remote possibility that it might conflict with a purported TAM index implementation.
If we make the batch size configurable, the migration path would be to recompress all chunks with the smaller batch size. Probably we're going to apply TAM per-chunk, for the same reason we made the settings per-chunk. So I think it shouldn't be a big problem.
There is some benefit: e.g., vectorized aggregation can be 2x faster with 4k batches. This needs more careful benchmarking if we decide to increase the default batch size, because for small LIMIT queries it can actually slow things down. We might require a cost-based choice of iterator-based vs. bulk decompression, which is not very easy to implement correctly. As for other products, IIRC DuckDB uses 2k batches and ClickHouse uses 65k. Here's an old benchmark I did: https://grafana.ops.savannah-dev.timescale.com/d/fasYic_4z/compare-akuzm?orgId=1&var-branch=All&var-run1=3263&var-run2=3269&var-threshold=0.02&var-use_historical_thresholds=true&var-threshold_expression=2.5%20%2A%20percentile_cont%280.90%29&var-exact_suite_version=true

I did not measure the compression ratios; they probably also become better with larger batches, especially for JSON or dictionary-compressed text. But mostly I was thinking about smaller sizes that are multiples of an hour/minute, e.g. 720 or 1440 or 3600. I remember Rob asked for this on Slack, but I can't find it now.
We're getting requests to make the number of rows per batch configurable, so this PR does some refactoring to make that easier: e.g., it removes some large stack allocations and changes the width of some variables to accommodate larger batches. It doesn't actually add an option to change the compressed batch size yet.
The new potential upper limit is INT16_MAX rows per batch, because dictionary compression uses signed int16 indexes. The actual target compressed batch size is kept hardcoded at 1000 rows for now.
I also had to make some tweaks to memory management to avoid performance regressions, but after that there are no changes in tsbench that I could reproduce: https://grafana.ops.savannah-dev.timescale.com/d/fasYic_4z/compare-akuzm?orgId=1&var-branch=All&var-run1=3331&var-run2=3341&var-threshold=0&var-use_historical_thresholds=true&var-threshold_expression=2.5%20%2A%20percentile_cont%280.90%29&var-exact_suite_version=false
Disable-check: force-changelog-file