Implement SkipScan to speed up SELECT DISTINCT
This patch implements a skip-scan, an optimization for SELECT DISTINCT ON. For SELECT DISTINCT ON, Postgres usually plans either a UNIQUE over a sorted path or some form of aggregate. In either case, it needs to scan the entire table, even when there are only a few unique values. A skip-scan optimizes this case when we have an ordered index. Instead of scanning the entire table and deduplicating afterwards, the scan remembers the last value returned and searches the index for the next value after that one. This means that for a table with k keys and u distinct values, a skip-scan runs in time u * log(k), as opposed to scanning then deduplicating, which takes time k. We can write the number of unique values u as a function of k by dividing by the number of repeats r, i.e. u = k/r. This means that a skip-scan will be faster if each key is repeated more than a logarithmic number of times, i.e. if r > log(k) then u * log(k) = k/r * log(k) < k/log(k) * log(k) = k. Co-authored-by: Joshua Lockerman <josh@timescale.com>
1 parent 639aef7, commit bddcbba
Showing 29 changed files with 13,405 additions and 5 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,4 @@ | ||
add_subdirectory(compress_dml) | ||
add_subdirectory(decompress_chunk) | ||
add_subdirectory(gapfill) | ||
add_subdirectory(skip_scan) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
set(SOURCES | ||
${CMAKE_CURRENT_SOURCE_DIR}/planner.c | ||
${CMAKE_CURRENT_SOURCE_DIR}/exec.c | ||
) | ||
target_sources(${TSL_LIBRARY_NAME} PRIVATE ${SOURCES}) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,87 @@ | ||
# SkipScan # | ||
|
||
This module implements a skip-scan, an optimization for `SELECT DISTINCT ON`. | ||
Usually for `SELECT DISTINCT ON` Postgres will plan either a `UNIQUE` over a | ||
sorted path, or some form of aggregate. In either case, it needs to scan the | ||
entire table, even in cases where there are only a few unique values. | ||
|
||
A skip-scan optimizes this case when we have an ordered index. Instead of | ||
scanning the entire table and deduplicating after, the scan remembers the last | ||
value returned, and searches the index for the next value after that one. This | ||
means that for a table with `k` keys, with `u` distinct values, a skip-scan runs | ||
in time `u * log(k)` as opposed to scanning then deduplicating, which takes time | ||
`k`. We can write the number of unique values `u` as a function of `k` by | ||
dividing by the number of repeats `r`, i.e. `u = k/r`. This means that a skip-scan | ||
will be faster if each key is repeated more than a logarithmic number of times, | ||
i.e. if `r > log(k)` then `u * log(k) = k/r * log(k) < k/log(k) * log(k) = k`. | ||
|
||
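The cost comparison above can be checked with a few lines of arithmetic. The following is only a rough sketch of the cost model described in the text, not code from this module: the function names are hypothetical, the logarithm base (2) is an assumption, and constant factors such as page fetches are ignored.

```python
import math


def full_scan_cost(k: int) -> float:
    """Scan-then-deduplicate model: every one of the k keys is visited once."""
    return float(k)


def skip_scan_cost(k: int, r: float) -> float:
    """Skip-scan model: one O(log k) index search per distinct value.

    r is the average number of repeats per key, so u = k / r.
    Base-2 logarithm is an assumption made for illustration.
    """
    u = k / r
    return u * math.log2(k)


def skip_scan_wins(k: int, r: float) -> bool:
    """True when the modelled skip-scan cost is below the full-scan cost."""
    return skip_scan_cost(k, r) < full_scan_cost(k)


if __name__ == "__main__":
    k = 1_000_000  # log2(k) is roughly 20
    for r in (10, 1000):
        print(r, skip_scan_cost(k, r), full_scan_cost(k), skip_scan_wins(k, r))
```

With a million keys, `log2(k)` is about 20, so under this model a key repeated 1000 times favours the skip-scan while a key repeated only 10 times does not.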
|
||
## Implementation ## | ||
|
||
We plan our skip-scan with a tree something like | ||
|
||
```SQL | ||
Custom Scan (SkipScan) on table | ||
-> Index Scan using table_key_idx on table | ||
Index Cond: (key > NULL) | ||
``` | ||
|
||
After each iteration through the `SkipScan` we replace the `key > NULL` with | ||
a `key > [next value we are returning]` and restart the underlying `IndexScan`. | ||
There are some subtleties around `NULL` handling, see the source file for more | ||
detail. | ||
|
||
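The restart loop described above can be modelled outside the database. This is a minimal sketch under stated assumptions, not the executor code: a sorted Python list stands in for the btree index, `index_scan` and `skip_scan` are hypothetical names, `None` stands in for the initial placeholder bound, and `NULL` keys are deliberately left out.

```python
from bisect import bisect_right
from typing import Iterator, List, Optional

_EXHAUSTED = object()  # sentinel: the underlying scan returned no row


def index_scan(index: List[int], lower_bound: Optional[int]) -> Iterator[int]:
    """Model of an ordered index scan with the qual `key > lower_bound`.

    A lower_bound of None models the first, unrestricted scan.
    The binary search models descending the btree to the start position.
    """
    start = 0 if lower_bound is None else bisect_right(index, lower_bound)
    for i in range(start, len(index)):
        yield index[i]


def skip_scan(index: List[int]) -> Iterator[int]:
    """Return each distinct key once by restarting the scan after every row."""
    last: Optional[int] = None
    while True:
        scan = index_scan(index, last)  # restart with the updated qual
        key = next(scan, _EXHAUSTED)    # only the first matching row is needed
        if key is _EXHAUSTED:
            return
        yield key
        last = key                      # becomes the bound for the next restart


if __name__ == "__main__":
    print(list(skip_scan([1, 1, 1, 2, 2, 5, 5, 5, 5, 9])))
```

Each restart costs one search of the index, which is where the `u * log(k)` term in the cost estimate comes from.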
|
||
## Planning Heuristics ## | ||
|
||
To plan our SkipScan we look for a compatible plan, for instance | ||
|
||
```SQL | ||
Unique | ||
-> Index Scan | ||
``` | ||
|
||
or | ||
|
||
```SQL | ||
Unique | ||
-> Merge Append | ||
-> Index Scan | ||
... | ||
``` | ||
|
||
Given such a plan, we know the index is sorted in an order with the distinct | ||
key(s) first, so we can add quals to the `IndexScan` representing the previous | ||
key returned, and thus skip over the repeated values. The `Unique` node tells us | ||
which columns are relevant. | ||
|
||
We use this to create plans that look like | ||
|
||
```SQL | ||
Unique | ||
-> Custom Scan (SkipScan) on skip_scan | ||
-> Index Scan using skip_scan_dev_name_idx on skip_scan | ||
``` | ||
|
||
or | ||
|
||
```SQL | ||
Unique | ||
-> Merge Append | ||
Sort Key: _hyper_2_1_chunk.dev_name | ||
-> Custom Scan (SkipScan) on _hyper_2_1_chunk | ||
-> Index Scan using _hyper_2_1_chunk_idx on _hyper_2_1_chunk | ||
-> Custom Scan (SkipScan) on _hyper_2_2_chunk | ||
-> Index Scan using _hyper_2_2_chunk_idx on _hyper_2_2_chunk | ||
``` | ||
|
||
respectively. | ||
|
||
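One way to see why `Unique` stays on top in the multi-chunk plan: each per-chunk skip-scan returns distinct keys only within its own chunk, so the same key can still arrive from several chunks. The sketch below models that shape with sorted Python lists standing in for chunk indexes; the function names are hypothetical and this is an illustration, not the planner or executor code.

```python
import heapq
from bisect import bisect_right
from typing import Iterable, Iterator, List


def chunk_skip_scan(index: List[str]) -> Iterator[str]:
    """Distinct keys of one chunk, jumping past repeats with a binary search."""
    pos = 0
    while pos < len(index):
        key = index[pos]
        yield key
        pos = bisect_right(index, key, lo=pos)  # first position after this key


def merge_append(streams: Iterable[Iterator[str]]) -> Iterator[str]:
    """Merge already-sorted per-chunk streams into one sorted stream."""
    return heapq.merge(*streams)


def unique(sorted_keys: Iterable[str]) -> Iterator[str]:
    """Drop adjacent duplicates, like a Unique node over sorted input."""
    previous = object()  # sentinel that compares unequal to every key
    for key in sorted_keys:
        if key != previous:
            yield key
            previous = key


if __name__ == "__main__":
    chunks = [["dev1", "dev1", "dev2"], ["dev2", "dev2", "dev3", "dev3"]]
    merged = list(merge_append(chunk_skip_scan(c) for c in chunks))
    print(merged)                # "dev2" appears once per chunk
    print(list(unique(merged)))  # duplicates across chunks removed
```

In this model the deduplicating step only sees one row per key per chunk, rather than every row in every chunk.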
## Postgres-Native Skip Scan ## | ||
|
||
Upstream Postgres is also working on a skip-scan implementation, see e.g. | ||
https://commitfest.postgresql.org/32/1741/ | ||
At the time this document was first written, it had not yet been merged. Their | ||
strategy involves integrating this functionality into the btree searching code, | ||
and it will be available in PG15 at the earliest. The two | ||
implementations should not interfere with each other. |