Dramatically improve postgresql performance #323

sambhav · 2024-09-20T22:23:16Z

Contributors

Co-Authored by @adriangonz who paired with me on this change and also tested/benched all the changes 🎉

This builds on the great work done in #151 by @moio (included as a co-author on this PR, since it builds on the ideas introduced there).

Benchmarks

For a paginated API Server list query (which uses CountRevision and List with a filter query), we saw the time drop from 70+ seconds to 2 seconds (a 35x speed-up) for a kine DB with 1M+ resources that match the list query. The speed-up might be larger for larger kine DBs.

Individually we were able to bring the count query down to 1.5s from ~40s and the list query down to 0.5s from ~30s on our test bench.

This is fairly significant, as it makes kine usable/scalable at such a high volume of objects.

Without the above changes, we would usually see timeouts and context cancelled errors most of the time.

Changes

Datatype Change

In Postgres, text performs at least as well as varchar(n) and never uses more space. When using varchar(n),

Each update / insert has an extra check to validate the length of the value.
The query planner casts varchar entries to text anyway.
Values are not padded to the max size (only the blank-padded char(n) type does that), so the length limit does not save any space either.

The main benefit provided by varchar is making sure that all values are shorter than N characters. However, in the case of kine, that's already validated upstream by kube-apiserver.

See the Postgres docs for more info.

Collate Change

When using the C.UTF-8 locale (which comes as default on many Postgres setups), indices on text (or varchar) columns can't be used for LIKE operations. The workaround in this case is to create the index with the text_pattern_ops operator class. However, the resulting index can't then be used for < / > queries. It's possible to work around this second issue by creating a second index that uses the default operator class for the text column, but that introduces unnecessary overhead.

Instead, a simpler fix is to change the collation of the name column to use the C locale. This means that the strings saved in this column are compared byte by byte, as if they were ASCII values (i.e. each character is a byte), which in turn lets queries use indices for all operations. Since the values in the name column will be a concatenation of DNS-like segments, we shouldn't get any non-ASCII values there.

See the Postgres docs on text indices for more info. This SO answer has a lot of useful context around this issue.
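
To illustrate why byte-wise (C) collation lets a single index serve both LIKE and < / > queries, here is a minimal Go sketch. It is not code from this PR: the helper name prefixRange and the key prefix /registry/configmaps/ are assumptions chosen for illustration. Under byte-wise ordering, "name LIKE 'prefix%'" matches exactly the strings in the half-open range [prefix, upper), where upper is the prefix with its last byte incremented, so a prefix match becomes an ordinary range scan.

```go
package main

import "fmt"

// prefixRange (hypothetical helper) returns the half-open byte-wise range
// [lo, hi) containing exactly the strings that start with prefix.
// ok is false when no finite upper bound exists (empty prefix or all 0xFF bytes).
func prefixRange(prefix string) (lo, hi string, ok bool) {
	b := []byte(prefix)
	// Walk backwards to find the last byte that can be incremented.
	for i := len(b) - 1; i >= 0; i-- {
		if b[i] < 0xFF {
			b[i]++
			// Everything after the incremented byte is dropped.
			return prefix, string(b[:i+1]), true
		}
	}
	return prefix, "", false
}

func main() {
	// Illustrative prefix only; real kine key prefixes may differ.
	lo, hi, ok := prefixRange("/registry/configmaps/")
	if !ok {
		fmt.Println("no upper bound for this prefix")
		return
	}
	// Go compares strings byte by byte, which mirrors C collation ordering.
	fmt.Printf("name >= %q AND name < %q\n", lo, hi)
}
```

With a locale-aware collation the sort order no longer follows the bytes, so this equivalence does not hold, which is why such setups need a separate text_pattern_ops index for LIKE.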

Note: This might also explain the MySQL vs. PostgreSQL differences that have been observed in kine in the past, as the MySQL schema stores the name as an ASCII column.

kine/pkg/drivers/mysql/mysql.go

Line 33 in 70a99f8

name VARCHAR(630) CHARACTER SET ascii,

Query changes

Most of the original List/Count queries are derived from #151 and have been adapted to support the new CountCurrent and CountRevision. There were also some minor bugs in the original PR with respect to the order of filter queries, which have been fixed.

Even without the query changes, the PR achieves a significant speed-up with the existing queries, as all of them start using the indices on name for the LIKE and > queries.

Co-authored-by: Adrian Gonzalez-Martin <agonzalezma5@bloomberg.net>
Co-authored-by: Adrian Gonzalez-Martin <adrian.gonz.mar@gmail.com>
Co-authored-by: Sambhav Kothari <skothari44@bloomberg.net>
Co-authored-by: Silvio Moioli <moio@suse.com>
Signed-off-by: Sambhav Kothari <skothari44@bloomberg.net>

brandond

Thanks! One nit on this, and a question: Does this require all servers to be upgraded to a new release of Kine before the schema can be changed, or is this change safe to make without updating all the nodes?

pkg/drivers/pgsql/pgsql.go

sambhav · 2024-09-20T22:46:52Z

Does this require all servers to be upgraded to a new release of Kine before the schema can be changed, or is this change safe to make without updating all the nodes?

It is safe to make the change without updating all the nodes. The queries are backward compatible since we didn't create any new columns.

With the new schema, even if we do not change the queries, the existing queries get significantly faster due to the index usage. Note that it may take some time to rebuild the indices after the name schema change, depending on the DB size.

sambhav · 2024-09-20T23:11:34Z

Looks like cockroachdb does not support per column collations?

brandond · 2024-09-20T23:13:25Z

Yeah, was just noticing that too. I'm not sure how to best handle that, as we currently expect cockroachdb to support the same SQL features as postgres.

sambhav · 2024-09-20T23:16:20Z

Actually on a closer look it looks like it supports column collation, it just does not support C as a collation?

brandond · 2024-09-20T23:18:33Z

It looks like it's using golang.org/x/text/language and ends up calling something like v, err := language.Parse("C")?

sambhav · 2024-09-20T23:19:47Z

I might end up doing something along the lines of

select version();

and changing the column type based on whether that contains cockroach vs postgres. Is that reasonable? For unknown version strings, I will just assume postgres.
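
A minimal Go sketch of that idea, under stated assumptions: the helper name nameColumnType and the sample version strings are hypothetical and not taken from the PR, and the actual change may be structured differently. It takes the string returned by select version() and picks the column type, falling back to the Postgres variant for anything it does not recognize.

```go
package main

import (
	"fmt"
	"strings"
)

// nameColumnType (hypothetical helper) picks the type for the name column
// from the output of "select version()". CockroachDB rejects the C collation,
// so it gets plain TEXT; anything else, including unknown version strings,
// is treated as Postgres.
func nameColumnType(version string) string {
	if strings.Contains(strings.ToLower(version), "cockroach") {
		return "TEXT"
	}
	return `TEXT COLLATE "C"`
}

func main() {
	// Illustrative version strings; real output will vary by build.
	for _, v := range []string{
		"PostgreSQL 12.16 on x86_64-pc-linux-gnu",
		"CockroachDB CCL v23.1.0 (x86_64-pc-linux-gnu)",
		"some unknown server",
	} {
		fmt.Printf("%-50s -> %s\n", v, nameColumnType(v))
	}
}
```

Matching case-insensitively on a substring keeps the check tolerant of formatting differences between releases, at the cost of being a heuristic rather than a capability check.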

brandond · 2024-09-20T23:26:37Z

Yeah, worth a try I guess? I don't know why they decided to parse that as a BCP47 language tag instead of a collation, that seems pretty broken. Not the first weird decision that project has made though.

sambhav · 2024-09-20T23:33:14Z

Ok, hopefully my little hack works as expected.

brandond · 2024-09-20T23:40:08Z

You might consider opening an issue with cockroachdb, maybe they just want to special-case the C collation? The end result should just be bytewise string comparison.

sambhav · 2024-09-20T23:40:40Z

Some comparisons from PR perf tests -

List Before

[postgres-12.16] [PERF] 0.034 average postgres-12.16 request duration (seconds): list configmaps
[postgres-12.16] [PERF] 0.005 [ 32] █▌
[postgres-12.16] [PERF] 0.025 [ 296] ██████████████▏
[postgres-12.16] [PERF] 0.05 [1045] ██████████████████████████████████████████████████
[postgres-12.16] [PERF] 0.1 [ 212] ██████████▏

List After

[postgres-12.16] [PERF] 0.027 average postgres-12.16 request duration (seconds): list configmaps
[postgres-12.16] [PERF] 0.005 [ 35] █▉
[postgres-12.16] [PERF] 0.025 [560] █████████████████████████████
[postgres-12.16] [PERF] 0.05 [964] ██████████████████████████████████████████████████
[postgres-12.16] [PERF] 0.1 [ 16] ▉

As mentioned in the PR, the gap gets wider and wider as we add more items, but we can still see that most of the queries now complete in under 0.05s (p99). Previously even the p90 was 0.1s.
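
The percentile figures can be checked with a short Go sketch. It assumes each histogram row is a bucket labelled by its upper bound in seconds with a per-bucket (non-cumulative) request count; shareAtOrBelow is a hypothetical helper name.

```go
package main

import "fmt"

// shareAtOrBelow (hypothetical helper) returns the fraction of requests that
// fall in buckets whose upper bound is <= limit.
func shareAtOrBelow(bounds []float64, counts []int, limit float64) float64 {
	total, within := 0, 0
	for i, c := range counts {
		total += c
		if bounds[i] <= limit {
			within += c
		}
	}
	if total == 0 {
		return 0
	}
	return float64(within) / float64(total)
}

func main() {
	// Bucket upper bounds and counts copied from the perf output above.
	bounds := []float64{0.005, 0.025, 0.05, 0.1}
	before := []int{32, 296, 1045, 212}
	after := []int{35, 560, 964, 16}

	fmt.Printf("before: %.1f%% of requests <= 0.05s\n", 100*shareAtOrBelow(bounds, before, 0.05))
	fmt.Printf("after:  %.1f%% of requests <= 0.05s\n", 100*shareAtOrBelow(bounds, after, 0.05))
}
```

Under that reading, about 86.6% of requests finished within 0.05s before the change (so the p90 lands in the 0.1s bucket) and about 99.0% after it.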

sambhav · 2024-09-21T00:03:57Z

Hmm, not sure why this is still happening, let me check

building kine: ERROR: at or near ",": syntax error: invalid locale C: language: tag is not well-formed (SQLSTATE 42601)

sambhav · 2024-09-21T00:46:34Z

Ok, looks like we are all good 🎉 I will deal with the cockroachdb issue creation later.

moio

Thanks for picking up where I left this and taking it over the finish line 🏁

sambhav · 2024-09-23T21:08:53Z

Thanks @brandond for the merge. Can you also cut a release please? Should this be 0.13.0 given the schema migrations needed?

brandond · 2024-09-24T00:04:51Z

I was waiting to merge another PR, but yes we can tag a 0.13.

sambhav requested a review from a team as a code owner September 20, 2024 22:23

brandond requested changes Sep 20, 2024

Add an empty mysql migration

a163888

Signed-off-by: Sambhav Kothari <skothari44@bloomberg.net>

brandond approved these changes Sep 20, 2024

sambhav force-pushed the faster-pg-queries branch 2 times, most recently from ffdad1b to 0228965 Compare September 20, 2024 23:33

sambhav force-pushed the faster-pg-queries branch from 0228965 to e6208eb Compare September 20, 2024 23:33

brandond approved these changes Sep 20, 2024

sambhav force-pushed the faster-pg-queries branch from e6208eb to 163f906 Compare September 20, 2024 23:43

Add b/w compat for cockroachdb

4002bb9

Signed-off-by: Sambhav Kothari <skothari44@bloomberg.net>

sambhav force-pushed the faster-pg-queries branch from 163f906 to 4002bb9 Compare September 21, 2024 00:09

Add appropriate collation migration

08eaa2e

Signed-off-by: Sambhav Kothari <skothari44@bloomberg.net>

sambhav force-pushed the faster-pg-queries branch from da3426b to 08eaa2e Compare September 21, 2024 09:41

moio approved these changes Sep 23, 2024

vitorsavian approved these changes Sep 23, 2024

brandond merged commit 47d7636 into k3s-io:master Sep 23, 2024
3 checks passed

sambhav deleted the faster-pg-queries branch September 23, 2024 18:25
