
Add an asynchronous zstd stream compression class #5565

Merged
merged 11 commits into redpanda-data:dev from async-compression on Dec 1, 2022

Conversation

Contributor

@ballard26 ballard26 commented Jul 22, 2022

Cover letter

This PR adds a new Zstd streaming interface, async_stream_zstd, that differs from the existing stream_zstd in a few ways.

Firstly, all compression/decompression methods return futures and give Seastar a chance to preempt them before they hit the reactor stall timeout. This interface change is the main justification for a new class instead of modifying the existing one: a lot of non-futurized code in v/kafka expects compress/uncompress to be a blocking call that returns the result right away. Hopefully we'll be able to migrate users of stream_zstd to async_stream_zstd over time.

Secondly, stream_zstd's compress currently allocates one large contiguous buffer sized for the entire expected compressed output; this has been shown to cause OOM errors when the data block is too large (see #5566). Instead of allocating a large contiguous buffer for the compression output, async_stream_zstd emits one Zstd block at a time (128KiB max size) and appends each block to an iobuf. This change should probably be ported over to stream_zstd as well, in case any Kafka API messages prove large enough to cause OOM errors. A rough sketch of the block-at-a-time approach follows.
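To make the block-at-a-time idea concrete, here is a minimal sketch (not the actual async_stream_zstd code; the function name and the iobuf fragment API are assumptions) of compressing into an iobuf one Zstd output block at a time, with a yield point between blocks:

```cpp
// Minimal sketch, not the PR's implementation. Assumes Redpanda's iobuf
// (fragment iteration, append(const char*, size_t)) and Seastar coroutines.
#include <seastar/core/coroutine.hh>
#include <seastar/coroutine/maybe_yield.hh>

#include <zstd.h>

#include <stdexcept>
#include <vector>

#include "bytes/iobuf.h"

namespace ss = seastar;

ss::future<iobuf> compress_blockwise(ZSTD_CCtx* cctx, iobuf input) {
    iobuf output;
    // Recommended output buffer size: one zstd block (~128KiB) plus overhead.
    std::vector<char> block(ZSTD_CStreamOutSize());

    for (auto& frag : input) {
        ZSTD_inBuffer in{frag.get(), frag.size(), 0};
        while (in.pos < in.size) {
            ZSTD_outBuffer out{block.data(), block.size(), 0};
            size_t rc = ZSTD_compressStream2(cctx, &out, &in, ZSTD_e_continue);
            if (ZSTD_isError(rc)) {
                throw std::runtime_error(ZSTD_getErrorName(rc));
            }
            // Append only this block; no single large contiguous buffer.
            output.append(block.data(), out.pos);
            // Let the reactor run other tasks between blocks.
            co_await ss::coroutine::maybe_yield();
        }
    }

    // Flush the frame epilogue; the return value is the number of bytes
    // still left to flush (0 means the frame is complete).
    size_t remaining = 0;
    do {
        ZSTD_outBuffer out{block.data(), block.size(), 0};
        ZSTD_inBuffer in{nullptr, 0, 0};
        remaining = ZSTD_compressStream2(cctx, &out, &in, ZSTD_e_end);
        if (ZSTD_isError(remaining)) {
            throw std::runtime_error(ZSTD_getErrorName(remaining));
        }
        output.append(block.data(), out.pos);
        co_await ss::coroutine::maybe_yield();
    } while (remaining != 0);
    co_return output;
}
```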

Fixes #5116, #5566

Release notes

  • none

@ballard26 ballard26 force-pushed the async-compression branch 2 times, most recently from ad24b22 to d7487ed on July 22, 2022 03:59
@ballard26 ballard26 requested review from dotnwat and jcsp July 22, 2022 04:02
@emaxerrno
Contributor

Is the gist to use thread-local storage of 2MB as the scratch space? Can the cover letter have more deets?

@ballard26
Contributor Author

Is the gist to use thread-local storage of 2MB as the scratch space? Can the cover letter have more deets?

Updated the cover letter to explain what's going on in the PR. Sorry for the delay! This new class will allocate about 4MB of scratch space for compression/decompression on each thread at the start of Redpanda: 2MB for decompression and 2MB for compression. By doing this we can guarantee that Zstd won't internally allocate any more space during any compression/decompression operation. A rough sketch of the idea follows.
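For illustration, a minimal sketch of per-shard static workspace initialization could look like this (names and sizes here are hypothetical, not the actual init_workspace implementation), using zstd's static-allocation API:

```cpp
// Minimal sketch with hypothetical names. zstd's static-allocation API
// (ZSTD_initStaticCCtx/DCtx) carves everything it needs out of a
// caller-provided workspace, so no further allocation happens while
// compressing or decompressing.
#define ZSTD_STATIC_LINKING_ONLY
#include <zstd.h>

#include <seastar/core/future.hh>
#include <seastar/core/smp.hh>

#include <memory>
#include <stdexcept>

namespace ss = seastar;

// One workspace pair per shard, initialized once at startup.
static thread_local std::unique_ptr<char[]> c_workspace;
static thread_local std::unique_ptr<char[]> d_workspace;
static thread_local ZSTD_CCtx* c_ctx = nullptr;
static thread_local ZSTD_DCtx* d_ctx = nullptr;

void init_workspace(size_t compress_bytes, size_t decompress_bytes) {
    c_workspace = std::make_unique<char[]>(compress_bytes);
    d_workspace = std::make_unique<char[]>(decompress_bytes);
    c_ctx = ZSTD_initStaticCCtx(c_workspace.get(), compress_bytes);
    d_ctx = ZSTD_initStaticDCtx(d_workspace.get(), decompress_bytes);
    // A null context means the workspace was too small for zstd's tables.
    if (c_ctx == nullptr || d_ctx == nullptr) {
        throw std::runtime_error("zstd static workspace too small");
    }
}

// Run once at startup so every reactor thread gets its own workspace,
// e.g. roughly 2MiB per direction as described above.
ss::future<> init_all_shards() {
    return ss::smp::invoke_on_all([] { init_workspace(2 << 20, 2 << 20); });
}
```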

@ballard26 ballard26 linked an issue Jul 22, 2022 that may be closed by this pull request
@ballard26 ballard26 closed this Jul 22, 2022
@ballard26 ballard26 reopened this Jul 22, 2022
Comment on lines 89 to 101
SEASTAR_THREAD_TEST_CASE(async_stream_zstd_test) {
    compression::async_stream_zstd fn;
    auto test_sizes = get_test_sizes();
    for (size_t i : test_sizes) {
        iobuf buf = gen(i);

        auto cbuf = fn.compress(buf.share(0, i)).get();
        auto dbuf = fn.uncompress(std::move(cbuf)).get();

        BOOST_CHECK_EQUAL(dbuf, buf);
    }
}
Member

how about also expanding zstd_stream_bench? it would be nice too if it was set up to trigger a reactor stall without the async version (now that seastar has a 25 ms threshold it should be easier to trigger). seeing the difference in performance will be interesting.

Contributor Author

Will do

Comment on lines 284 to 362
ss::smp::invoke_on_all([] {
    compression::async_stream_zstd::init_workspace(
      config::shard_local_cfg().zstd_decompress_workspace_bytes());
}).get0();
Member

Is this a workspace that is now separate from the non-async version? Should one be replaced, or can they share? If they are separate, what's the future of them in terms of being combined?

Contributor Author

It is separate since they can't be shared without a mutex, though I suppose we could have a compress/decompress mutex that both classes share. The hope is that we can replace stream_zstd with async_stream_zstd in v/kafka, at which point we can remove stream_zstd. A rough sketch of the shared-mutex idea follows.
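For illustration, the shared-workspace idea could look roughly like the sketch below (hypothetical names; not code from this PR), serializing access to a single per-shard workspace with a Seastar semaphore:

```cpp
// Hypothetical sketch of sharing one workspace between stream_zstd and
// async_stream_zstd: all zstd calls on a shard go through a one-unit
// semaphore so the shared workspace is never used concurrently.
#include <seastar/core/future.hh>
#include <seastar/core/semaphore.hh>

#include <utility>

namespace ss = seastar;

// One unit per shard: at most one compression/decompression at a time
// may touch the shared workspace.
static thread_local ss::semaphore workspace_lock{1};

template<typename Func>
auto with_shared_workspace(Func func) {
    // with_semaphore waits for a unit, runs func, and releases the unit
    // when the returned future resolves.
    return ss::with_semaphore(workspace_lock, 1, std::move(func));
}
```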

Member

The hope is that we can replace stream_zstd with async_stream_zstd in v/kafka, at which point we can remove stream_zstd.

Yeah, I think this makes sense as the preferred path forward. Assuming we don't find any issues with performance, we can probably make that switch soon.

@ballard26 ballard26 force-pushed the async-compression branch 2 times, most recently from 78b6baf to 59dbe87 on July 27, 2022 19:55
@ballard26
Contributor Author

ballard26 commented Jul 27, 2022

Results of performance testing vs the existing stream_zstd class are as follows:

test                                      iterations      median         mad         min         max      allocs       tasks        inst
streaming_zstd_1mb.compress                     6385   130.290us     1.513us   126.861us   137.173us      16.000       0.000         0.0
streaming_zstd_1mb.uncompress                   4185    78.486us     2.100us    75.342us    80.586us     108.616       0.000         0.0
streaming_zstd_10mb.compress                     653     1.118ms    24.680us     1.041ms     1.142ms     942.830       0.000         0.0
streaming_zstd_10mb.uncompress                   429   844.631us    80.656us   760.340us   976.986us    2095.301       0.000         0.0
async_stream_zstd.1mb_compress                  7616   100.022us   262.374ns    99.547us   100.598us     148.278       2.082         0.0
async_stream_zstd.1mb_uncompress                4804    73.661us   108.821ns    73.542us    73.819us     312.587       6.156         0.0
async_stream_zstd.10mb_compress                  772   979.403us     1.092us   978.266us   985.029us    2176.762      56.239         0.0
async_stream_zstd.10mb_uncompress                490   728.405us   720.304ns   726.690us   731.632us    4089.282     109.756         0.0

Some of the conclusions that can be drawn are:

async_stream_zstd is a bit faster at compression than stream_zstd. This is probably due to its statically allocated workspace vs. the dynamically allocated one that stream_zstd uses.

In all tests async_stream_zstd has more allocations than stream_zstd. This is because async_stream_zstd allocates lots of small fragments to store the results of compressions and decompressions. If this is seen as an issue, it can be mitigated by having iobuf create larger fragments when appending new data.

Beyond this, stream_zstd causes several Seastar reactor stalls during the benchmark, while async_stream_zstd does not cause any.

@emaxerrno
Contributor

@ballard26 we should make this the default in our internal RPC

…end`

Adding additional scheduling points in `dispatch_send` to account for
the newly futurized `as_scattered` results in a race condition where
messages are sent out of order. As such, the easiest solution is to call
`as_scattered` outside of `dispatch_send`, where adding a scheduling point
won't result in this race condition, and then pass the result to `dispatch_send`.
@travisdowns
Member

LGTM, thanks for all the changes!

@ballard26
Contributor Author

ballard26 commented Dec 1, 2022

The rptest.tests.schema_registry_test.test_delete_subject_version test failure is #6903.

@dotnwat dotnwat merged commit 7a3a924 into redpanda-data:dev Dec 1, 2022
@BenPope
Member

BenPope commented Dec 19, 2022

Is this worth backporting?

@piyushredpanda
Contributor

It might be a lot of effort (it's a large-ish PR and might not be an easy backport), and I would rather we recommend users upgrade, honestly, @BenPope

@BenPope
Member

BenPope commented Dec 19, 2022

I understand. This isn't in any released version, so that advice will have to wait until at least v23.1.1.

@piyushredpanda
Contributor

Ah, I see what you mean: it isn't backported even to v22.3.x. Yeah, that we might want to do. @ballard26?

@ballard26
Contributor Author

/backport v22.3.x

Development

Successfully merging this pull request may close these issues.

  • OOM when allocating large zstd buffer
  • Reactor stall in rpc::netbuf::as_scattered with 40k partitions
8 participants