[improve] Handle PositionInfo that's too large to serialize as a single entry #22799

dlg99 · 2024-05-29T23:08:02Z

Motivation

In some cases the cursor's PositionInfo is too large to serialize as a single entry, e.g. when there are too many deleted ranges. Serialization can also be too slow.

Cherry-picks of changes by @eolivelli, @nicoloboschi and me.

Modifications

Cursor PositionInfo serialization is reworked to produce less garbage and serialize faster; the serialized data can be compressed.
If the serialized data is too large, it is chunked and saved as a sequence of entries.
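
The chunking idea can be sketched roughly as follows. This is only an illustration under stated assumptions: `ChunkSketch`, `split`, `join`, and the 8-byte footer layout (chunk count plus total length) are hypothetical and are not taken from the PR's actual on-ledger format. It uses plain `byte[]` for brevity, whereas the review comments below ask for Netty ByteBuf in the real code.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: splits a serialized payload into entry-sized chunks plus a footer. */
final class ChunkSketch {

    /** Returns the chunks (each at most maxChunkSize bytes) followed by one footer entry. */
    static List<byte[]> split(byte[] payload, int maxChunkSize) {
        if (maxChunkSize <= 0) {
            throw new IllegalArgumentException("maxChunkSize must be positive");
        }
        List<byte[]> entries = new ArrayList<>();
        for (int off = 0; off < payload.length; off += maxChunkSize) {
            int len = Math.min(maxChunkSize, payload.length - off);
            entries.add(Arrays.copyOfRange(payload, off, off + len));
        }
        // Assumed footer layout: chunk count, then total payload length.
        int chunkCount = entries.size();
        entries.add(ByteBuffer.allocate(8).putInt(chunkCount).putInt(payload.length).array());
        return entries;
    }

    /** Reassembles the payload, rejecting sequences whose footer does not match the chunks. */
    static byte[] join(List<byte[]> entries) {
        if (entries.isEmpty() || entries.get(entries.size() - 1).length != 8) {
            throw new IllegalStateException("missing or malformed footer");
        }
        ByteBuffer footer = ByteBuffer.wrap(entries.get(entries.size() - 1));
        int chunkCount = footer.getInt();
        int totalLength = footer.getInt();
        if (chunkCount != entries.size() - 1) {
            throw new IllegalStateException("chunk count mismatch");
        }
        ByteBuffer out = ByteBuffer.allocate(totalLength);
        for (int i = 0; i < chunkCount; i++) {
            byte[] chunk = entries.get(i);
            if (chunk.length > out.remaining()) {
                throw new IllegalStateException("chunks exceed the length in the footer");
            }
            out.put(chunk);
        }
        if (out.hasRemaining()) {
            throw new IllegalStateException("chunks shorter than the length in the footer");
        }
        return out.array();
    }
}
```

A reader that finds no valid footer at the end of the sequence can tell that the write was incomplete, which is the kind of check a footer makes possible.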

Verifying this change

Make sure that the change passes the CI checks.

This change added tests.

Does this pull request potentially affect one of the following parts:

NO

If the box was checked, please highlight the changes

Documentation

doc
doc-required
doc-not-needed
doc-complete

Matching PR in forked repository

PR in forked repository: dlg99#17

nicoloboschi

LGTM

merlimat

Since this is touching on-disk formats it would be good to have a PIP discussion.

lhotari · 2024-08-16T10:00:03Z

I wonder if this is related to PIP-81 which doesn't seem to have an implementation: https://github.com/apache/pulsar/wiki/PIP-81%3A-Split-the-individual-acknowledgments-into-multiple-entries

There was a larger PR for PIP-81 that was closed: #10729
Some parts of it were split, such as #15425 and #15607.
@codelipenghui @315157973 any details about future PIP-81 plans to share?

lhotari · 2024-08-16T10:04:19Z

Btw. I'm currently investigating a Key_Shared subscription type issue where ordinary consumption of messages leads to a very large number of "ack holes". The WIP test app where this is reproduced is https://github.com/lhotari/pulsar-playground/blob/master/src/main/java/com/github/lhotari/pulsar/playground/TestScenarioIssueKeyShared.java .
The test class is not yet simplified to contain only the relevant parts. I started with a very complex test case and it seems that the "ack hole" problem shows up in all possible cases.

No messages get lost. It's just that some messages don't get delivered until all other messages have been processed.

lhotari · 2024-08-16T13:15:49Z

In the 1M message experiment, the number of ack holes goes down from about 150k ack holes to <500 with this experiment: lhotari@a3b0639

lhotari · 2024-08-27T13:53:23Z

It's possible that the root cause of this issue of large PositionInfo is #23200 and it is addressed with PRs #23231 and #23226. There's #23224 for observability. Large msgInReplay counts would confirm the root cause.

rdhabalia · 2024-09-20T23:48:45Z

This is a real problem, and it has been solved with a simple and fundamentally proven solution with perf numbers: #9292

But again, I am not sure why some folks blocked that PR and its progress without giving a reason, even after being asked multiple times.

315157973 · 2024-09-21T10:00:28Z

> I wonder if this is related to PIP-81 which doesn't seem to have an implementation: https://github.com/apache/pulsar/wiki/PIP-81%3A-Split-the-individual-acknowledgments-into-multiple-entries
>
> There was a larger PR for PIP-81 that was closed: #10729 Some parts of it were split, such as #15425 and #15607. @codelipenghui @315157973 any details about future PIP-81 plans to share?

Since a PR implemented the compression of PositionInfo, the size of PositionInfo can be greatly reduced, and the problem of Entry size exceeding the threshold will no longer occur, so this PIP was not further promoted.

lhotari

Great work Andrey!

If there's a way to refactor the logic to avoid byte[] and use Netty ByteBuf when possible, the solution would be more aligned with the "no garbage" style in Pulsar.

lhotari · 2024-09-23T05:55:51Z

managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java

-                if (log.isDebugEnabled()) {
-                    log.debug("[{}} readComplete rc={} entryId={}", ledger.getName(), rc1, lh1.getLastAddConfirmed());
+            LedgerEntry entry = seq.nextElement();
+            byte[] data = entry.getEntry();


Please replace this with the use of getEntryBuffer so that Netty ByteBuf is used instead of byte[].
Large arrays add significant GC overhead, which is why Netty ByteBuf is preferred.
It's better to refactor the logic to operate on Netty ByteBufs.

lhotari · 2024-09-23T06:04:35Z

managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java

+        lh.asyncReadEntries(startPos, endPos, new AsyncCallback.ReadCallback() {
+            @Override
+            public void readComplete(int rc, LedgerHandle lh, Enumeration<LedgerEntry> entries, Object ctx) {
+                ByteArrayOutputStream buffer = new ByteArrayOutputStream();


ByteArrayOutputStream adds a lot of GC overhead compared to the usage of Netty ByteBufs.
Please refactor this to use Netty ByteBufs instead of using ByteArrayOutputStream.

(Btw, there are also a lot of gotchas when using Netty ByteBufs; I learned quite a few while optimizing https://github.com/apache/pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/broker/stats/prometheus/PrometheusMetricsGenerator.java . The gotchas mainly apply when generating very large response buffers, as is the case with metrics. The Prometheus metrics output might be 500MB of text in certain worst cases with topic-level metrics enabled in brokers. In this case we wouldn't have to be concerned about those challenges with Netty ByteBufs, since I guess the size of the output isn't in that range.)
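
The copy overhead of ByteArrayOutputStream can be illustrated with JDK types alone. The sketch below is an assumption-laden stand-in, not the refactoring requested above: `java.nio.ByteBuffer` takes the place of Netty ByteBuf, and `AccumulateSketch`, `viaStream`, and `viaSingleBuffer` are hypothetical names. It contrasts a growing stream, which reallocates as it fills and copies again in `toByteArray()`, with a destination that is sized once and filled with one copy per chunk.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.List;

/** Hypothetical comparison of two ways to concatenate entry payloads. */
final class AccumulateSketch {

    /** Baseline: temporary array per chunk, internal regrowth, and a final copy. */
    static byte[] viaStream(List<ByteBuffer> chunks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (ByteBuffer chunk : chunks) {
            ByteBuffer view = chunk.duplicate(); // leave the caller's position untouched
            byte[] tmp = new byte[view.remaining()];
            view.get(tmp);
            out.write(tmp, 0, tmp.length);
        }
        return out.toByteArray();
    }

    /** Alternative: compute the total size first, then copy each chunk exactly once. */
    static ByteBuffer viaSingleBuffer(List<ByteBuffer> chunks) {
        int total = 0;
        for (ByteBuffer chunk : chunks) {
            total += chunk.remaining();
        }
        ByteBuffer out = ByteBuffer.allocate(total);
        for (ByteBuffer chunk : chunks) {
            out.put(chunk.duplicate());
        }
        out.flip(); // switch from writing to reading
        return out;
    }
}
```

With Netty, the destination would typically come from a pooled allocator and be released after use, which this JDK-only sketch does not model.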

lhotari · 2024-09-23T06:06:55Z

managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java

+        }
+    }
+
+    static byte[] decompressDataIfNeeded(byte[] data, LedgerHandle lh) {


Refactor this to use Netty ByteBufs instead of byte[]

lhotari · 2024-09-23T06:39:11Z

> > I wonder if this is related to PIP-81 which doesn't seem to have an implementation: https://github.com/apache/pulsar/wiki/PIP-81%3A-Split-the-individual-acknowledgments-into-multiple-entries
> > There was a larger PR for PIP-81 that was closed: #10729 Some parts of it were split, such as #15425 and #15607. @codelipenghui @315157973 any details about future PIP-81 plans to share?
>
> Since a PR implemented the compression of PositionInfo, the size of PositionInfo can be greatly reduced, and the problem of Entry size exceeding the threshold will no longer occur, so this PIP was not further promoted.

@315157973 Are you referring to PIP-146: ManagedCursorInfo compression, or to ManagedLedgerInfo compression (does that have a PIP?)? What if the size exceeds the threshold after compression?
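
The question can be made concrete with a small sketch: compress first, then check whether the result still exceeds the entry limit, in which case it would have to be split (the PR description says oversized data is chunked). Everything here is an assumption for illustration: `java.util.zip.Deflater` stands in for whatever codec the broker is configured with, and `CompressSketch`, `needsChunking`, and the `maxEntrySize` parameter are hypothetical names.

```java
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical helper: compress a payload and decide whether it still needs chunking. */
final class CompressSketch {

    static byte[] compress(byte[] data) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(data);
            deflater.finish();
            byte[] out = new byte[Math.max(64, data.length / 2)];
            int len = 0;
            while (!deflater.finished()) {
                if (len == out.length) {
                    out = Arrays.copyOf(out, out.length * 2); // grow when full
                }
                len += deflater.deflate(out, len, out.length - len);
            }
            return Arrays.copyOf(out, len);
        } finally {
            deflater.end();
        }
    }

    /** Assumes the uncompressed length is stored alongside the compressed bytes. */
    static byte[] decompress(byte[] compressed, int originalLength) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            byte[] out = new byte[originalLength];
            int off = 0;
            while (off < out.length && !inflater.finished()) {
                int n = inflater.inflate(out, off, out.length - off);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // truncated or unsupported input
                }
                off += n;
            }
            if (off != originalLength) {
                throw new IllegalStateException("unexpected decompressed length");
            }
            return out;
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt compressed data", e);
        } finally {
            inflater.end();
        }
    }

    /** True when even the compressed form does not fit in a single entry. */
    static boolean needsChunking(byte[] compressed, int maxEntrySize) {
        return compressed.length > maxEntrySize;
    }
}
```

Compression lowers the chance of hitting the limit but gives no upper bound on the result, so the size check after compression is still needed.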

codecov-commenter · 2024-09-24T01:31:51Z

Codecov Report

Attention: Patch coverage is 38.98757% with 687 lines in your changes missing coverage. Please review.

Project coverage is 74.28%. Comparing base (bbc6224) to head (1397faf).
Report is 599 commits behind head on master.

Files with missing lines	Patch %	Lines
...che/bookkeeper/mledger/impl/PositionInfoUtils.java	26.32%	597 Missing and 16 partials ⚠️
...che/bookkeeper/mledger/impl/ManagedCursorImpl.java	74.00%	56 Missing and 16 partials ⚠️
...e/bookkeeper/mledger/impl/LedgerMetadataUtils.java	50.00%	1 Missing and 1 partial ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##             master   #22799      +/-   ##
============================================
+ Coverage     73.57%   74.28%   +0.70%     
- Complexity    32624    34457    +1833     
============================================
  Files          1877     1935      +58     
  Lines        139502   146033    +6531     
  Branches      15299    15998     +699     
============================================
+ Hits         102638   108477    +5839     
- Misses        28908    29247     +339     
- Partials       7956     8309     +353

Flag	Coverage Δ
inttests	`27.56% <20.78%> (+2.98%)`	⬆️
systests	`24.59% <19.53%> (+0.26%)`	⬆️
unittests	`73.62% <38.98%> (+0.78%)`	⬆️

Flags with carried forward coverage won't be shown.

Files with missing lines	Coverage Δ
.../apache/bookkeeper/mledger/impl/MetaStoreImpl.java	`86.34% <100.00%> (+0.43%)`	⬆️
...e/bookkeeper/mledger/impl/LedgerMetadataUtils.java	`91.66% <50.00%> (-8.34%)`	⬇️
...che/bookkeeper/mledger/impl/ManagedCursorImpl.java	`79.05% <74.00%> (-0.25%)`	⬇️
...che/bookkeeper/mledger/impl/PositionInfoUtils.java	`26.32% <26.32%> (ø)`

... and 602 files with indirect coverage changes

dlg99 added the ready-to-test label May 29, 2024

dlg99 requested a review from eolivelli May 29, 2024 23:08

github-actions bot added the doc-not-needed Your PR changes do not impact docs label May 29, 2024

nicoloboschi approved these changes May 30, 2024

merlimat requested changes May 31, 2024

dlg99 force-pushed the cpick/cursor-large-state branch from 6fd14cc to 319ad5f Compare September 9, 2024 21:18

dlg99 force-pushed the cpick/cursor-large-state branch from 319ad5f to 1c405fe Compare September 16, 2024 22:48

dlg99 mentioned this pull request Sep 20, 2024

[improve][pip] PIP-381: Handle large PositionInfo state #23328

Open

4 tasks

lhotari reviewed Sep 23, 2024

eolivelli and others added 14 commits September 23, 2024 15:53

ManagedCursor: compress data written to BookKeeper

e014356

(cherry picked from commit 1ef9664)

serialize/compress without intermediate byte arrays (apache#268)

33e4c71

* serialize/compress without intermediate byte arrays * use lightproto for cursor serialization to the ledger * Reuse PositionInfo (cherry picked from commit 1887c44)

Print time

6d1b93a

(cherry picked from commit 98a3d25)

ManagedCursor: manually serialise PositionInfo (apache#270)

568d446

* ManagedCursor: manually serialise PositionInfo * Add tests and save last serialized side to prevent reallocations (cherry picked from commit 8a365d0)

Fix PositionInfoUtilsTest

0240250

(cherry picked from commit 44ba614)

PositionInfo Util serialization fix and test (apache#272)

ed8df4d

(cherry picked from commit f1323c6)

Remove auto reset of cursor in case of read error

c2f0908

(cherry picked from commit d4b94ab)

Revert removal of 'containsKey' in ManagedCursorImpl

27152ff

(cherry picked from commit 5f07f0c)

Prevent ZK connection loss in case of huge cursor status (apache#273)

564a668

(cherry picked from commit 6d2e494)

[managed-ledger] Compressed cursors: fix problem with little buffers (a…

0d23d5b

…pache#275) (cherry picked from commit 6a2a010)

[tests] Fix build after merge conflict

08af8fc

(cherry picked from commit 4c5387d)

Fix WriteCursorLedgerSize metric

e8d3930

(cherry picked from commit c3fe80e)

try ledger recovery from previous entries in case of corrupt/missing …

89adf38

…footer of the chunked data (apache#282) (cherry picked from commit 6e72ecb)

fix boken test after merge/resolve

43a5b31

dlg99 added 2 commits September 23, 2024 15:54

post-rebase fixes

8db72f4

removed usage of byte[] where possible

1397faf

dlg99 force-pushed the cpick/cursor-large-state branch from 1c405fe to 1397faf Compare September 24, 2024 00:13
