[improve] Handle PositionInfo that's too large to serialize as a single entry #22799
base: master
Conversation
LGTM
Since this is touching on-disk formats, it would be good to have a PIP discussion.
I wonder if this is related to PIP-81, which doesn't seem to have an implementation: https://github.com/apache/pulsar/wiki/PIP-81%3A-Split-the-individual-acknowledgments-into-multiple-entries . There was a larger PR for PIP-81 that was closed: #10729
Btw. I'm currently investigating a Key_Shared subscription type issue where ordinary consumption of messages leads to a very large number of "ack holes". The WIP test app where this is reproduced is https://github.com/lhotari/pulsar-playground/blob/master/src/main/java/com/github/lhotari/pulsar/playground/TestScenarioIssueKeyShared.java . No messages get lost; it's just that some messages don't get delivered until all other messages have been processed.
In the 1M message experiment, the number of ack holes goes down from about 150k to under 500 with this experiment: lhotari@a3b0639
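To illustrate why ack holes matter for this PR: when entries are acknowledged out of order, the cursor must remember each contiguous run of acked entry ids as a separate range, and many small runs inflate the persisted PositionInfo. A minimal sketch of that effect, using plain JDK collections rather than Pulsar's actual data structures (the class and method names here are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration, not Pulsar's real cursor code: collapse a set of
// individually acked entry ids into [start, end] ranges. The more interleaved
// the acks, the more ranges must be serialized into PositionInfo.
public class AckHoleSketch {
    static List<long[]> toRanges(TreeSet<Long> acked) {
        List<long[]> ranges = new ArrayList<>();
        Long start = null;
        Long prev = null;
        for (long id : acked) {
            if (prev == null || id != prev + 1) {
                if (start != null) {
                    ranges.add(new long[]{start, prev}); // close the previous run
                }
                start = id; // begin a new contiguous run
            }
            prev = id;
        }
        if (start != null) {
            ranges.add(new long[]{start, prev});
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Worst-case interleaving: every other entry in 0..9 is acked.
        TreeSet<Long> acked = new TreeSet<>();
        for (long i = 0; i < 10; i += 2) {
            acked.add(i);
        }
        // 5 acked entries produce 5 single-entry ranges to persist.
        System.out.println(AckHoleSketch.toRanges(acked).size()); // prints 5
    }
}
```

With fully contiguous acks the same data collapses to a single range, which is why reducing ack holes (or compressing/chunking the serialized ranges, as this PR does) shrinks the stored cursor state.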
This is a real problem, and it had already been solved with a simple and fundamentally proven solution, with perf numbers: #9292. But again, I am not sure why some folks blocked that PR without saying the reason, even after being asked multiple times, and blocked its progress.
Since a PR implemented compression of PositionInfo, the size of PositionInfo can be greatly reduced and the problem of the entry size exceeding the threshold no longer occurs, so this PIP was not pursued further.
Great work Andrey!
If there's a way to refactor the logic to avoid byte[] and use Netty ByteBuf when possible, the solution would be more aligned with the "no garbage" style in Pulsar.
if (log.isDebugEnabled()) {
    log.debug("[{}] readComplete rc={} entryId={}", ledger.getName(), rc1, lh1.getLastAddConfirmed());

LedgerEntry entry = seq.nextElement();
byte[] data = entry.getEntry();
Please replace this with the use of getEntryBuffer so that Netty ByteBuf is used instead of byte[]. Large arrays add significant GC overhead, and that's why Netty ByteBuf is preferred. It's better to refactor the logic to operate with Netty ByteBufs.
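The distinction the review is asking for can be illustrated with the JDK's ByteBuffer as a stand-in for Netty's ByteBuf (the actual change would call BookKeeper's LedgerEntry#getEntryBuffer, which returns a ByteBuf over the entry's memory). Wrapping references the existing memory; getEntry-style APIs hand back a fresh byte[] copy, and large short-lived arrays are what drives up GC pressure. A minimal sketch, with illustrative method names:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Copy vs. wrap: the wrap path shares the backing memory (no new array),
// while the copy path allocates a new array on every call.
public class WrapVsCopy {
    static ByteBuffer wrapNoCopy(byte[] backing) {
        return ByteBuffer.wrap(backing); // shares memory, zero extra allocation
    }

    static byte[] defensiveCopy(byte[] backing) {
        byte[] copy = new byte[backing.length]; // fresh allocation every call
        System.arraycopy(backing, 0, copy, 0, backing.length);
        return copy;
    }

    public static void main(String[] args) {
        byte[] data = "position-info".getBytes(StandardCharsets.UTF_8);
        ByteBuffer view = wrapNoCopy(data);
        byte[] copy = defensiveCopy(data);
        data[0] = 'P'; // mutate the backing array
        System.out.println((char) view.get(0)); // the view sees the change: P
        System.out.println((char) copy[0]);     // the copy does not: p
    }
}
```

Netty's ByteBuf adds reference counting and pooling on top of this sharing model, which is what makes the "no garbage" style possible for large entries.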
lh.asyncReadEntries(startPos, endPos, new AsyncCallback.ReadCallback() {
    @Override
    public void readComplete(int rc, LedgerHandle lh, Enumeration<LedgerEntry> entries, Object ctx) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
ByteArrayOutputStream adds a lot of GC overhead compared to Netty ByteBufs. Please refactor this to use Netty ByteBufs instead of ByteArrayOutputStream.
(Btw. there are also a lot of gotchas when using Netty ByteBufs; I happened to learn quite a few while optimizing https://github.com/apache/pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/broker/stats/prometheus/PrometheusMetricsGenerator.java . The gotchas mainly apply when generating very large response buffers, as is the case with metrics: the Prometheus metrics output might be 500MB of text in certain worst cases with topic-level metrics enabled in brokers. Here we wouldn't have to be concerned about those challenges with Netty ByteBufs, since I guess the size of the output isn't in that range.)
    }
}

static byte[] decompressDataIfNeeded(byte[] data, LedgerHandle lh) {
Refactor this to use Netty ByteBufs instead of byte[].
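For readers following along, the decompress-if-needed idea can be sketched with the JDK's zlib bindings. This is a hedged illustration, not the PR's implementation: the real code works with Netty ByteBufs and reads the compression codec from ledger metadata, whereas here a plain boolean stands in for that check and the method names are made up. It also uses ByteArrayOutputStream for brevity, which is exactly what the review above asks to avoid on the hot path.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Illustrative round trip: compress serialized cursor state, then restore it
// only when a "compressed" marker says so; uncompressed data passes through.
public class DecompressSketch {
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    static byte[] decompressIfNeeded(byte[] data, boolean compressed) throws DataFormatException {
        if (!compressed) {
            return data; // stored uncompressed, nothing to do
        }
        Inflater inflater = new Inflater();
        inflater.setInput(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!inflater.finished()) {
            out.write(buf, 0, inflater.inflate(buf));
        }
        inflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] original = "acked ranges: [0,5],[7,12],[20,20]".getBytes();
        byte[] restored = decompressIfNeeded(compress(original), true);
        System.out.println(Arrays.equals(original, restored)); // prints true
    }
}
```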
@315157973 Are you referring to PIP-146: ManagedCursorInfo compression?
(cherry picked from commit 1ef9664)
* serialize/compress without intermediate byte arrays
* use lightproto for cursor serialization to the ledger
* Reuse PositionInfo

(cherry picked from commit 1887c44)
(cherry picked from commit 98a3d25)
* ManagedCursor: manually serialise PositionInfo
* Add tests and save last serialized size to prevent reallocations

(cherry picked from commit 8a365d0)
(cherry picked from commit 44ba614)
(cherry picked from commit f1323c6)
(cherry picked from commit d4b94ab)
(cherry picked from commit 5f07f0c)
(cherry picked from commit 6d2e494)
…pache#275) (cherry picked from commit 6a2a010)
(cherry picked from commit 4c5387d)
(cherry picked from commit c3fe80e)
…footer of the chunked data (apache#282) (cherry picked from commit 6e72ecb)
Codecov Report
Attention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #22799 +/- ##
============================================
+ Coverage 73.57% 74.28% +0.70%
- Complexity 32624 34457 +1833
============================================
Files 1877 1935 +58
Lines 139502 146033 +6531
Branches 15299 15998 +699
============================================
+ Hits 102638 108477 +5839
- Misses 28908 29247 +339
- Partials 7956 8309 +353
Flags with carried forward coverage won't be shown.
Motivation
In some cases, cursor position info can be too large to serialize as a single entry, e.g. when there are too many deleted ranges. Serialization can also be too slow.
Cherry-picks of changes by @eolivelli, @nicoloboschi, and me.
Modifications
Cursor PositionInfo serialization is reworked to produce less garbage and serialize faster; the serialized data can be compressed.
In case the serialized data is too large, it is chunked and saved as a sequence of entries.
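The chunking described above can be sketched in a few lines. This is an illustration of the idea only; the class name, method names, and the size threshold here are made up, and the PR's actual code additionally writes a footer describing the chunked data so recovery knows how many entries to read back:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Split serialized PositionInfo that exceeds the max entry size into
// fixed-size chunks (one ledger entry each), and reassemble on recovery.
public class ChunkingSketch {
    static List<byte[]> split(byte[] serialized, int maxEntrySize) {
        List<byte[]> chunks = new ArrayList<>();
        for (int off = 0; off < serialized.length; off += maxEntrySize) {
            int end = Math.min(off + maxEntrySize, serialized.length);
            chunks.add(Arrays.copyOfRange(serialized, off, end));
        }
        return chunks;
    }

    static byte[] reassemble(List<byte[]> chunks) {
        int total = chunks.stream().mapToInt(c -> c.length).sum();
        byte[] out = new byte[total];
        int off = 0;
        for (byte[] c : chunks) {
            System.arraycopy(c, 0, out, off, c.length);
            off += c.length;
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] data = new byte[2500];
        Arrays.fill(data, (byte) 7);
        List<byte[]> chunks = split(data, 1000); // 1000 + 1000 + 500 bytes
        System.out.println(chunks.size());                          // prints 3
        System.out.println(Arrays.equals(data, reassemble(chunks))); // prints true
    }
}
```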
Verifying this change
This change added tests.
Does this pull request potentially affect one of the following parts:
NO
Documentation
doc
doc-required
doc-not-needed
doc-complete
Matching PR in forked repository
PR in forked repository: dlg99#17