
[Bug] A large backlog of Key_Shared subscription messages will result in fullgc and OOM #21045

Open
jdfrozen opened this issue Aug 22, 2023 · 7 comments
Labels
area/broker · category/reliability · help wanted · Stale

Comments

@jdfrozen

Search before asking

  • I searched in the issues and found nothing similar.

Version

2.7.x

Minimal reproduce step

1. Build up a large backlog of messages on a Key_Shared subscription
2. Attach multiple consumers to the subscription

What did you expect to see?

The broker keeps functioning normally.

What did you see instead?

1. Frequent GC on the broker
2. Broker full GC
3. Broker OOM

Broker GC monitoring (screenshot).

I added "-XX:+HeapDumpOnOutOfMemoryError" to the broker startup parameters. When the full GC occurred, I analyzed the resulting heap dump with MAT (screenshot).

Anything else?

Root cause: redeliveryMessages holds a very large number of message positions

PersistentStickyKeyDispatcherMultipleConsumers.java

@Override
protected synchronized Set<PositionImpl> getMessagesToReplayNow(int maxMessagesToRead) {
    if (isDispatcherStuckOnReplays) {
        // If we're stuck on replay, we want to move forward reading on the topic (until the overall max-unacked
        // messages kicks in), instead of keep replaying the same old messages, since the consumer that these
        // messages are routing to might be busy at the moment
        this.isDispatcherStuckOnReplays = false;
        return Collections.emptySet();
    } else {
        return super.getMessagesToReplayNow(maxMessagesToRead);
    }
}

Are you willing to submit a PR?

  • I'm willing to submit a PR!
jdfrozen added the type/bug label on Aug 22, 2023
@jdfrozen (Author)

So when this Key_Shared subscription has many consumers and some of them are slow, messages whose stickyKeyHash routes to a slow consumer are added to messagesToReplay instead of being dispatched. With a large backlog, that replay set keeps growing, which causes this problem (a simplified sketch of the mechanism is shown below).
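
For readers unfamiliar with the dispatcher internals, here is a minimal, self-contained sketch of that mechanism. It is not the actual broker code: the class, hash ring, and permit handling (ReplaySetSketch, Consumer, hasPermits) are illustrative stand-ins for PersistentStickyKeyDispatcherMultipleConsumers and its redeliveryMessages container.

import java.util.TreeSet;

// Illustrative sketch only: shows how positions routed to a consumer without
// permits pile up in an in-memory replay set when the backlog is large.
public class ReplaySetSketch {

    static final class Consumer {
        final String name;
        long permits;                               // messages it can still receive
        Consumer(String name, long permits) { this.name = name; this.permits = permits; }
        boolean hasPermits() { return permits > 0; }
    }

    public static void main(String[] args) {
        Consumer fast = new Consumer("fast", Long.MAX_VALUE);  // healthy consumer
        Consumer slow = new Consumer("slow", 0);               // slow consumer, no permits left
        Consumer[] hashRing = { fast, slow };                  // keyHash selects the consumer

        // Analogue of redeliveryMessages: positions parked for later redelivery.
        TreeSet<Long> messagesToReplay = new TreeSet<>();

        // Simulate reading through a large backlog of entries.
        for (long entryId = 0; entryId < 1_000_000; entryId++) {
            int keyHash = Long.hashCode(entryId);              // stand-in for the sticky key hash
            Consumer target = hashRing[Math.floorMod(keyHash, hashRing.length)];
            if (target.hasPermits()) {
                target.permits--;                              // dispatched immediately
            } else {
                messagesToReplay.add(entryId);                 // parked in memory: this set keeps growing
            }
        }

        // Roughly half of the backlog (every key that maps to the slow consumer)
        // now sits in broker memory waiting for replay.
        System.out.println("Positions waiting for replay: " + messagesToReplay.size());
    }
}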

@jdfrozen (Author)

I also added "-XX:+HeapDumpBeforeFullGC" to the broker startup parameters.

mattisonchao added the category/reliability and area/broker labels and removed the type/bug label on Aug 23, 2023
@mattisonchao (Member) commented Aug 23, 2023

KEY_SHARED is a fairly strict subscription mode. It is very sensitive to the consumption (acknowledgement) rate because it must preserve message order: when consumers are added to the subscription, the key hash ranges are recalculated, and the broker has to keep some message indexes in memory to avoid breaking delivery order (each key is delivered to only one consumer at a time).

Therefore, it's expected behaviour. You can check why some of your consumers can't catch up, or consider whether you can use another subscription mode such as SHARED.

But anyway, you are right: we should have a limit on this container's memory usage so that one topic cannot affect the whole broker. :)
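
If per-key ordering is not required, the SHARED mode suggested above is selected on the consumer. A minimal client-side sketch; the service URL, topic, and subscription name are placeholders:

import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Schema;
import org.apache.pulsar.client.api.SubscriptionType;

public class SharedConsumerExample {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")            // placeholder broker URL
                .build();

        // Key_Shared preserves per-key ordering but forces the broker to park
        // messages whose key routes to a busy consumer; Shared lets any consumer
        // with available permits take the next message.
        Consumer<String> consumer = client.newConsumer(Schema.STRING)
                .topic("persistent://public/default/my-topic")    // placeholder topic
                .subscriptionName("my-subscription")              // placeholder subscription
                .subscriptionType(SubscriptionType.Shared)        // instead of SubscriptionType.Key_Shared
                .subscribe();

        consumer.close();
        client.close();
    }
}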

@jdfrozen (Author)

I verified that setting max-unacked-messages-per-subscription to a value as small as 1000 avoids the full GC.
For verification I used the namespace-level policy ("pulsar-admin namespaces get-max-unacked-messages-per-subscription").
We would like to set the policy at the topic level. We are using version 2.7.4; is the topic-level policy stable enough?
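
For reference, the namespace-level policy tested above can also be applied through the Java admin client. A minimal sketch, assuming an admin endpoint at http://localhost:8080 and the public/default namespace; method availability may vary between releases, so treat this as an outline rather than a verified 2.7.4 snippet:

import org.apache.pulsar.client.admin.PulsarAdmin;

public class SetMaxUnackedPerSubscription {
    public static void main(String[] args) throws Exception {
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")   // placeholder admin URL
                .build();

        // Cap unacked messages per subscription so the redelivery state
        // cannot grow without bound; 1000 is the value from the test above.
        admin.namespaces().setMaxUnackedMessagesPerSubscription("public/default", 1000);

        System.out.println("Current value: "
                + admin.namespaces().getMaxUnackedMessagesPerSubscription("public/default"));

        admin.close();
    }
}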

@mattisonchao (Member) commented Aug 25, 2023

Hi @jdfrozen,
2.7.x is a pretty old version, so I am unsure whether it works properly. But you can give it a try. :)

@github-actions (bot)

The issue had no activity for 30 days; marking it with the Stale label.

github-actions bot added the Stale label on Sep 24, 2023
@lhotari (Member) commented Sep 4, 2024

One of the root causes behind this issue is described in #23200. It's addressed by #23231 and #23226.
I believe the OOM issue was already mitigated by #17804.
