[fix][broker] Fail fast if the extensible load manager failed to start #23297

BewareMyPower
Contributor

Motivation

#23230 tried to fix the problem that "the pulsar broker cannot catch the exception", caused by #22977. However, that fix is incorrect: even before #22977, the broker did not fail when the extensible load manager failed to start.

It is harmful to keep a broker running with a failed load manager. Because BrokerRegistry#unregister is called on failure, the affected broker unregisters itself from the metadata store and can no longer be selected as the owner broker. In addition, all lookup requests sent to this broker fail because the load manager has not started.

Modifications

  • Revert [improve][broker] Add retry for start service unit state channel (ExtensibleLoadManagerImpl only) #23230
  • Implement failStarting correctly:
    • If the broker has already registered, unregister it from the metadata store via BrokerRegistry#close and swallow any exception from that call.
    • Complete initWaiter with false to tell background threads to exit immediately. Otherwise, methods like playFollower would swallow the exception from initWaiter and continue the loop.
    • Propagate the exception by wrapping the checked exception PulsarServerException in an unchecked CompletionException, then unwrap it in the catch block of PulsarService#start.
  • Add LoadManagerFailFastTest to verify the fail-fast behavior and that the broker unregisters itself from the metadata store.
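
The failStarting steps above can be illustrated with a minimal, self-contained sketch. This is not the code from this PR: FailFastSketch, ServerException, LoadManager, startBroker, and shouldRunBackgroundLoop are hypothetical stand-ins (ServerException plays the role of PulsarServerException, startBroker the role of PulsarService#start), and modelling initWaiter as a CompletableFuture<Boolean> is an assumption based on the description.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

class FailFastSketch {

    /** Hypothetical stand-in for the checked PulsarServerException. */
    static class ServerException extends Exception {
        ServerException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Hypothetical stand-in for the extensible load manager. */
    static class LoadManager {
        // Assumption: completed with true on success, false on failed startup.
        final CompletableFuture<Boolean> initWaiter = new CompletableFuture<>();
        boolean registered = false;
        private final boolean failOnStart;

        LoadManager(boolean failOnStart) {
            this.failOnStart = failOnStart;
        }

        // start() declares no checked exception, so a failure must be
        // propagated as an unchecked exception.
        void start() {
            try {
                registered = true; // pretend the broker registered itself
                if (failOnStart) {
                    throw new IllegalStateException("channel failed to start");
                }
                initWaiter.complete(true);
            } catch (Exception e) {
                failStarting(e);
            }
        }

        private void failStarting(Exception cause) {
            if (registered) {
                registered = false; // stand-in for unregistering from the metadata store
            }
            // Tell background tasks to exit instead of looping.
            initWaiter.complete(false);
            // Wrap the checked exception so it can cross start()'s signature.
            throw new CompletionException(
                    new ServerException("Failed to start the load manager", cause));
        }

        // A background task would check this before entering its loop.
        boolean shouldRunBackgroundLoop() {
            return initWaiter.join();
        }
    }

    /** Hypothetical stand-in for the broker's start method. */
    static void startBroker(LoadManager loadManager) throws ServerException {
        try {
            loadManager.start();
        } catch (CompletionException e) {
            // Unwrap so callers see the original checked exception.
            if (e.getCause() instanceof ServerException) {
                throw (ServerException) e.getCause();
            }
            throw new ServerException("Unexpected startup failure", e);
        }
    }

    public static void main(String[] args) {
        LoadManager loadManager = new LoadManager(true);
        try {
            startBroker(loadManager);
        } catch (ServerException e) {
            System.out.println("Broker failed fast: " + e.getMessage());
        }
        System.out.println("Still registered: " + loadManager.registered);
    }
}
```

In this sketch the caller of startBroker receives the checked exception and can stop the process, while any task waiting on initWaiter sees false and skips its loop.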

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository:

@BewareMyPower BewareMyPower added type/bug The PR fixed a bug or issue reported a bug release/3.0.7 release/3.3.2 labels Sep 12, 2024
@BewareMyPower BewareMyPower added this to the 4.0.0 milestone Sep 12, 2024
@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Sep 12, 2024
Member

@lhotari lhotari left a comment

LGTM

@codecov-commenter

codecov-commenter commented Sep 13, 2024

Codecov Report

Attention: Patch coverage is 78.12500% with 7 lines in your changes missing coverage. Please review.

Project coverage is 74.54%. Comparing base (bbc6224) to head (db8cf29).
Report is 578 commits behind head on master.

Files with missing lines                                 Patch %   Lines
...dbalance/extensions/ExtensibleLoadManagerImpl.java    72.00%    5 Missing and 2 partials ⚠️
Additional details and impacted files

@@             Coverage Diff              @@
##             master   #23297      +/-   ##
============================================
+ Coverage     73.57%   74.54%   +0.97%     
- Complexity    32624    33788    +1164     
============================================
  Files          1877     1927      +50     
  Lines        139502   145030    +5528     
  Branches      15299    15858     +559     
============================================
+ Hits         102638   108115    +5477     
+ Misses        28908    28648     -260     
- Partials       7956     8267     +311     
Flag        Coverage Δ
inttests    27.86% <34.37%> (+3.27%) ⬆️
systests    24.66% <0.00%> (+0.34%) ⬆️
unittests   73.89% <78.12%> (+1.04%) ⬆️

Flags with carried forward coverage won't be shown.

Files with missing lines                                 Coverage Δ
...rg/apache/pulsar/broker/PulsarServerException.java    75.00% <100.00%> (+15.00%) ⬆️
...n/java/org/apache/pulsar/broker/PulsarService.java    83.63% <100.00%> (+1.26%) ⬆️
...dbalance/extensions/ExtensibleLoadManagerImpl.java    82.26% <72.00%> (+2.17%) ⬆️

... and 558 files with indirect coverage changes

@BewareMyPower BewareMyPower merged commit fc60ec0 into apache:master Sep 13, 2024
52 checks passed
@BewareMyPower BewareMyPower deleted the bewaremypower/extensible-lm-fail-fast branch September 13, 2024 03:14
lhotari pushed a commit to lhotari/pulsar that referenced this pull request Sep 13, 2024
@lhotari
Member

lhotari commented Sep 13, 2024

There were issues backporting this to branch-3.0, so I created a separate PR: #23302

heesung-sn pushed a commit that referenced this pull request Sep 14, 2024
lhotari added a commit that referenced this pull request Sep 14, 2024
…iled to start (#23297) (#23302)

Co-authored-by: Yunze Xu <xyzinfernity@163.com>
nikhil-ctds pushed a commit to datastax/pulsar that referenced this pull request Sep 19, 2024
…iled to start (apache#23297) (apache#23302)

Co-authored-by: Yunze Xu <xyzinfernity@163.com>
(cherry picked from commit 6d8b15d)
srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request Sep 19, 2024
…iled to start (apache#23297) (apache#23302)

Co-authored-by: Yunze Xu <xyzinfernity@163.com>
(cherry picked from commit 6d8b15d)
BewareMyPower added a commit that referenced this pull request Sep 20, 2024
6 participants