-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make sure the client can get response when encounter producer busy exception. #13245
Make sure the client can get response when encounter producer busy exception. #13245
Conversation
…ception. When the producer with the same producer ID and same connection, Of course this usually doesn't happen. When cherry-picking apache#12846, the test using the same producer ID https://github.com/apache/pulsar/pull/12846/files#diff-b73845cd7706e03b89122a21b3e60c36c95ce6a0c7dd4574f8eda3fc67dd02b1R867-R876 And after we apply the change https://github.com/apache/pulsar/pull/8685/files#diff-1e0e8195fb5ec5a6d79acbc7d859c025a9b711f94e6ab37c94439e99b3202e84R1161 Only one request can get a response when encounter exceptions. The fix is make sure the broker don't miss the response to the client, we should make sure one request can get one response, otherwise the client side might get blocked.
/pulsarbot run-failure-checks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that we should keep the current behavior. The only time that the producerFuture
is completed early is when the client sends a CloseProducer
command, and in that case, the client is explicitly telling the broker to stop tracking the prior request.
Note that one primary failure here is caused by client timeouts. The client tries to create the same producer multiple times (and closes it each time the producer creation times out), and then if the broker finally loads the topic, the futures will execute from the stack. The first one succeeds, and the rest fail because they all have the same name as the first producer.
I will be interested to see what other community members think here. Please feel free to dismiss my review if this PR is accepted.
Edit: a previous version of this comment assumed that this PR was solely for branch-2.7. After talking with @codelipenghui on slack, I realize my initial interpretation of the PR was wrong.
@merlimat @315157973 @hangc0276 Please help check this one. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@eolivelli - did you see my above comment #13245 (review)? I am opposed to this change because it makes the broker send a response to a client request that is known to be timed out. The broker knows this because the client completed the In my opinion, the core contract in this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@michaeljmarshall I missed your comment sorry
Thanks @codelipenghui Through tcpdump, it was found that Go SDK sent ProducerCommand to the broker every time it reconnected, but the broker did not reply any Response to the client side, which caused the Go SDK to fail to reconnect until the entire client service was restarted. The broker version: branch-2.7.2 And the issue refer to: apache/pulsar-client-go#697 |
@wolfstudy - note that a 2.7.2 broker has the behavior that @codelipenghui is proposing here. |
When the producer with the same producer ID and same connection,
Of course, this usually doesn't happen. When cherry-picking #12846,
the test using the same producer ID https://github.com/apache/pulsar/pull/12846/files#diff-b73845cd7706e03b89122a21b3e60c36c95ce6a0c7dd4574f8eda3fc67dd02b1R867-R876
And after we apply the change https://github.com/apache/pulsar/pull/8685/files#diff-1e0e8195fb5ec5a6d79acbc7d859c025a9b711f94e6ab37c94439e99b3202e84R1161
Only one request can get a response when encountering exceptions.
The fix is to make sure the broker doesn't miss the response to the client,
we should make sure one request can get one response, otherwise the client
side might get blocked.
Documentation
Check the box below and label this PR (if you have committer privilege).
Need to update docs?
doc-required
(If you need help on updating docs, create a doc issue)
no-need-doc
(Please explain why)
doc
(If this PR contains doc changes)