
nsqd: don't decrement RDY #404

Merged: 2 commits merged into nsqio:master from dont_decr_rdy_404 on Sep 15, 2014

Conversation

mreiferson
Member

This is something @jehiah and I have discussed for some time, after writing a few client libraries and talking to other client library authors.

In retrospect, we've realized that it isn't actually necessary for nsqd to decrement the RDY count for a given client, forcing client libraries to resend RDY periodically.

It's not the actual decrementing of the RDY count that provides flow control; rather, it's the fact that nsqd will never have more than the specified number of messages in flight to a client, coupled with the fact that a consumer needs to respond to each message.

The proposal is to remove this internal implementation detail, which should not impact nsqd, client libraries, or consumers in any meaningful way.

cc @jehiah
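
To make the mechanics concrete, here is a minimal sketch of the flow-control invariant described above (illustrative names only, not actual nsqd source): sends are gated on the in-flight count versus the client's last RDY value, so nothing needs to be decremented when a message goes out.

```go
// Sketch only: illustrates the invariant that provides flow control.
// Names (clientState, canSend, etc.) are illustrative, not nsqd internals.
package flowcontrol

import "sync/atomic"

type clientState struct {
	rdyCount      int64 // last RDY value the consumer sent
	inFlightCount int64 // sent but not yet FIN'd, REQ'd, or timed out
}

// canSend reports whether another message may be pushed to this client.
// Flow control comes from this comparison, not from decrementing rdyCount.
func (c *clientState) canSend() bool {
	return atomic.LoadInt64(&c.inFlightCount) < atomic.LoadInt64(&c.rdyCount)
}

// onSend records a message going in flight; rdyCount is left untouched.
func (c *clientState) onSend() { atomic.AddInt64(&c.inFlightCount, 1) }

// onResponse records a FIN/REQ (or timeout), freeing an in-flight slot
// so that canSend allows the next message through.
func (c *clientState) onResponse() { atomic.AddInt64(&c.inFlightCount, -1) }
```

Under the previous behavior, rdyCount would also be decremented on each send, which is why clients had to periodically re-send RDY; with this proposal only the in-flight counter moves.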

@mreiferson added the 1.0 label Jul 15, 2014
@ploxiln
Member

ploxiln commented Jul 16, 2014

ignorant bystander's opinion: 👍

@mreiferson
Member Author

now that we've stamped stable... RFR 😁

@jehiah
Member

jehiah commented Jul 25, 2014

In favor of this.

As I think about it, the situation where this breaks existing client-side expectations is in a backoff state: today, RDY 1 delivers only a single message to "test the waters", and there isn't a need to send RDY 0 as soon as you get that message in order to stop further messages.

That makes me feel like we should update clients to explicitly zero out their RDY states at the right points before this lands.

Another case where this behaves differently is a client that is stuck on a message. Say you have RDY 1, a 10s message timeout, and a client gets stuck on a message for 5 minutes. In that case, when nsqd times out the message it will start pumping more messages to the client (and timing them out every 10s), whereas previously the client would have stalled until it finished its work and renewed its RDY count (unless it preemptively renewed its RDY count upon receipt of the message, in which case the behavior is the same). We could possibly mitigate this with server-side backoff on client timeouts, but I'm not sure I like how things get handled in this situation.

@mreiferson
Member Author

As I think about it, the situation where this breaks existing client-side expectations is in a backoff state: today, RDY 1 delivers only a single message to "test the waters", and there isn't a need to send RDY 0 as soon as you get that message in order to stop further messages.

It wouldn't continue to send messages unless that first message was responded to, so I don't think zeroing out RDY is actually necessary here.

Another case where this behaves differently is a client that is stuck on a message. Say you have RDY 1, a 10s message timeout, and a client gets stuck on a message for 5 minutes. In that case, when nsqd times out the message it will start pumping more messages to the client (and timing them out every 10s), whereas previously the client would have stalled until it finished its work and renewed its RDY count (unless it preemptively renewed its RDY count upon receipt of the message, in which case the behavior is the same). We could possibly mitigate this with server-side backoff on client timeouts, but I'm not sure I like how things get handled in this situation.

This situation also occurred to me, and I agree that it isn't ideal, but practically all of our client libraries already exhibit this behavior because the sending of RDY is essentially decoupled from responding to messages (it's performed "up front"), save for some edge cases.
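
To illustrate that "up front" pattern, here is a hedged sketch (not any particular client library's API; readMessage and process are placeholder stubs): RDY is written once when the connection is ready, and from then on the consumer only responds to messages.

```go
// Sketch only: RDY is sent once, decoupled from responding to messages.
package rdysketch

import (
	"fmt"
	"net"
)

// consume announces maxInFlight a single time and then just FINs messages.
// With this change there is no need to periodically re-send RDY, because
// nsqd no longer decrements it server-side.
func consume(conn net.Conn, maxInFlight int) error {
	if _, err := fmt.Fprintf(conn, "RDY %d\n", maxInFlight); err != nil {
		return err
	}
	for {
		id, body, err := readMessage(conn)
		if err != nil {
			return err
		}
		process(body)
		// Responding is what frees an in-flight slot on the server side.
		if _, err := fmt.Fprintf(conn, "FIN %s\n", id); err != nil {
			return err
		}
	}
}

// Placeholder stubs so the sketch is self-contained; a real client parses
// nsqd's length-prefixed frames and handles heartbeats here.
func readMessage(conn net.Conn) (id string, body []byte, err error) { return "", nil, nil }
func process(body []byte)                                           {}
```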

@jehiah
Member

jehiah commented Jul 25, 2014

For the first item, my point is that the whole purpose of client backoff is that you expect to receive no messages, and that will no longer hold. You'll still get messages streamed in as fast as you can respond to them (with no parallelism in the RDY 1 case I described).

@mreiferson
Member Author

There's a small window for a single additional message, yes, but the client libraries immediately send RDY 0 (or go full throttle), based on the backoff state, after each "test the waters" message is responded to.

Which I think is perfectly OK.
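
For reference, a minimal sketch of the backoff handling being described, assuming a simplified client (this is not go-nsq's actual implementation): while backed off the consumer sits at RDY 0, probes with RDY 1 when its backoff timer fires, and only after that single "test the waters" message is responded to does it either return to RDY 0 or resume full throttle.

```go
// Sketch only: simplified client-side backoff state machine.
package backoffsketch

type backoff struct {
	active      bool // true while the consumer is backing off
	maxInFlight int
}

// probe is called when the backoff timer fires: ask for exactly one
// message to "test the waters".
func (b *backoff) probe(sendRDY func(int)) {
	if b.active {
		sendRDY(1)
	}
}

// onProbeResponded is called once the probe message has been responded to
// (FIN on success, REQ on failure). Because RDY stayed at 1, at most one
// extra message can slip in before the next RDY update reaches nsqd.
func (b *backoff) onProbeResponded(succeeded bool, sendRDY func(int)) {
	if succeeded {
		b.active = false
		sendRDY(b.maxInFlight) // exit backoff: resume full throttle
		return
	}
	sendRDY(0) // stay backed off; another probe will follow later
}
```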

@jehiah
Member

jehiah commented Jul 25, 2014

Hmm, OK, I'll re-familiarize myself with that handling.

@mreiferson
Member Author

@jehiah hit the button?

@jehiah
Member

jehiah commented Sep 15, 2014

🔥

jehiah added a commit that referenced this pull request Sep 15, 2014
@jehiah merged commit 2d44fc3 into nsqio:master Sep 15, 2014
@mreiferson deleted the dont_decr_rdy_404 branch September 15, 2014 15:12
alpaker pushed a commit to alpaker/pynsq that referenced this pull request May 10, 2017
redistribution logic accordingly.

This brings reader behavior into agreement with nsqd behavior (compare nsqio/nsq#404)
and removes an opportunity for max_in_flight violations (nsqio#177).