-
Notifications
You must be signed in to change notification settings - Fork 213
RFC: Hash-based routing #1222
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
RFC: Hash-based routing #1222
Conversation
Hi @beyhan, thanks for picking this up. We are still doing a team-internal review first and keeping this PR in "Draft" state. We will ping again once it's ready for the big stage. |
Co-authored-by: Maximilian Moehl <44866320+maxmoehl@users.noreply.github.com>
2479883
to
d996092
Compare
The application instance is considered overloaded when the current request load of this application exceeds the balance | ||
factor. Overflow traffic should always be directed to the same next instance rather than to a random one. | ||
|
||
A possible presentation of deterministic handling can be a ring like: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is similar to round-robin
, but the "starting point" is always the hash's target instance, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review of the first part, before "Cloud Controller". Further review coming.
- The `hash_header` property is mandatory when load balancing is set to hash | ||
- The `hash_balance` property is optional when load balancing is set to hash. Leaving out `hash_balance` means the | ||
load situation will not be considered | ||
- To account for overload situations, `hash_balance` values should be greater than 110. During the implementation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some points on the value of hash_balance
:
- We should have a unit (i.e.
100%
, not just100
), if it's percentage. - Do we want to keep it a percentage, or could it be a factor?
- Should it be absolute or relative?
i.e. 100%
vs. 1.00
.
The % might be misleading/hard to understand, because the actual determination of "overload" depends on the number of available instances if I understood correctly.
Thinking of it as "100% of in-flight requests" is wrong, but "100% balance of the part of requests against N instances" is quite complicated to think about.
A factor for "allowed imbalance of requests" of e.g. 1.2 could be easier to understand. But that's my view and might not represent the majority.
We could also consider an "overcommit" or "imbalance" factor or percentage. "120%" hash_balance
would then be hash_imbalance
(or hash_rebalance
?) of 20%
or 0.2
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is strongly inspired by this: https://docs.haproxy.org/2.8/configuration.html#4.2-hash-balance-factor.
I don't really get your explanation but this is how it works: It is only relative to the average load across all instances. Essentially avg. load = 100% and when you specify 120% you allow instances to have up to 120% of the average load but not more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's great. Maybe we can link the source of inspiration then?
The HAProxy definition is also all over the place. A "factor" is an operand in a multiplication. They also don't use the "%" sign to indicate that it's not a factor of 125 but 125%, so 1.25. I would have the same comments to the HAProxy description to be honest.
For consistency we can keep it the same, but then also like to the place that we're consistent with (i.e. the HAProxy docs).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I share your thought and do not think there is a need to be consistent with HAProxy. But for me, both solutions are fine, most important is a good (visual) explanation in the docs and CLI-help!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fine with both solutions too. I would keep this thread open to hear other opinions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd not link the HAProxy doc, it seems a bit arbitrary. I also don't have a strong preference for 125
vs. 1.25
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My preference is actually on the "overcommit" / allowed imbalance, so 25% "more", not 125% of the whole.
HAProxy docs would only make sense to link if we stick with exactly their semantics. From what I read, we don't necessarily want that, so the link to HAProxy doesn't make sense. Fully agree.
Co-authored-by: Alexander Lais <Alexander.lais@me.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thx!
Some minor comments.
General remark (to the TOC): from a reviewers perspective, the newline-format is not the best choice as it would be simpler to make a proposal for a full sentence.
- The `hash_header` property is mandatory when load balancing is set to hash | ||
- The `hash_balance` property is optional when load balancing is set to hash. Leaving out `hash_balance` means the | ||
load situation will not be considered | ||
- To account for overload situations, `hash_balance` values should be greater than 110. During the implementation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I share your thought and do not think there is a need to be consistent with HAProxy. But for me, both solutions are fine, most important is a good (visual) explanation in the docs and CLI-help!
Co-authored-by: Patrick Lowin <patrick.lowin@sap.com>
and the balance factor | ||
- It MUST implement the validation of the following requirements: | ||
- The `hash_header` property is mandatory when load balancing is set to hash | ||
- The `hash_balance` property is optional when load balancing is set to hash. Leaving out `hash_balance` means the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that "hash_balance" is optional, but also "values should be greater than 110", how would someone unset their "hash_balance" property?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the pattern we (somewhat) agreed on is that providing "property": null
will unset the property. It seems to be not stated in the RFC so maybe it was only implemented that way or my memory is failing me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might have removed accidentally that hash_balance: 0 will unset the property, and in this case, the overflow situation will not be considered. I will add it here.
algorithm. Consequently, it will be configured exclusively as a per-route option for applications and will not be | ||
offered as a global setting. | ||
|
||
#### Minimal rehashing over all Gorouter VMs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
During a deploy app instances will be changing all over the place. For large deployments this could take hours. Though now that I write this, I suppose the idea is that this is for very rare cases. This would never be on for all apps. So it is (hopefully) not going to be a big performance hit.
I would be interested in seeing a performance metrics around this once it is written to see the cost during a deployment.
Co-authored-by: Amelia Downs <amelia.downs@broadcom.com>
Cloud Foundry uses round-robin and least-connection algorithms for load balancing between Gorouters and backends. Still, they are unsuitable for specific use cases, prompting a proposal to introduce hash-based routing on a per-route basis.
🚀 Link for easy viewing.