RFC: Hash-based routing #1222

b1tamara · 2025-06-26T05:59:17Z

Cloud Foundry uses round-robin and least-connection algorithms for load balancing between Gorouters and backends. However, these algorithms are unsuitable for specific use cases, which prompts this proposal to introduce hash-based routing on a per-route basis.

🚀 Link for easy viewing.

b1tamara · 2025-06-26T17:27:33Z

Hi @beyhan, thanks for picking this up. We are still doing a team-internal review first and keeping this PR in "Draft" state. We will ping again once it's ready for the big stage.

toc/rfc/rfc-draft-hash-based-routing.md

Co-authored-by: Maximilian Moehl <44866320+maxmoehl@users.noreply.github.com>

toc/rfc/rfc-draft-hash-based-routing.md

peanball · 2025-06-30T09:33:27Z

toc/rfc/rfc-draft-hash-based-routing.md

+The application instance is considered overloaded when the current request load of this application exceeds the balance
+factor. Overflow traffic should always be directed to the same next instance rather than to a random one.
+
+A possible presentation of deterministic handling can be a ring like:


This is similar to round-robin, but the "starting point" is always the hash's target instance, right?
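A minimal sketch of how the deterministic overflow ring could behave, assuming a consistent-hash ring with virtual nodes and FNV-1a hashing (neither is specified in the quoted RFC text; `ring`, `newRing` and `pick` are hypothetical names). The walk starts at the key's hash target and always proceeds in the same direction, so overflow for a given key lands on the same next non-overloaded instance rather than a random one.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// hash64 hashes a string with FNV-1a (an assumption; the RFC excerpt names no hash function).
func hash64(s string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(s))
	return h.Sum64()
}

// ring is a hypothetical consistent-hash ring. Each instance is placed at several
// positions ("virtual nodes") to spread keys more evenly.
type ring struct {
	points []uint64          // sorted positions on the ring
	owner  map[uint64]string // position -> instance ID
}

func newRing(ids []string, vnodes int) *ring {
	r := &ring{owner: make(map[uint64]string)}
	for _, id := range ids {
		for v := 0; v < vnodes; v++ {
			p := hash64(fmt.Sprintf("%s#%d", id, v))
			r.points = append(r.points, p)
			r.owner[p] = id
		}
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// pick starts at the first ring position at or after hash(key) and walks clockwise,
// returning the first instance that is not overloaded. Because the walk direction is
// fixed, overflow for a given key always lands on the same next instance.
func (r *ring) pick(key string, overloaded func(id string) bool) (string, bool) {
	n := len(r.points)
	if n == 0 {
		return "", false
	}
	h := hash64(key)
	start := sort.Search(n, func(i int) bool { return r.points[i] >= h })
	for i := 0; i < n; i++ {
		id := r.owner[r.points[(start+i)%n]]
		if !overloaded(id) {
			return id, true
		}
	}
	return "", false // every instance is overloaded
}

func main() {
	r := newRing([]string{"app-0", "app-1", "app-2"}, 50)
	none := func(string) bool { return false }
	target, _ := r.pick("alice", none)
	// Mark the hash target as overloaded and observe the deterministic overflow target.
	next, _ := r.pick("alice", func(id string) bool { return id == target })
	fmt.Println("target:", target, "overflow:", next)
}
```

So yes, in this model it resembles round-robin with a fixed, hash-determined starting point, except that the walk only moves on when the current instance is overloaded.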

toc/rfc/rfc-draft-hash-based-routing.md

peanball

Review of the first part, before "Cloud Controller". Further review coming.

peanball · 2025-06-30T10:52:53Z

toc/rfc/rfc-draft-hash-based-routing.md

+    - The `hash_header` property is mandatory when load balancing is set to hash
+    - The `hash_balance` property is optional when load balancing is set to hash. Leaving out `hash_balance` means the
+      load situation will not be considered
+    - To account for overload situations, `hash_balance` values should be greater than 110. During the implementation


Some points on the value of hash_balance:

We should have a unit (i.e. 100%, not just 100) if it's a percentage.

Do we want to keep it a percentage, or could it be a factor?

Should it be absolute or relative?

i.e. 100% vs. 1.00.

The % might be misleading/hard to understand, because the actual determination of "overload" depends on the number of available instances if I understood correctly.

Thinking of it as "100% of in-flight requests" is wrong, but "100% balance of the part of requests against N instances" is quite complicated to think about.

A factor for "allowed imbalance of requests" of e.g. 1.2 could be easier to understand. But that's my view and might not represent the majority.

We could also consider an "overcommit" or "imbalance" factor or percentage. "120%" hash_balance would then be hash_imbalance (or hash_rebalance?) of 20% or 0.2.

This is strongly inspired by this: https://docs.haproxy.org/2.8/configuration.html#4.2-hash-balance-factor.

I don't quite follow your explanation, but this is how it works: it is relative only to the average load across all instances. Essentially, the average load is 100%, and when you specify 120%, you allow instances to carry up to 120% of the average load, but not more.
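A minimal sketch of the check described above, assuming the average is taken as total in-flight requests divided by the number of instances and that `hash_balance` is an integer percentage (`isOverloaded` is a hypothetical helper, not Gorouter code). Per the later comment in this PR, a value of 0 is treated as "load not considered".

```go
package main

import "fmt"

// isOverloaded reports whether an instance carrying `load` in-flight requests exceeds
// hashBalance percent of the average load, where the average is `total` in-flight
// requests spread over `instances` instances (e.g. hashBalance = 120 means 120%).
func isOverloaded(load, total, instances, hashBalance int) bool {
	if hashBalance <= 0 || instances <= 0 {
		return false // balance not configured: load is not considered
	}
	avg := float64(total) / float64(instances)
	limit := avg * float64(hashBalance) / 100
	return float64(load) > limit
}

func main() {
	// 40 in-flight requests over 4 instances: average 10, so 120% allows up to 12.
	fmt.Println(isOverloaded(12, 40, 4, 120)) // false
	fmt.Println(isOverloaded(13, 40, 4, 120)) // true
}
```

This also shows why the threshold depends on the number of instances: the same 120% allows 12 requests per instance at an average of 10, but 24 at an average of 20.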

That's great. Maybe we can link the source of inspiration then?

The HAProxy definition is also all over the place. A "factor" is an operand in a multiplication. They also don't use the "%" sign to indicate that it's not a factor of 125 but 125%, so 1.25. I would have the same comments to the HAProxy description to be honest.

For consistency we can keep it the same, but then we should also link to the place we're consistent with (i.e. the HAProxy docs).

I share your thought and do not think there is a need to be consistent with HAProxy. But for me, both solutions are fine, most important is a good (visual) explanation in the docs and CLI-help!

I am fine with both solutions too. I would keep this thread open to hear other opinions.

I'd not link the HAProxy doc, it seems a bit arbitrary. I also don't have a strong preference for 125 vs. 1.25.

My preference is actually on the "overcommit" / allowed imbalance, so 25% "more", not 125% of the whole.
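The three spellings discussed in this thread (percentage of the average, multiplication factor, allowed overcommit) describe the same limit. A minimal sketch of the conversions, with purely illustrative function names:

```go
package main

import "fmt"

// percentToFactor: hash_balance 125 (percent of average load) -> factor 1.25.
func percentToFactor(p float64) float64 { return p / 100 }

// factorToOvercommit: factor 1.25 -> allowed imbalance 0.25 (25% above average).
func factorToOvercommit(f float64) float64 { return f - 1 }

// overcommitToPercent: allowed imbalance 0.25 -> hash_balance 125.
func overcommitToPercent(o float64) float64 { return (1 + o) * 100 }

func main() {
	avg := 8.0 // average in-flight requests per instance
	f := percentToFactor(125)
	fmt.Println("factor:", f, "overcommit:", factorToOvercommit(f), "max load:", avg*f)
}
```

Whichever form is chosen, the docs only need to state which one the user enters; the underlying limit is the same.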

HAProxy docs would only make sense to link if we stick with exactly their semantics. From what I read, we don't necessarily want that, so the link to HAProxy doesn't make sense. Fully agree.

toc/rfc/rfc-draft-hash-based-routing.md

Co-authored-by: Alexander Lais <Alexander.lais@me.com>

plowin

Thx!
Some minor comments.

General remark (to the TOC): from a reviewer's perspective, the hard-wrapped line format is not the best choice, as it would be simpler to propose a change to a full sentence.

toc/rfc/rfc-draft-hash-based-routing.md

plowin · 2025-07-02T06:37:26Z


toc/rfc/rfc-draft-hash-based-routing.md

Co-authored-by: Patrick Lowin <patrick.lowin@sap.com>

ameowlia · 2025-07-03T17:51:47Z

toc/rfc/rfc-draft-hash-based-routing.md

+  and the balance factor
+- It MUST implement the validation of the following requirements:
+    - The `hash_header` property is mandatory when load balancing is set to hash
+    - The `hash_balance` property is optional when load balancing is set to hash. Leaving out `hash_balance` means the


Given that "hash_balance" is optional, but also "values should be greater than 110", how would someone unset their "hash_balance" property?

I think the pattern we (somewhat) agreed on is that providing "property": null unsets the property. It does not seem to be stated in the RFC, so maybe it was only implemented that way, or my memory is failing me.

I may have accidentally removed the statement that hash_balance: 0 unsets the property, in which case overload situations are not considered. I will add it back here.
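A minimal sketch of the validation rules discussed here, assuming `hash_balance` is an integer percentage where 0 means "not set" (as stated in the reply above) and that the "greater than 110" guidance is enforced as a hard check. The `routeOptions` type and `validate` function are hypothetical and do not reflect Cloud Controller's actual code; the `"property": null` variant is not modeled.

```go
package main

import (
	"errors"
	"fmt"
)

// routeOptions is a hypothetical stand-in for the per-route options discussed in the RFC.
type routeOptions struct {
	LoadBalancing string // e.g. "round-robin", "least-connection" or "hash"
	HashHeader    string // name of the request header whose value is hashed
	HashBalance   int    // percent; 0 means not set, so load is not considered
}

func validate(o routeOptions) error {
	if o.LoadBalancing != "hash" {
		return nil // hash-specific rules only apply when hashing is selected
	}
	if o.HashHeader == "" {
		return errors.New("hash_header is required when load balancing is set to hash")
	}
	if o.HashBalance != 0 && o.HashBalance <= 110 {
		return fmt.Errorf("hash_balance must be 0 (unset) or greater than 110, got %d", o.HashBalance)
	}
	return nil
}

func main() {
	for _, o := range []routeOptions{
		{LoadBalancing: "hash", HashHeader: "X-User-Id"},
		{LoadBalancing: "hash", HashHeader: "X-User-Id", HashBalance: 100},
		{LoadBalancing: "hash"},
	} {
		fmt.Println(validate(o))
	}
}
```

Using 0 as the "unset" value keeps it outside the valid range, so it cannot be confused with a real threshold.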

ameowlia · 2025-07-03T17:54:03Z

toc/rfc/rfc-draft-hash-based-routing.md

+algorithm. Consequently, it will be configured exclusively as a per-route option for applications and will not be
+offered as a global setting.
+
+#### Minimal rehashing over all Gorouter VMs


During a deploy, app instances will be changing all over the place. For large deployments this could take hours. Though now that I write this, I suppose the idea is that this is for very rare cases. This would never be on for all apps, so it is (hopefully) not going to be a big performance hit.

I would be interested in seeing performance metrics around this once it is implemented, to see the cost during a deployment.

toc/rfc/rfc-draft-hash-based-routing.md

Co-authored-by: Amelia Downs <amelia.downs@broadcom.com>

RFC: Hash-based routing

a9fef36

cf-foundation-community-automation bot moved this to Inbox in CF Community Jun 26, 2025

cf-foundation-community-automation bot added this to CF Community Jun 26, 2025

beyhan requested review from a team, rkoster, beyhan, stephanme, ameowlia and ChrisMcGowan and removed request for a team June 26, 2025 08:44

beyhan added the rfc CFF community RFC label Jun 26, 2025

maxmoehl reviewed Jun 27, 2025


b1tamara and others added 3 commits June 27, 2025 12:39

Apply suggestions from code review

0ead3ae

Co-authored-by: Maximilian Moehl <44866320+maxmoehl@users.noreply.github.com>

Reformat RFC

1c0e32b

Apply review feedback

d996092

b1tamara force-pushed the rfc-hash-based-routing branch from 2479883 to d996092 June 30, 2025 08:20

peanball reviewed Jun 30, 2025


toc/rfc/rfc-draft-hash-based-routing.md Outdated

b1tamara and others added 3 commits June 30, 2025 13:22

Apply suggestions from code review

459b7e6

Co-authored-by: Alexander Lais <Alexander.lais@me.com>

Small enhancements

b23adde

Rephrased version

f7d649f

plowin reviewed Jul 2, 2025


b1tamara and others added 4 commits July 2, 2025 08:42

Apply suggestions from code review

7f7c926

Co-authored-by: Patrick Lowin <patrick.lowin@sap.com>

Add more about minimal rehashing

9924541

Update diagrams

816fe0b

Minor rephrasing

3d1ae88

b1tamara marked this pull request as ready for review July 3, 2025 15:01

ameowlia reviewed Jul 3, 2025


toc/rfc/rfc-draft-hash-based-routing.md

Update hash-based RFC

8966a9d

Co-authored-by: Amelia Downs <amelia.downs@broadcom.com>


b1tamara commented Jun 26, 2025 • edited by ameowlia

peanball Jun 30, 2025 • edited