Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make zone load balancing logic depend on local_cluster host distribution. #174

Merged
merged 13 commits into from
Oct 31, 2016

Conversation

RomanDzhabarov
Copy link
Member

Documentation around runtime control/stats/general overview are going to be on a separate commit.

I was running simulation tests from load_balancer_simulation_test.cc, results look very good.


// If number of hosts in a local zone big enough then route all requests to the same zone.
if (local_zone_healthy_hosts.size() * number_of_zones >= host_set_.healthyHosts().size()) {
std::vector<uint64_t> local_percentage =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use a stack allocated array with the right size, rework the functions above to take the array and populate it if you want to do that. You can't allocate things like vectors in this code path.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or just do the pre-calculation work now. I would actually prefer you just do that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i made stack alloc for now, i want make sure things works good in prod

// At this point we should route cross zone as we cannot route to the local zone.
stats_.zone_routing_no_sampled_.inc();

std::vector<uint64_t> capacity_left;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stack allocated array

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

stats_.zone_routing_no_sampled_.inc();

std::vector<uint64_t> capacity_left;
// Local zone does not have additional capacity (we already routed what we could), but
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More comments. Don't understand what is happening here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

if (total_hosts != 0) {
size_t i = 0;
for (const auto& zone_hosts : hosts_per_zone) {
ret[i++] = 10000ULL * zone_hosts.size() / total_hosts;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add comment explaining the 10,000ULL.

Copy link
Member Author

@RomanDzhabarov RomanDzhabarov Oct 31, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, good point, will add more details into load_balancer_impl.h. Done.

// Now we need to figure out how much traffic we can route cross zone and to which exact zone
// we should route. Percentage of requests routed cross zone to a specific zone should be
// proportional to the residual capacity upstream zone has.
//
Copy link
Contributor

@ccaraman ccaraman Oct 31, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -192,10 +192,12 @@ class HostSet {
COUNTER(update_success) \
COUNTER(update_failure) \
COUNTER(zone_cluster_too_small) \
COUNTER(zone_over_percentage) \
COUNTER(zone_routing_all_directly) \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can you prefix all of the lb ones with "lb_" and then reorder this list.

Copy link
Member Author

@RomanDzhabarov RomanDzhabarov Oct 31, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, will do for consistency with panic threshold stats

stats_.zone_over_percentage_.inc();
return local_zone_healthy_hosts;
uint64_t local_percentage[number_of_zones];
calculateZonePercentage(local_host_set_->healthyHostsPerZone(), local_percentage);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ASSERT here also that local_host_set_->healthyHostsPerZone().size() == host_set_.healthyHostsPerZone().size()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

* In this case we'll route requests to hosts no matter if they are healthy or not.
*/
bool isGlobalPanic(const HostSet& host_set);
const std::vector<HostPtr>& tryChooseLocalZoneHosts();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: add comment or newline before this line

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put comments.

// Local zone (index 0) does not have residual capacity as we have routed all we could.
residual_capacity[0] = 0;
for (size_t i = 1; i < number_of_zones; ++i) {
// Only route to the zones that have additional capacity.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still having trouble quickly understanding this logic. More comments, and potentially an example.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more comments added

// Not zone routed requests will be distributed between all hosts and hence
// we need to route only fraction of req_percent_to_route to the local zone.
double actual_routing_ratio = (ratio_to_route - zone_host_ratio) / (1 - zone_host_ratio);
// Random sampling to select specific zone for cross zone traffic based on the additional
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more comments. What type of number does residual_capacity[number_of_zones - 1] have in it. What range?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more comments added

// sampled value is, we accumulate values in residual capacity. This is what it will look like:
// residual_capacity: 0 10000 15000
// Now to find zone to route (bucket) we could simply iterate over residual_capacity searching
// where
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can you fix line wrapping in this comment block. Keep to 100 col flow unless you need a new paragraph.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:( fix format broke this... fixing

Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lg after nit

@RomanDzhabarov RomanDzhabarov merged commit d4d2393 into master Oct 31, 2016
@RomanDzhabarov RomanDzhabarov deleted the use_local_cluster branch November 1, 2016 02:46
jplevyak referenced this pull request in jplevyak/envoy Sep 17, 2019
* Add Dockerfile for the C++ SDK.

Signed-off-by: John Plevyak <jplevyak@gmail.com>
jplevyak referenced this pull request in istio/envoy Sep 17, 2019
* Add Dockerfile for the C++ SDK.

Signed-off-by: John Plevyak <jplevyak@gmail.com>
PiotrSikora pushed a commit to PiotrSikora/envoy that referenced this pull request Oct 10, 2019
* Add Dockerfile for the C++ SDK.

Signed-off-by: John Plevyak <jplevyak@gmail.com>
PiotrSikora added a commit to PiotrSikora/envoy that referenced this pull request Mar 3, 2020
…#114) (envoyproxy#174)

Previously, the update callback was called only when the secret
was received for the first time or when its value changed.

This meant that if the same secret (e.g. trusted CA) was used in
multiple resources, then resources using it but configured after
the secret was already received, remained unconfigured until the
secret's value changed.

The missing callback should have resulted in transport factories
stuck in the "not ready" state, however, because of an incorrect
code, the available secret was processed like inlined validation
context, and only rules from the "secret" part of the validation
context were applied, leading to a complete bypass of rules from
the "default" part.

Signed-off-by: Piotr Sikora <piotrsikora@google.com>
Co-authored-by: Oliver Liu <yonggangl@google.com>

Co-authored-by: Oliver Liu <yonggangl@google.com>
wolfguoliang pushed a commit to wolfguoliang/envoy that referenced this pull request Jan 23, 2021
…verprovisioning_factor

zh-translation: /intro/arch_overview/upstream/load_balancing/overprovisioning.rst
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants