Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[stale] Throughput limit exemptions for users #11555

Closed
wants to merge 11 commits into from

Conversation

dlex
Copy link
Contributor

@dlex dlex commented Jun 20, 2023

Throughput control groups are used to define throughput limit exemptions at the moment. This PR adds a criteria to throughput control groups allowing them to match authenticated connections by the principal type user.

Re #11438

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.1.x
  • v22.3.x
  • v22.2.x

Release Notes

Improvements

  • SASL users can now be excluded from node wide throughput control. The cluster property kafka_throughput_control is used to define throughput control groups for which Kafka traffic will not be limited by the values specified by kafka_throughput_limit_node_*_bps. The criteria for these groups now can have a list of authenticated usernames along with client_id regex.

@dlex dlex requested review from dotnwat and BenPope June 20, 2023 14:37
@dlex dlex self-assigned this Jun 20, 2023
@dlex dlex force-pushed the 11438/tp-exemptions-for-users branch from 9696a2e to 51ef2ec Compare June 20, 2023 15:05
@dlex dlex marked this pull request as ready for review June 20, 2023 15:09
@dlex dlex marked this pull request as draft June 20, 2023 15:45
@dlex dlex force-pushed the 11438/tp-exemptions-for-users branch from 51ef2ec to b080c8f Compare June 21, 2023 20:45
@dlex dlex marked this pull request as ready for review June 21, 2023 23:00
Copy link
Member

@BenPope BenPope left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great.

Mostly style nitpicks (feel free to ignore), a couple of questions, and one line in the wrong commit.

tests/rptest/services/rpk_producer.py Outdated Show resolved Hide resolved
src/v/kafka/server/snc_quota_manager.cc Outdated Show resolved Hide resolved
src/v/kafka/server/snc_quota_manager.cc Show resolved Hide resolved
Comment on lines 75 to 108
// Equality
for (size_t k = 0; k != cfg.cgroups().size(); ++k) {
BOOST_TEST(cfg.cgroups()[k] == cfg.cgroups()[k], "k=" << k);
for (size_t l = 0; l != cfg.cgroups().size(); ++l) {
if (k != l) {
BOOST_TEST(
!(cfg.cgroups()[k] == cfg.cgroups()[l]),
"k=" << k << " l=" << l);
}
}
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: This code is duplicated in throughput_control_group_by_principal_test further down.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the same in both tests however the data (the configuration) is different, so effectively one of them checks how client_id_matcher_type::operator== is used and the other checks acl_principal::operator==

src/v/config/tests/throughput_control_group_test.cc Outdated Show resolved Hide resolved
src/v/security/fwd.h Outdated Show resolved Hide resolved
src/v/config/throughput_control_group.cc Outdated Show resolved Hide resolved
tests/rptest/tests/throughput_limits_snc_test.py Outdated Show resolved Hide resolved
src/v/kafka/server/snc_quota_manager.cc Outdated Show resolved Hide resolved
src/v/kafka/server/snc_quota_manager.h Outdated Show resolved Hide resolved
@dlex dlex force-pushed the 11438/tp-exemptions-for-users branch from b080c8f to 2bbfd99 Compare June 28, 2023 23:55
@dlex
Copy link
Contributor Author

dlex commented Jun 28, 2023

Force push: minor changes f/up the review

@dlex dlex requested a review from BenPope June 28, 2023 23:59
@dlex
Copy link
Contributor Author

dlex commented Jun 29, 2023

@dlex
Copy link
Contributor Author

dlex commented Jun 30, 2023

/ci-repeat 5
debug
skip-units
dt-repeat=5
tests/rptest/tests/cloud_storage_chunk_read_path_test::CloudStorageChunkReadTest.test_read_when_cache_smaller_than_segment_size

@dlex
Copy link
Contributor Author

dlex commented Jun 30, 2023

/ci-repeat 1
debug
skip-units
dt-repeat=1
tests/rptest/tests/cloud_storage_chunk_read_path_test::CloudStorageChunkReadTest.test_read_when_cache_smaller_than_segment_size

Add a function to RpkTool to create an allow topic ACL for a user.
A reusable part of acl_create_allow_cluster refactored.
@dlex dlex force-pushed the 11438/tp-exemptions-for-users branch from 2bbfd99 to 9481abd Compare August 13, 2023 20:01
Copy link
Contributor

@michael-redpanda michael-redpanda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, just a couple of comments

tests/rptest/services/rpk_producer.py Outdated Show resolved Hide resolved
src/v/kafka/server/snc_quota_manager.cc Outdated Show resolved Hide resolved
Support --username/--password authentication options

Add a function to check if the producer is running

Log service name with progress messages so that in case of several
producers running at the same time, it is possible to understand
who is doing what
typos, renames, comments
when throughput group is matched, log the client address
Becasue of the unique_ptr involved, the default equality operator
does not work correctly here.
`throughput_control_group` gets additional matching criteria by ACL
principals (users, etc.). In yaml/json, there can now be a list of
principals the group must match besides client_id if provided. Only the
principal type `user` is supported in this commit.
dlex and others added 5 commits October 25, 2023 23:20
Add user names to the matching pattern of throughput_control groups
acl_principal is now another key to snc_quotas_context. It is passed
from the connection_context to create the context by matching with
a tput ctrl group, and to verify whether the context is still valid.
Measure time RpkProducer has spent sending messages and make it
available via a property

Prefix the "Finish sending" message with the service_id so that
it's clear in the log what producer it is for in case there are
many running simultaneously
A new test to verify that when a user matches a tput ctrl group,
the connections authenticated by the user are not throttled while
the rest still are.
to complement the logging of node configurations. Note that
cluster_config.yaml records cluster config when nodes stop, whlie
this log dumps the initial bootstrap config.
@dlex dlex force-pushed the 11438/tp-exemptions-for-users branch from 9481abd to ea717ef Compare October 26, 2023 03:47
@dlex
Copy link
Contributor Author

dlex commented Oct 26, 2023

Force push:

  • use optional<reference_wrapper> instead of raw pointer with acl_principal (thanks to @michael-redpanda and @BenPope for the hints)
  • test_throughput_groups_exemptions improved: made more robust to the environment, use better time measurement
  • rpk_producer takes username and passwd as a single argument

@aanthony-rp aanthony-rp changed the title Throughput limit exemptions for users [stale] Throughput limit exemptions for users Apr 22, 2024
@aanthony-rp
Copy link
Contributor

7 months old. not merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants