Limit memory used while fetching many partitions #10905

Merged
merged 12 commits into from
Jun 7, 2023

Commits on May 31, 2023

  1. config: support nonintegral bound properties

    `numeric_bounds<T>` implicitly required an integral argument because it
    applies the `%` operation to it. This change renames `numeric_bounds`
    to `numeric_integral_bounds` to make that explicit, and introduces
    a new `numeric_bounds` that does not support alignment or odd/even checks
    and therefore works with floating-point types too.
    
    The `bounded_property` can now accept an arbitrary bounds struct that
    conforms to the `detail::bounds<>` concept. For compatibility, it
    defaults to `numeric_integral_bounds`, so no code change is necessary.
    dlex committed May 31, 2023
    1db2edc
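
The split described in this commit can be illustrated with a minimal, C++17-compatible sketch. Everything here is an assumption made for illustration: the member names (`min`, `max`, `align`, `ok`), the `bounded_value` wrapper, and the `static_assert` that stands in for the `detail::bounds<>` concept are hypothetical and are not taken from the Redpanda sources.

```cpp
#include <cassert>
#include <optional>
#include <type_traits>
#include <utility>

// Hypothetical integral-only bounds: the alignment check needs operator%.
template<typename T>
struct numeric_integral_bounds {
    std::optional<T> min;
    std::optional<T> max;
    std::optional<T> align;

    constexpr bool ok(T v) const {
        if (min && v < *min) return false;
        if (max && v > *max) return false;
        if (align && v % *align != 0) return false;
        return true;
    }
};

// Hypothetical range-only bounds: no alignment or odd/even checks,
// so it also works for floating-point types.
template<typename T>
struct numeric_bounds {
    std::optional<T> min;
    std::optional<T> max;

    constexpr bool ok(T v) const {
        if (min && v < *min) return false;
        if (max && v > *max) return false;
        return true;
    }
};

// Wrapper that accepts any bounds type exposing `bool ok(T) const`.
// It defaults to the integral flavour so existing users need no change.
template<typename T, typename Bounds = numeric_integral_bounds<T>>
struct bounded_value {
    static_assert(
      std::is_same_v<
        decltype(std::declval<const Bounds&>().ok(std::declval<T>())),
        bool>,
      "Bounds must provide bool ok(T) const");

    T value;
    Bounds limits;

    constexpr bool valid() const { return limits.ok(value); }
};

inline void bounds_usage_example() {
    // Integral property with an alignment of 4.
    assert((bounded_value<int>{8, {0, 100, 4}}.valid()));
    assert((!bounded_value<int>{6, {0, 100, 4}}.valid()));
    // Floating-point property, e.g. a share in [0.0, 1.0].
    assert((bounded_value<double, numeric_bounds<double>>{0.5, {0.0, 1.0}}.valid()));
}
```

The default template argument is what keeps existing integral properties source-compatible; only floating-point properties have to name the range-only bounds type explicitly.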
  2. ee5e069

Commits on Jun 1, 2023

  1. k/fetch: respect the max_bytes from the fetch request

    While limiting the number of partitions in the fetch response by
    `kafka_max_bytes_per_fetch`, also consider the fetch plan's `bytes_left`,
    which is based on the fetch request's max_bytes and on the
    `fetch_max_bytes` property.
    dlex committed Jun 1, 2023
    eb8a915
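
One way to read the byte accounting above is as a pair of minimums. The sketch below assumes that `bytes_left` starts as the smaller of the request's max_bytes and the `fetch_max_bytes` property, and that the per-response budget is the smaller of `kafka_max_bytes_per_fetch` and `bytes_left`. The function names are hypothetical and the actual implementation may combine the limits differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Assumed: the fetch plan starts with the tighter of the two request-level limits.
constexpr std::size_t initial_bytes_left(
  std::size_t request_max_bytes, std::size_t fetch_max_bytes) {
    return std::min(request_max_bytes, fetch_max_bytes);
}

// Assumed: the response is capped by both the per-fetch property
// and whatever the fetch plan still has left.
constexpr std::size_t response_budget(
  std::size_t kafka_max_bytes_per_fetch, std::size_t bytes_left) {
    return std::min(kafka_max_bytes_per_fetch, bytes_left);
}

inline void budget_usage_example() {
    constexpr std::size_t MiB = 1024 * 1024;
    // The request asks for 8 MiB, the property allows 4 MiB.
    const std::size_t left = initial_bytes_left(8 * MiB, 4 * MiB);
    assert(left == 4 * MiB);
    // A 64 MiB per-fetch cap no longer overrides the smaller request-derived limit.
    assert(response_budget(64 * MiB, left) == 4 * MiB);
}
```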
  2. k/server: kafka::server made a peering_sharded_service

    Functions down the fetch code path will need access to the local
    kafka::server instance members like memory semaphores.
    dlex committed Jun 1, 2023
    d891efe
  3. tests/fixture: initialize server configuration

    fix uninitialized max_service_memory_per_core, also disable metrics
    dlex committed Jun 1, 2023
    58a9de0
  4. k/server: kafka memory fetch semaphore

    The Kafka server now stores a per-shard memory semaphore that limits
    memory usage by the fetch request handler. The semaphore count is derived
    from the "kafka_memory_share_for_fetch" property and the kafka
    rpc service memory size.
    
    Metric `vectorized_kafka_rpc_fetch_avail_mem_bytes` added to monitor
    the semaphore level.
    
    There is a sharded `server` accessor in `request_context` to reach
    the local shard instance of the new semaphore, as well as the local
    instance of `net::memory` semaphore.
    dlex committed Jun 1, 2023
    7b38601
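
As a rough illustration of how such a semaphore could be sized, the sketch below multiplies the per-shard kafka rpc service memory by the configured share and truncates the result. The commit message only says the count is "based on" these two inputs, so the exact formula, the rounding, and the function name `fetch_semaphore_units` are assumptions.

```cpp
#include <cassert>
#include <cstddef>

// Assumed formula: units = rpc service memory * share, truncated to whole bytes.
// `share` stands for the value of kafka_memory_share_for_fetch.
constexpr std::size_t fetch_semaphore_units(
  std::size_t rpc_service_memory_bytes, double share) {
    return static_cast<std::size_t>(
      static_cast<double>(rpc_service_memory_bytes) * share);
}

inline void semaphore_sizing_example() {
    constexpr std::size_t MiB = 1024 * 1024;
    // With 512 MiB of rpc service memory and a share of 0.5,
    // fetch handlers may hold at most 256 MiB on this shard.
    assert(fetch_semaphore_units(512 * MiB, 0.5) == 256 * MiB);
}
```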
  5. k/fetch: use replica_selector from the fetching shard

    Access `replica_selector` via the newly exposed `sharded<server>`
    to reach the local shard instance of `kafka::server` and its
    replica_selector. This prevents cross shard access to `metadata_cache`
    and future objects when replica selectors evolve.
    dlex committed Jun 1, 2023
    d3d9276
  6. c1d77cd

Commits on Jun 2, 2023

  1. k/fetch: limit fetch ntp parallelism

    While fetching from ntps concurrently, consult the memory semaphores to
    check that enough memory is available to perform each fetch. Both the
    general kafka memory semaphore and the new kafka fetch memory semaphore
    are used. For the former, the amount already consumed from it by the
    request memory estimator is taken into account.
    
    Since the batch size is not known ahead of time, it is estimated at 1 MiB.
    The first partition in the list is fetched regardless of the semaphore
    values, to satisfy the requirement that at least a single partition from
    the fetch request must advance.
    
    The amount of units held is adjusted to the actual size used as soon as
    it is known.
    
    The acquired units of the memory semaphores are held with `read_result`
    until it is destroyed at the end of the fetch request processing. When
    `read_result` is destroyed in the connection shard, the semaphore units
    are returned in the shard where they have been acquired.
    
    If the request's max_size in bytes exceeds what either semaphore holds,
    max_size is reduced to the memory actually available, taking the
    minimum batch size into account.
    dlex committed Jun 2, 2023
    2474ef6
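
The decision described in this commit can be sketched as a pure function over the current semaphore levels. This is a simplified model under stated assumptions, not the code from the PR: the types `memory_state` and `reservation`, the function `plan_partition_read`, and the way the request estimator's units are added back to the general semaphore's availability are all hypothetical readings of the commit message.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

constexpr std::size_t MiB = 1024 * 1024;
// The batch size is not known in advance, so it is estimated at 1 MiB.
constexpr std::size_t batch_size_estimate = 1 * MiB;

// Hypothetical snapshot of the two semaphores on the fetching shard.
struct memory_state {
    std::size_t kafka_avail;      // units available in the general kafka memory semaphore
    std::size_t fetch_avail;      // units available in the kafka fetch memory semaphore
    std::size_t request_reserved; // units this request already took via the memory estimator
};

struct reservation {
    bool proceed;          // whether to read from this partition now
    std::size_t max_bytes; // read size, possibly reduced to fit available memory
};

constexpr reservation plan_partition_read(
  const memory_state& m, std::size_t requested_max_bytes, bool first_partition) {
    // Assumption: units the request already holds count towards what it may use.
    const std::size_t kafka_usable = m.kafka_avail + m.request_reserved;
    const std::size_t avail = std::min(kafka_usable, m.fetch_avail);

    // The first partition always proceeds so that the fetch makes progress.
    if (!first_partition && avail < batch_size_estimate) {
        return {false, 0};
    }
    // Reduce max_bytes to what is available, but not below the batch estimate.
    const std::size_t capped = std::min(
      requested_max_bytes, std::max(avail, batch_size_estimate));
    return {true, capped};
}

inline void fetch_plan_usage_example() {
    // No memory at all: only the first partition is read, with a minimal size.
    assert(plan_partition_read({0, 0, 0}, 4 * MiB, true).proceed);
    assert(!plan_partition_read({0, 0, 0}, 4 * MiB, false).proceed);
    // The fetch semaphore is the bottleneck: the read is capped at 2 MiB.
    assert(plan_partition_read({8 * MiB, 2 * MiB, 0}, 4 * MiB, false).max_bytes == 2 * MiB);
}
```

The model leaves out the later steps from the commit message: adjusting the held units once the real size is known, and returning them on the shard where they were acquired when `read_result` is destroyed.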
  2. net: fix uninitialized member of server_configuration

    In kafka_server_rpfixture, an extra `kafka::server` is created using
    a barely initialized `server_configuration` instance. A garbage value in
    `max_service_memory_per_core` now causes issues because of the
    new arithmetic done with it in the kafka::server ctor.
    dlex committed Jun 2, 2023
    623e613
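
The class of bug described here is commonly avoided with default member initializers. The struct below is a hypothetical, trimmed-down stand-in: the real `server_configuration` has more members, and both the field type and the default value of zero are assumptions rather than what the commit actually sets.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical stand-in for a configuration struct built from a name only.
struct server_configuration_sketch {
    std::string name;
    // Default member initializer: the field is well defined even when the
    // constructor does not mention it. The value 0 is an assumed default.
    std::int64_t max_service_memory_per_core = 0;
    bool disable_metrics = false;

    explicit server_configuration_sketch(std::string n)
      : name(std::move(n)) {}
};

inline void configuration_usage_example() {
    server_configuration_sketch cfg("fixture");
    // Arithmetic on the field is now deterministic.
    assert(cfg.max_service_memory_per_core == 0);
}
```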
  3. k/tests: UT for the memory limiting algo

    Test the algorithm that decides whether a fetch request can proceed
    on an ntp based on the resources available.
    
    Move the existing testing-only symbols into the `testing` ns.
    dlex committed Jun 2, 2023
    950abc7
  4. tests: enable test_fetch_with_many_partitions

    RAM increased to 512M because redpanda was failing on 256M for unrelated
    reasons.
    
    Test with different values for "kafka_memory_share_for_fetch".
    dlex committed Jun 2, 2023
    06a38b9