
[Merged by Bors] - fix: release batches lock #2840

Closed

Conversation

@davidbeesley (Contributor) commented Dec 1, 2022

Scopes the mutex guard in `PartitionProducer` so that the mutex is released immediately instead of being held while batches are sent.

This means that `TopicProducer::send()` returns quickly instead of blocking while batches are sent.

This change gives almost a 10x improvement in producer throughput when producing to the cloud. Note that the latency regression is because all messages now get "sent" right away, whereas previously messages blocked on `send` while waiting for other batches to go out.
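For context, the fix boils down to the classic scoped-guard pattern. A minimal sketch (using a synchronous `std::sync::Mutex` for brevity, with illustrative names like `pending` and `send_batches` rather than the actual Fluvio internals, which are async):

```rust
use std::sync::Mutex;

struct Batch; // placeholder for a real record batch

struct PartitionProducer {
    pending: Mutex<Vec<Batch>>, // queue of batches waiting to be sent
}

impl PartitionProducer {
    fn flush(&self) {
        // Inner scope: the guard is dropped as soon as the batches
        // have been moved out of the shared queue...
        let ready = {
            let mut guard = self.pending.lock().unwrap();
            std::mem::take(&mut *guard)
        }; // <- guard dropped here, lock released

        // ...so the (potentially slow) network send happens without
        // holding the lock, and `send()` callers are not blocked.
        send_batches(ready);
    }
}

fn send_batches(_batches: Vec<Batch>) {
    // network I/O would happen here, outside the critical section
}
```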

**Config**

```yaml
matrix_name: ExampleMatrix
topic_name: benchmarking-baingmcvadyojtn
current_profile: cloud
timestamp: "2022-12-01T00:37:15.840379872Z"
worker_timeout:
  secs: 10000
  nanos: 0
num_samples: 100
duration_between_samples:
  secs: 0
  nanos: 300000000
num_records_per_producer_worker_per_batch: 1000
producer_batch_size: 16000
producer_queue_size: 100
producer_linger:
  secs: 0
  nanos: 10000000
producer_server_timeout:
  secs: 5
  nanos: 0
producer_compression: none
consumer_max_bytes: 64000
num_concurrent_producer_workers: 1
num_concurrent_consumers_per_partition: 1
num_partitions: 1
record_size: 1000
record_key_allocation_strategy: NoKey
```

**Per Record E2E Latency**

| Variable | p0.00 | p0.50 | p0.95 | p0.99 | p1.00 |
|----------|-------|-------|-------|-------|-------|
| Latency | 119.808ms | 622.591ms | 924.671ms | 1.005567s | 1.060863s |

**Throughput (Total Produced Bytes / Time)**

| Variable | Min | Median | Max | Description |
|----------|-----|--------|-----|-------------|
| Producer Throughput | 2.165mb/s | 2.658mb/s | 3.052mb/s | First Produced Message <-> Last Produced Message |
| Consumer Throughput | 1.154mb/s | 1.389mb/s | 1.563mb/s | First Consumed Message (First Time Consumed) <-> Last Consumed Message (First Time Consumed) |
| Combined Throughput | 0.942mb/s | 0.989mb/s | 1.137mb/s | First Produced Message <-> Last Consumed Message (First Time Consumed) |

**Comparison with previous results: cloud @ 2022-12-01 00:24:33.995062094 UTC**

| Variable | Change | Previous | Current | P-Value |
|----------|--------|----------|---------|---------|
| Latency | Worse | 405.427ms | 1.003788s | 0.00000 |
| Producer Throughput | Better | 0.288mb/s | 2.626mb/s | 0.00000 |
| Consumer Throughput | Better | 0.249mb/s | 1.384mb/s | 0.00000 |
| Combined Throughput | Better | 0.240mb/s | 0.996mb/s | 0.00000 |

@sehz (Contributor) left a comment

Is there a unit test?

@davidbeesley (Contributor, Author) commented Dec 1, 2022

> Is there a unit test?

I don't know if there was one already, but I didn't add a new one. The diff looks much more complicated than the change is; I just added `{ }` around the code that actually uses the guard.

I don't think this change benefits from a unit test, because the compiler guarantees there are no direct changes: the guard was never used outside of the new scope. If there is a weird side effect on code running in another thread that was previously being prevented by holding the mutex longer, a unit test won't catch it. Regardless, this code path is covered by our end-to-end tests.

@davidbeesley (Contributor, Author) commented

To clarify, batches were already copied out of the mutex-protected queue into a local `Vec` before sending. It's just that the lock on the queue was also held while the batches were sent.
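In other words, only the point where the guard is dropped moved. A hedged before/after sketch (again with synchronous `std` types and made-up names, not the real async producer code):

```rust
use std::sync::Mutex;

type Batch = Vec<u8>; // stand-in for a real batch type

// Before: the guard lived until the end of the function, so the
// queue stayed locked for the whole duration of the network send.
fn flush_before(queue: &Mutex<Vec<Batch>>) {
    let mut guard = queue.lock().unwrap();
    let local: Vec<Batch> = guard.drain(..).collect(); // already copied out
    send(local); // ...but the lock is still held here
} // guard only dropped here

// After: braces end the guard's lifetime right after the copy.
fn flush_after(queue: &Mutex<Vec<Batch>>) {
    let local: Vec<Batch> = {
        let mut guard = queue.lock().unwrap();
        guard.drain(..).collect()
    }; // guard dropped; producers can enqueue again immediately
    send(local);
}

fn send(_batches: Vec<Batch>) { /* network I/O */ }
```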

@morenol (Contributor) left a comment

LGTM

@davidbeesley (Contributor, Author) commented

bors r+

bors bot pushed a commit that referenced this pull request Dec 1, 2022

bors bot commented Dec 1, 2022

Pull request successfully merged into master.

Build succeeded.

bors bot changed the title fix: release batches lock → [Merged by Bors] - fix: release batches lock Dec 1, 2022
bors bot closed this Dec 1, 2022