[Merged by Bors] - fix: release batches lock #2840
Conversation
Is there a unit test?
I don't know if there was one already, but I didn't add a new one. The diff looks much more complicated than the change is: I just added `{ }` around the code that actually uses the guard. I don't think this change benefits from a unit test, because the compiler guarantees no direct changes, since the guard was never used outside of the new scope. If there is a weird side effect on code running in another thread that was being prevented by holding the mutex longer, it won't be caught by a unit test. Regardless, this code path is covered by our end-to-end tests.

To clarify, batches were already copied out of the mutex-protected queue into a local `Vec` before sending. It's just that the lock on the queue was held around the send as well.
LGTM

bors r+
Scopes the mutex guard in `PartitionProducer` so that the mutex is released immediately instead of being held while batches are sent. This means that `TopicProducer::send()` returns quickly instead of blocking while batches are sent. This change gives almost a 10x improvement on producer throughput when producing to the cloud. Note the regression in latency is because all messages get "sent" right away, whereas previously messages blocked on `send` while waiting for other batches to go out.

**Config**

```yaml
---
matrix_name: ExampleMatrix
topic_name: benchmarking-baingmcvadyojtn
current_profile: cloud
timestamp: "2022-12-01T00:37:15.840379872Z"
worker_timeout:
  secs: 10000
  nanos: 0
num_samples: 100
duration_between_samples:
  secs: 0
  nanos: 300000000
num_records_per_producer_worker_per_batch: 1000
producer_batch_size: 16000
producer_queue_size: 100
producer_linger:
  secs: 0
  nanos: 10000000
producer_server_timeout:
  secs: 5
  nanos: 0
producer_compression: none
consumer_max_bytes: 64000
num_concurrent_producer_workers: 1
num_concurrent_consumers_per_partition: 1
num_partitions: 1
record_size: 1000
record_key_allocation_strategy: NoKey
```

**Per Record E2E Latency**

| Variable | p0.00 | p0.50 | p0.95 | p0.99 | p1.00 |
|----------|-------|-------|-------|-------|-------|
| Latency  | 119.808ms | 622.591ms | 924.671ms | 1.005567s | 1.060863s |

**Throughput (Total Produced Bytes / Time)**

| Variable | Min | Median | Max | Description |
|----------|-----|--------|-----|-------------|
| Producer Throughput | 2.165mb/s | 2.658mb/s | 3.052mb/s | First Produced Message <-> Last Produced Message |
| Consumer Throughput | 1.154mb/s | 1.389mb/s | 1.563mb/s | First Consumed Message (First Time Consumed) <-> Last Consumed Message (First Time Consumed) |
| Combined Throughput | 0.942mb/s | 0.989mb/s | 1.137mb/s | First Produced Message <-> Last Consumed Message (First Time Consumed) |

**Comparison with previous results: cloud @ 2022-12-01 00:24:33.995062094 UTC**

| Variable | Change | Previous | Current | P-Value |
|----------|--------|----------|---------|---------|
| Latency | Worse | 405.427ms | 1.003788s | 0.00000 |
| Producer Throughput | Better | 0.288mb/s | 2.626mb/s | 0.00000 |
| Consumer Throughput | Better | 0.249mb/s | 1.384mb/s | 0.00000 |
| Combined Throughput | Better | 0.240mb/s | 0.996mb/s | 0.00000 |
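The scoping pattern described above can be sketched in a few lines. This is a minimal, hypothetical illustration, not the actual fluvio `PartitionProducer` code: the `queue`, `flush`, and batch types are made up for the example. The point is that confining the `MutexGuard` to an inner block drops it (releasing the lock) before the slow send begins.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical flush step: copy pending batches out of the shared queue.
/// Previously the guard lived for the whole function, so the lock was held
/// while batches were sent; scoping it in a block releases the lock first.
fn flush(queue: &Arc<Mutex<Vec<String>>>) -> Vec<String> {
    let batches = {
        // Guard exists only inside this block.
        let mut guard = queue.lock().expect("poisoned lock");
        std::mem::take(&mut *guard)
    }; // guard dropped here: lock released before any (slow) network send

    // ... batches would be sent here, without holding the lock ...
    batches
}

fn main() {
    let queue = Arc::new(Mutex::new(vec![
        "batch-1".to_string(),
        "batch-2".to_string(),
    ]));
    let batches = flush(&queue);
    // The queue is drained and the lock is free again immediately.
    assert!(queue.lock().unwrap().is_empty());
    println!("sent {} batches", batches.len());
}
```

With the guard scoped this way, other tasks calling `send()` can enqueue new batches while the previous ones are still in flight, which is where the throughput gain comes from.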
Pull request successfully merged into master. Build succeeded.