Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use critical section for batch counters in producer default_router #684

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

zzzming
Copy link
Contributor

@zzzming zzzming commented Dec 12, 2021

Motivation

A group of batch counters are not properly critical section protected in the producer's default_router. Read and update of these counters should be protected as a whole, instead of individually synchronized using Atomic access. Multiple goroutines can access and update these counters at the same time. Basically it is not thread safe.

Modifications

The change use mutex lock to protect the entire counters during the batch partition decision making process.
Minor improvement includes to evaluate message max size, byte size, and batch window on a needed basis.

Verifying this change

  • [ v] Make sure that the change passes the CI checks.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): ( no)
  • The public API: (no)
  • The schema: ( no)
  • The default values of configurations: (no)
  • The wire protocol: (no)

Documentation

  • Does this pull request introduce a new feature? (/ no)
  • If yes, how is the feature documented? (not applicable )

@@ -30,6 +31,7 @@ type defaultRouter struct {
lastChangeTimestamp int64
msgCounter uint32
cumulativeBatchSize uint32
sync.RWMutex
Copy link
Contributor

@dferstay dferstay Dec 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that we only ever acquire write access; can this be a regular sync.Mutex?

Copy link
Contributor

@dferstay dferstay Dec 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I'm concerned that the use of a mutex will incur overhead and affect performance, especially under concurrent publishing. I've added a bench test to get quantitative numbers in #693

Copy link
Contributor

@dferstay dferstay Dec 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zzzming , I've attempted to address the races when updating the default router state in a lock-free manner here: Please take a look and see if you are satisfied: #694

Comparing the results from the parallel default router bench test we can see that the use of a mutex has an impact on performance (old==this PR, new==#694):

name                      old time/op  new time/op  delta
DefaultRouterParallel     27.4ns ± 3%  14.8ns ± 2%  -45.79%  (p=0.000 n=10+8)
DefaultRouterParallel-2   35.2ns ± 1%  41.9ns ± 0%  +18.96%  (p=0.000 n=9+7)
DefaultRouterParallel-4   49.4ns ± 6%  44.1ns ± 8%  -10.84%  (p=0.000 n=10+9)
DefaultRouterParallel-8   58.8ns ± 6%  53.2ns ± 3%   -9.53%  (p=0.000 n=10+8)
DefaultRouterParallel-16  69.7ns ± 3%  51.3ns ± 0%  -26.43%  (p=0.000 n=10+8)

EDIT: I can't explain why DefaultRouterParallel-2 is slower on the branch that doesn't use the mutex.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update: the implementation in #694 was modified slightly and now DefaultRouterParallel-2 is no longer slower in the lock-free implementation. New bench comparison is below:

name                      old time/op  new time/op  delta
DefaultRouterParallel     27.4ns ± 3%  14.9ns ± 6%  -45.51%  (p=0.000 n=10+9)
DefaultRouterParallel-2   35.2ns ± 1%  33.7ns ± 8%     ~     (p=0.161 n=9+9)
DefaultRouterParallel-4   49.4ns ± 6%  28.8ns ± 4%  -41.66%  (p=0.000 n=10+9)
DefaultRouterParallel-8   58.8ns ± 6%  36.3ns ± 1%  -38.32%  (p=0.000 n=10+8)
DefaultRouterParallel-16  69.7ns ± 3%  39.5ns ±21%  -43.38%  (p=0.000 n=10+10)

@wolfstudy wolfstudy modified the milestones: v0.8.0, 0.9.0 Feb 16, 2022
@freeznet freeznet modified the milestones: v0.9.0, v0.10.0 Jul 4, 2022
@RobertIndie RobertIndie modified the milestones: v0.10.0, v0.11.0 Mar 27, 2023
@RobertIndie RobertIndie modified the milestones: v0.11.0, v0.12.0 Jul 4, 2023
@RobertIndie RobertIndie modified the milestones: v0.12.0, v0.13.0 Jan 10, 2024
@RobertIndie RobertIndie modified the milestones: v0.13.0, v0.14.0 Jul 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants