-
Notifications
You must be signed in to change notification settings - Fork 335
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use critical section for batch counters in producer default_router #684
base: master
Are you sure you want to change the base?
Conversation
@@ -30,6 +31,7 @@ type defaultRouter struct { | |||
lastChangeTimestamp int64 | |||
msgCounter uint32 | |||
cumulativeBatchSize uint32 | |||
sync.RWMutex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that we only ever acquire write access; can this be a regular sync.Mutex
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I'm concerned that the use of a mutex will incur overhead and affect performance, especially under concurrent publishing. I've added a bench test to get quantitative numbers in #693
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zzzming , I've attempted to address the races when updating the default router state in a lock-free manner here: Please take a look and see if you are satisfied: #694
Comparing the results from the parallel default router bench test we can see that the use of a mutex has an impact on performance (old==this PR, new==#694):
name old time/op new time/op delta
DefaultRouterParallel 27.4ns ± 3% 14.8ns ± 2% -45.79% (p=0.000 n=10+8)
DefaultRouterParallel-2 35.2ns ± 1% 41.9ns ± 0% +18.96% (p=0.000 n=9+7)
DefaultRouterParallel-4 49.4ns ± 6% 44.1ns ± 8% -10.84% (p=0.000 n=10+9)
DefaultRouterParallel-8 58.8ns ± 6% 53.2ns ± 3% -9.53% (p=0.000 n=10+8)
DefaultRouterParallel-16 69.7ns ± 3% 51.3ns ± 0% -26.43% (p=0.000 n=10+8)
EDIT: I can't explain why DefaultRouterParallel-2
is slower on the branch that doesn't use the mutex.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update: the implementation in #694 was modified slightly and now DefaultRouterParallel-2
is no longer slower in the lock-free implementation. New bench comparison is below:
name old time/op new time/op delta
DefaultRouterParallel 27.4ns ± 3% 14.9ns ± 6% -45.51% (p=0.000 n=10+9)
DefaultRouterParallel-2 35.2ns ± 1% 33.7ns ± 8% ~ (p=0.161 n=9+9)
DefaultRouterParallel-4 49.4ns ± 6% 28.8ns ± 4% -41.66% (p=0.000 n=10+9)
DefaultRouterParallel-8 58.8ns ± 6% 36.3ns ± 1% -38.32% (p=0.000 n=10+8)
DefaultRouterParallel-16 69.7ns ± 3% 39.5ns ±21% -43.38% (p=0.000 n=10+10)
Motivation
A group of batch counters are not properly critical section protected in the producer's default_router. Read and update of these counters should be protected as a whole, instead of individually synchronized using Atomic access. Multiple goroutines can access and update these counters at the same time. Basically it is not thread safe.
Modifications
The change use mutex lock to protect the entire counters during the batch partition decision making process.
Minor improvement includes to evaluate message max size, byte size, and batch window on a needed basis.
Verifying this change
Does this pull request potentially affect one of the following parts:
Documentation