Shutdown race in EpollEventLoop #9362
Comments
Hmm, is this only in the last release? Using a synchronized block doesn't sound like a good idea.
Not just in the last release, but I noticed it while cleaning up races in gRPC. Are you open to busy polling during shutdown?
Yes, this sounds like a better idea.
Can we continue polling until the event loop successfully sets wakenUp to 1, after which point it closes the fds? If the event loop sets wakenUp to 1, then wakeup()'s getAndSet becomes a no-op.
@ejona86 We can't, because wakeup may have already won the race but not yet written to the fd. I thought about this more over the weekend, and I can't see a way to do this without both an "enter" and an "exit" atomic op. The loop doesn't know if there is an in-progress or upcoming write to the fd.
We know when we lose the race because wakenUp will be 1. In that case we wait on the eventfd for the event we know is coming and then we "try again." Except we could choose not to set wakenUp to 0, so I guess we just leave wakenUp as-is and are now safe to close the eventfd. Note that wakenUp then becomes "load bearing": before it was an optimization, now it is critical, and wakeup() must not write to the eventfd when wakenUp is 1.
Losing the race means that wakenUp will be 1, but since the eventfd is in non-blocking mode, we would have to poll it until it returns EAGAIN. I dislike the idea of requiring close to first read from the fd, so I changed the logic to be tri-state.
Motivation

@carl-mastrangelo discovered a non-hypothetical race condition during EpollEventLoop shutdown, where wakeup writes can complete after the eventFd has been closed and subsequently reassigned by the kernel. This fix is an alternative to netty#9388: it uses eventfd_read to hopefully close the gap completely, and doesn't involve an additional CAS during wakeup.

Modification

After waking from epollWait, CAS the wakenUp atomic from 0 to 1. The times that a value of 1 is encountered here (CAS failure) correspond 1-1 with prior CAS wins by other threads in the wakeup(...) method, which correspond 1-1 with eventfd_write(1) calls (even if the most recent write has yet to happen). Thus we can locally maintain a precise total count of those writes (eventFdWriteCount), which will be constant while the event loop is awake - no further writes can happen until we reset wakenUp back to 0. Since the eventFd is a counter, when shutting down we just need to read from it until the sum of the read values equals the known total write count. At that point all the writes must have completed and no more can happen.

Result

Race condition eliminated. Fixes netty#9362
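The counting scheme above can be sketched in a few lines. This is a minimal, self-contained illustration, not netty's actual code: the kernel eventfd counter is simulated by an AtomicLong, and all names (WriteCountingLoop, afterEpollWait, drain behavior) are assumptions for the sketch.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the write-counting idea from netty#9476.
class WriteCountingLoop {
    final AtomicInteger wakenUp = new AtomicInteger();
    final AtomicLong eventFd = new AtomicLong(); // stand-in for the kernel's eventfd counter
    long eventFdWriteCount;                      // event-loop-local tally of wakeup writes
    boolean closed;

    // Outside threads: only the CAS winner writes to the eventfd.
    void wakeup() {
        if (wakenUp.compareAndSet(0, 1)) {
            eventFd.addAndGet(1);                // simulated eventfd_write(eventFd, 1)
        }
    }

    // Event loop thread, after returning from epollWait.
    void afterEpollWait() {
        // A CAS failure here corresponds 1-1 with a prior CAS win in wakeup(),
        // which corresponds 1-1 with one eventfd_write(1) call.
        if (!wakenUp.compareAndSet(0, 1)) {
            eventFdWriteCount++;
        }
        // ... run tasks ...
        wakenUp.set(0);                          // re-arm: writes may happen again
    }

    // Shutdown: read until every known write is accounted for, then close.
    void shutdown() {
        long read = 0;
        while (read < eventFdWriteCount) {
            read += eventFd.getAndSet(0);        // simulated eventfd_read; the real
        }                                        // code would block until data arrives
        closed = true;                           // now safe: no write can be in flight
    }
}
```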
Motivation

This is another iteration of netty#9476.

Modifications

Instead of maintaining a count of all writes performed and then using reads during shutdown to ensure all are accounted for, just set a flag after each write and don't reset it until the corresponding event has been returned from epoll_wait. This requires that while a write is still pending we don't reset wakenUp, i.e. we continue to block writes from the wakeup() method.

Result

Race condition eliminated. Fixes netty#9362
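The flag-based variant can be sketched similarly. Again this is an illustrative model under assumed names (PendingWriteLoop, pendingWakeupWrite, the eventFdFired parameter), not netty's implementation; the eventfd write itself is elided.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the flag-based approach: a wakeup write keeps
// wakenUp held at 1 until epoll_wait has reported the eventfd readable.
class PendingWriteLoop {
    final AtomicInteger wakenUp = new AtomicInteger();
    volatile boolean pendingWakeupWrite; // a write was issued; epoll hasn't seen it yet
    boolean closed;

    // Outside threads.
    void wakeup() {
        if (wakenUp.compareAndSet(0, 1)) {
            pendingWakeupWrite = true;   // paired with the (elided) eventfd_write(1)
        }
    }

    // Event loop thread; eventFdFired is true when epoll_wait reported
    // the eventfd readable this iteration.
    void afterEpollWait(boolean eventFdFired) {
        if (eventFdFired) {
            pendingWakeupWrite = false;  // the pending write has been consumed
        }
        if (!pendingWakeupWrite) {
            wakenUp.set(0);              // safe to allow new wakeup writes
        }
        // else: keep wakenUp == 1 so wakeup() stays blocked until the event arrives
    }

    // Shutdown: keep polling until the pending event has been returned.
    void shutdown() {
        while (pendingWakeupWrite) {
            afterEpollWait(true);        // simulated: the event eventually arrives
        }
        closed = true;                   // now safe to close the eventFd
    }
}
```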
In 4.1.38 there is an fd race in EpollEventLoop. While shutting down, the loop thread closes the eventFd. Meanwhile, an outside thread calling shutdownGracefully tries to wake up the loop so that it notices the shutdown request. There is a race where the wakeup() call can write to an eventfd that has already been closed, resulting in the possibility of writing to the wrong fd.
The sequence of events looks roughly like:

1. The outside thread calls shutdownGracefully, which calls wakeup() and wins the wakenUp = 1 race.
2. Before the outside thread writes to the eventFd, the loop thread wakes up on its own, runs the shutdown path, and closes the eventFd in cleanup().
3. The kernel reassigns the now-free fd number to a newly opened file.
4. The outside thread's eventfd write lands on the reused fd number, i.e. on the wrong file.
The easiest way I can see to fix this race is to use a synchronized block in wakeup(), after the thread has won the wakenUp = 1 race:
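A minimal standalone sketch of such a wakeup() (names are assumptions; the eventfd write is simulated by a plain counter, and eventFdOpen is a hypothetical flag that the close path would clear under the same lock):

```java
import java.util.concurrent.atomic.AtomicInteger;

class WakeupSketch {
    final AtomicInteger wakenUp = new AtomicInteger();
    final Object eventFdGuard = new Object();
    boolean eventFdOpen = true; // guarded by eventFdGuard; cleared on close
    int writes;                 // simulated eventfd_write(1) calls

    void wakeup() {
        // Only the CAS winner attempts the write, and only under the lock,
        // so a concurrent close cannot slip in between the check and the write.
        if (wakenUp.compareAndSet(0, 1)) {
            synchronized (eventFdGuard) {
                if (eventFdOpen) {
                    writes++;   // stands in for Native.eventFdWrite(eventFd, 1)
                }
            }
        }
    }
}
```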
Additionally, in EpollEventLoop.cleanup, acquire a lock before closing the eventFd, synchronize the close:
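A sketch of that synchronized close, again with assumed names and the close simulated by clearing a flag that the write path would check under the same lock:

```java
class CleanupSketch {
    final Object eventFdGuard = new Object();
    boolean eventFdOpen = true; // guarded by eventFdGuard

    void cleanup() {
        // Taking the same lock the wakeup write holds guarantees that once
        // close returns, no wakeup write can land on the (reusable) fd number.
        synchronized (eventFdGuard) {
            eventFdOpen = false; // stands in for eventFd.close()
        }
    }
}
```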
This is kind of unfortunate, but the race makes it possible that another fd is opened after the eventFd is closed, and the wakeup() call writes to that wrong fd. The main downside of my proposed patch is probably the risk of deadlocks (because I don't know what locks the callers hold), as well as some performance penalty.
Ideas on how to not pay this cost are welcome, but I think the danger is enough to merit fixing this.
cc @ejona86 @normanmaurer