
Shutdown race in EpollEventLoop #9362

Closed
carl-mastrangelo opened this issue Jul 13, 2019 · 7 comments · Fixed by #9535 or #9586

@carl-mastrangelo
Member

In 4.1.38, there is an fd race in EpollEventLoop. The loop thread, while shutting down, closes the eventFd. The outside thread that calls shutdownGracefully() tries to wake up the loop so that it notices the shutdown request. There is a race where the wakeup() call can write to an eventfd that has already been closed, with the possibility of writing to the wrong fd if the fd number has been reused.

The sequence of events looks like:

  1. T1: eventLoop.shutdownGracefully()
  2. T1: state = ST_SHUTDOWN
  3. T2: reads state == ST_SHUTDOWN
  4. T1: Enters EpollEventLoop.wakeup()
  5. T2: Exits loop, enters EpollEventLoop.cleanup()
  6. T2: Closes eventFd.
  7. T1: Wins the race by setting wakenUp to 1
  8. T1: Calls Native.eventFdWrite(eventFd.intValue(), 1L);
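The hazardous interleaving above can be replayed deterministically with a fake fd table standing in for the kernel (FakeKernel and its methods are illustrative stand-ins, not Netty or OS APIs):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Single-threaded replay of the interleaving above, using a fake fd table
// so the hazard is deterministic rather than timing-dependent.
public class EventFdRaceSketch {
    static class FakeKernel {
        final Map<Integer, String> open = new HashMap<>();
        final Deque<Integer> freed = new ArrayDeque<>();
        int next = 42;

        int open(String owner) {              // kernels reuse closed fd numbers
            int fd = freed.isEmpty() ? next++ : freed.pop();
            open.put(fd, owner);
            return fd;
        }
        void close(int fd) { open.remove(fd); freed.push(fd); }
        String write(int fd) { return open.get(fd); } // who actually got the bytes?
    }

    public static void main(String[] args) {
        FakeKernel k = new FakeKernel();
        int eventFd = k.open("eventFd");      // the loop's wakeup fd
        // T1 wins the wakenUp race and is about to write... but is descheduled.
        k.close(eventFd);                     // T2: EpollEventLoop.cleanup()
        int socketFd = k.open("someSocket");  // kernel hands out the same number
        // T1 resumes: Native.eventFdWrite(eventFd, 1) now hits the wrong fd.
        System.out.println(k.write(eventFd)); // prints "someSocket"
    }
}
```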

The easiest way I can see to fix this race is to use a synchronized block in wakeup(), after the thread has won the wakenUp = 1 race:

    @Override
    protected void wakeup(boolean inEventLoop) {
        if (!inEventLoop && WAKEN_UP_UPDATER.getAndSet(this, 1) == 0) {
            // write to the evfd which will then wake-up epoll_wait(...)
            synchronized (eventFd) {
                if (!isShutdown()) {
                    Native.eventFdWrite(eventFd.intValue(), 1L);
                }
            }
        }
    }

Additionally, in EpollEventLoop.cleanup(), synchronize on the same lock before closing the eventFd:

            synchronized (eventFd) {
                try {
                    eventFd.close();
                } catch (IOException e) {
                    logger.warn("Failed to close the event fd.", e);
                }
            }

This is kind of unfortunate, but the race makes it possible that another fd is opened after the eventFd is closed, in which case the wakeup() call writes to the wrong fd. The main downside of my proposed patch is the risk of deadlocks (because I don't know which locks are already held), as well as some performance penalty.

Ideas on how to not pay this cost are welcome, but I think the danger is enough to merit fixing this.

cc @ejona86 @normanmaurer

@normanmaurer
Member

Hmm, is this only in the last release? Using synchronized doesn't sound like a good idea.

@carl-mastrangelo
Member Author

Not just in the last release; I noticed it while cleaning up races in gRPC.

Are you open to busy polling in shutdown?

@normanmaurer
Member

Yes, this sounds like a better idea.

@ejona86
Member

ejona86 commented Jul 15, 2019

Can we continue polling until the event loop successfully sets wakenUp to 1, after which point it closes the fds? If the event loop sets wakenUp to 1, then wakeup()'s getAndSet will become a "noop."

@carl-mastrangelo
Member Author

@ejona86 We can't because wakeup may have already won the race, but not yet written to the fd.

I thought about this more over the weekend, and I can't see a way to do this without both an "enter" and "exit" atomic op. The loop doesn't know if there is an in-progress or upcoming write to the fd.
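The "enter/exit atomic op" idea can be sketched as a tri-state guard (IDLE/WRITING/CLOSED). The names and states below are illustrative, not the actual Netty change; the wakenUp dedup optimization is ignored, and a simulated counter stands in for the real eventfd write:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Writers mark the fd busy before writing ("enter") and idle after ("exit");
// cleanup spins until it can move the state from IDLE to CLOSED, after which
// no writer can touch the fd again.
public class GuardedEventFd {
    static final int IDLE = 0, WRITING = 1, CLOSED = 2;
    final AtomicInteger state = new AtomicInteger(IDLE);
    int writes;         // stands in for Native.eventFdWrite calls
    boolean closed;

    boolean wakeup() {
        if (!state.compareAndSet(IDLE, WRITING)) {
            return false;   // fd closed, or another write in flight
                            // (a real implementation would retry/spin here)
        }
        writes++;           // "enter" succeeded: safe to write
        state.set(IDLE);    // "exit": allow close to proceed
        return true;
    }

    void cleanup() {
        // Busy-poll until no write is in flight, then forbid new ones.
        while (!state.compareAndSet(IDLE, CLOSED)) {
            Thread.onSpinWait();
        }
        closed = true;      // now eventFd.close() is safe
    }
}
```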

@ejona86
Member

ejona86 commented Jul 15, 2019

> We can't because wakeup may have already won the race, but not yet written to the fd.

We know when we lose the race as wakeup will be 1. In that case we wait on the eventfd for the event we know is coming and then we "try again." Except we could choose not to set wakeup to 0, so I guess we just leave wakeup as-is and are now safe to close the eventfd.

Note that means that wakeup then becomes "load bearing." Before it was an optimization. Now it is critical and must not use eventfd when wakeup is 1.

@carl-mastrangelo
Member Author

Losing the race means that wakenUp will be 1, but since the eventfd is in non-blocking mode, we would have to poll it until it returns EAGAIN. I dislike the idea of requiring close to read from the fd, so I changed the logic to be a tri-state.

njhill added a commit to njhill/netty that referenced this issue Aug 22, 2019
Motivation

@carl-mastrangelo discovered a non-hypothetical race condition during
EpollEventLoop shutdown where wakeup writes can complete after the
eventFd has been closed and subsequently reassigned by the kernel.

This fix is an alternative to netty#9388 which uses eventfd_read to hopefully
close the gap completely, and doesn't involve an additional CAS during
wakeup.

Modification

After waking from epollWait, CAS the wakenUp atomic from 0 to 1. The
times that a value of 1 is encountered here (CAS fail) correspond 1-1
with prior CAS wins by other threads in the wakeup(...) method, which
correspond 1-1 with eventfd_write(1) calls (even if the most recent
write is yet to happen).

Thus we can locally maintain a precise total count of those writes
(eventFdWriteCount) which will be constant while the EL is awake - no
further writes can happen until we reset wakenUp back to 0.

Since eventFd is a counter, when shutting down we just need to read from
it until the sum of read values equals the known total write count. At
this point all the writes must have completed and no more can happen.

Result

Race condition eliminated. Fixes netty#9362
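The counting scheme in the commit above can be sketched with a plain counter standing in for the kernel eventfd (class and method names are illustrative, and the write-side tally is kept inline here rather than derived from CAS failures as in the real change):

```java
// An eventfd is a counter: each write adds to it, each read returns and
// clears it. If we know exactly how many writes happened, we can drain
// the fd until the read total matches, then close it safely.
public class CountingShutdown {
    long fdCounter;          // models the eventfd's kernel-side counter
    long eventFdWriteCount;  // precise local total of wakeup writes

    void wakeupWrite() {     // models eventfd_write(fd, 1)
        fdCounter += 1;
        eventFdWriteCount += 1;  // loop-side tally of writes
    }

    long read() {            // models eventfd_read: returns and clears counter
        long v = fdCounter;
        fdCounter = 0;
        return v;
    }

    void drainBeforeClose() {
        long seen = 0;
        while (seen < eventFdWriteCount) {
            seen += read();  // keep reading until every known write has landed
        }
        // all writes accounted for; safe to close the fd now
    }
}
```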
njhill added a commit to njhill/netty that referenced this issue Sep 4, 2019
Motivation

This is another iteration of netty#9476.

Modifications

Instead of maintaining a count of all writes performed and then using
reads during shutdown to ensure all are accounted for, just set a flag
after each write and don't reset it until the corresponding event has
been returned from epoll_wait.

This requires that while a write is still pending we don't reset
wakenUp, i.e. continue to block writes from the wakeup() method.

Result

Race condition eliminated. Fixes netty#9362
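The flag-based scheme in this commit can be sketched as follows: at most one eventfd write is ever outstanding, tracked by a pendingWakeup flag that is only cleared when epoll_wait actually reports the eventFd event. All names here are illustrative, not the actual Netty code, and a counter stands in for the fd write:

```java
import java.util.concurrent.atomic.AtomicInteger;

// While a write is pending, wakenUp stays 1, which blocks wakeup() from
// issuing further writes; only once the event surfaces do we re-arm.
public class PendingWakeupSketch {
    final AtomicInteger wakenUp = new AtomicInteger();
    boolean pendingWakeup;   // only touched on the event-loop thread
    int fdWrites;            // stands in for eventfd_write calls

    void wakeup() {          // called from any thread
        if (wakenUp.getAndSet(1) == 0) {
            fdWrites++;      // exactly one write per 0 -> 1 transition
        }
    }

    // Event loop woke from epoll_wait; sawEventFdEvent says whether the
    // eventFd fired in this batch of events.
    void afterEpollWait(boolean sawEventFdEvent) {
        if (sawEventFdEvent) {
            pendingWakeup = false;   // the outstanding write has landed
        } else if (wakenUp.get() == 1) {
            pendingWakeup = true;    // a write happened or is imminent
        }
        if (!pendingWakeup) {
            wakenUp.set(0);          // re-arm: new writes allowed again
        }
    }

    boolean safeToCloseEventFd() {
        return !pendingWakeup;       // no write in flight toward the fd
    }
}
```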
normanmaurer pushed a commit that referenced this issue Sep 5, 2019 (#9535)
@normanmaurer added this to the 4.1.40.Final milestone Sep 5, 2019
normanmaurer added a commit that referenced this issue Sep 20, 2019

Co-authored-by: Norman Maurer <norman_maurer@apple.com>
normanmaurer added a commit that referenced this issue Sep 23, 2019 (#9586)
njhill added a commit to njhill/netty that referenced this issue Sep 25, 2019 (netty#9586)
normanmaurer added a commit that referenced this issue Sep 27, 2019 (#9612)