
Shutdown race in EpollEventLoop #9362

Closed
carl-mastrangelo opened this issue Jul 13, 2019 · 7 comments · Fixed by #9535 or #9586

@carl-mastrangelo
Member

In 4.1.38, there is an fd race in EpollEventLoop. The loop thread, while shutting down, closes the eventFd. The outside thread that calls shutdownGracefully() tries to wake up the loop so that it notices the shutdown request. There is a race where the wakeup() call can write to an eventfd that has already been closed, with the possibility of writing to the wrong fd if the fd number has been reused.

The sequence of events looks like:

  1. T1: eventLoop.shutdownGracefully()
  2. T1: state = ST_SHUTDOWN
  3. T2: reads state == ST_SHUTDOWN
  4. T1: Enters EpollEventLoop.wakeup()
  5. T2: Exits loop, enters EpollEventLoop.cleanup()
  6. T2: Closes eventFd.
  7. T1: Wins the race by setting wakenUp to 1
  8. T1: Calls Native.eventFdWrite(eventFd.intValue(), 1L);
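The hazardous interleaving above can be replayed deterministically with a fake fd table standing in for the kernel (FakeKernel and its methods are illustrative stand-ins, not Netty or OS APIs):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Single-threaded replay of the interleaving above, using a fake fd table
// so the hazard is deterministic rather than timing-dependent.
public class EventFdRaceSketch {
    static class FakeKernel {
        final Map<Integer, String> open = new HashMap<>();
        final Deque<Integer> freed = new ArrayDeque<>();
        int next = 42;

        int open(String owner) {              // kernels reuse closed fd numbers
            int fd = freed.isEmpty() ? next++ : freed.pop();
            open.put(fd, owner);
            return fd;
        }
        void close(int fd) { open.remove(fd); freed.push(fd); }
        String write(int fd) { return open.get(fd); } // who actually got the bytes?
    }

    public static void main(String[] args) {
        FakeKernel k = new FakeKernel();
        int eventFd = k.open("eventFd");      // the loop's wakeup fd
        // T1 wins the wakenUp race and is about to write... but is descheduled.
        k.close(eventFd);                     // T2: EpollEventLoop.cleanup()
        int socketFd = k.open("someSocket");  // kernel hands out the same number
        // T1 resumes: Native.eventFdWrite(eventFd, 1) now hits the wrong fd.
        System.out.println(k.write(eventFd)); // prints "someSocket"
    }
}
```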

The easiest way I can see to fix this race is to use a synchronized block in wakeup(), after the thread has won the wakenUp = 1 race:

    @Override
    protected void wakeup(boolean inEventLoop) {
        if (!inEventLoop && WAKEN_UP_UPDATER.getAndSet(this, 1) == 0) {
            // write to the evfd which will then wake-up epoll_wait(...)
            synchronized (eventFd) {
                if (!isShutdown()) {
                    Native.eventFdWrite(eventFd.intValue(), 1L);
                }
            }
        }
    }

Additionally, in EpollEventLoop.cleanup(), synchronize on the same lock before closing the eventFd:

            synchronized (eventFd) {
                try {
                    eventFd.close();
                } catch (IOException e) {
                    logger.warn("Failed to close the event fd.", e);
                }
            }

This is kind of unfortunate, but the race makes it possible that another fd is opened after the eventFd is closed, in which case the wakeup() call writes to the wrong fd. The main downside of my proposed patch is the risk of deadlocks (because I don't know which locks are already held), as well as some performance penalty.

Ideas on how to not pay this cost are welcome, but I think the danger is enough to merit fixing this.

cc @ejona86 @normanmaurer

@normanmaurer
Member

Hmm, is this only in the last release? Using synchronized doesn't sound like a good idea.

@carl-mastrangelo
Member Author

Not just in the last release; I noticed it while cleaning up races in gRPC.

Are you open to busy polling in shutdown?

@normanmaurer
Member

Yes, this sounds like a better idea.

@ejona86
Member

ejona86 commented Jul 15, 2019

Can we continue polling until the event loop successfully sets wakenUp to 1, after which point it closes the fds? If the event loop sets wakenUp to 1, then wakeup()'s getAndSet will become a "noop."

@carl-mastrangelo
Member Author

@ejona86 We can't because wakeup may have already won the race, but not yet written to the fd.

I thought about this more over the weekend, and I can't see a way to do this without both an "enter" and "exit" atomic op. The loop doesn't know if there is an in-progress or upcoming write to the fd.
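The "enter/exit atomic op" idea can be sketched as a tri-state guard (IDLE/WRITING/CLOSED). The names and states below are illustrative, not the actual Netty change; the wakenUp dedup optimization is ignored, and a simulated counter stands in for the real eventfd write:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Writers mark the fd busy before writing ("enter") and idle after ("exit");
// cleanup spins until it can move the state from IDLE to CLOSED, after which
// no writer can touch the fd again.
public class GuardedEventFd {
    static final int IDLE = 0, WRITING = 1, CLOSED = 2;
    final AtomicInteger state = new AtomicInteger(IDLE);
    int writes;         // stands in for Native.eventFdWrite calls
    boolean closed;

    boolean wakeup() {
        if (!state.compareAndSet(IDLE, WRITING)) {
            return false;   // fd closed, or another write in flight
                            // (a real implementation would retry/spin here)
        }
        writes++;           // "enter" succeeded: safe to write
        state.set(IDLE);    // "exit": allow close to proceed
        return true;
    }

    void cleanup() {
        // Busy-poll until no write is in flight, then forbid new ones.
        while (!state.compareAndSet(IDLE, CLOSED)) {
            Thread.onSpinWait();
        }
        closed = true;      // now eventFd.close() is safe
    }
}
```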

@ejona86
Member

ejona86 commented Jul 15, 2019

> We can't because wakeup may have already won the race, but not yet written to the fd.

We know when we lose the race as wakeup will be 1. In that case we wait on the eventfd for the event we know is coming and then we "try again." Except we could choose not to set wakeup to 0, so I guess we just leave wakeup as-is and are now safe to close the eventfd.

Note that means that wakeup then becomes "load bearing." Before it was an optimization. Now it is critical and must not use eventfd when wakeup is 1.

@carl-mastrangelo
Member Author

Losing the race means that wakenUp will be 1, but since the eventfd is in non-blocking mode, we would have to poll it until it returns EAGAIN. I dislike the idea of requiring close to read from the fd, so I changed the logic to be a tri-state.

njhill added a commit to njhill/netty that referenced this issue Aug 22, 2019
Motivation

@carl-mastrangelo discovered a non-hypothetical race condition during
EpollEventLoop shutdown where wakeup writes can complete after the
eventFd has been closed and subsequently reassigned by the kernel.

This fix is an alternative to netty#9388 which uses eventfd_read to hopefully
close the gap completely, and doesn't involve an additional CAS during
wakeup.

Modification

After waking from epollWait, CAS the wakenUp atomic from 0 to 1. The
times that a value of 1 is encountered here (CAS fail) correspond 1-1
with prior CAS wins by other threads in the wakeup(...) method, which
correspond 1-1 with eventfd_write(1) calls (even if the most recent
write is yet to happen).

Thus we can locally maintain a precise total count of those writes
(eventFdWriteCount) which will be constant while the EL is awake - no
further writes can happen until we reset wakenUp back to 0.

Since eventFd is a counter, when shutting down we just need to read from
it until the sum of read values equals the known total write count. At
this point all the writes must have completed and no more can happen.

Result

Race condition eliminated. Fixes netty#9362
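The counting scheme in the commit above can be sketched with a plain counter standing in for the kernel eventfd (class and method names are illustrative, and the write-side tally is kept inline here rather than derived from CAS failures as in the real change):

```java
// An eventfd is a counter: each write adds to it, each read returns and
// clears it. If we know exactly how many writes happened, we can drain
// the fd until the read total matches, then close it safely.
public class CountingShutdown {
    long fdCounter;          // models the eventfd's kernel-side counter
    long eventFdWriteCount;  // precise local total of wakeup writes

    void wakeupWrite() {     // models eventfd_write(fd, 1)
        fdCounter += 1;
        eventFdWriteCount += 1;  // loop-side tally of writes
    }

    long read() {            // models eventfd_read: returns and clears counter
        long v = fdCounter;
        fdCounter = 0;
        return v;
    }

    void drainBeforeClose() {
        long seen = 0;
        while (seen < eventFdWriteCount) {
            seen += read();  // keep reading until every known write has landed
        }
        // all writes accounted for; safe to close the fd now
    }
}
```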
njhill added a commit to njhill/netty that referenced this issue Sep 4, 2019
Motivation

This is another iteration of netty#9476.

Modifications

Instead of maintaining a count of all writes performed and then using
reads during shutdown to ensure all are accounted for, just set a flag
after each write and don't reset it until the corresponding event has
been returned from epoll_wait.

This requires that while a write is still pending we don't reset
wakenUp, i.e. continue to block writes from the wakeup() method.

Result

Race condition eliminated. Fixes netty#9362
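The flag-based scheme in this commit can be sketched as follows: at most one eventfd write is ever outstanding, tracked by a pendingWakeup flag that is only cleared when epoll_wait actually reports the eventFd event. All names here are illustrative, not the actual Netty code, and a counter stands in for the fd write:

```java
import java.util.concurrent.atomic.AtomicInteger;

// While a write is pending, wakenUp stays 1, which blocks wakeup() from
// issuing further writes; only once the event surfaces do we re-arm.
public class PendingWakeupSketch {
    final AtomicInteger wakenUp = new AtomicInteger();
    boolean pendingWakeup;   // only touched on the event-loop thread
    int fdWrites;            // stands in for eventfd_write calls

    void wakeup() {          // called from any thread
        if (wakenUp.getAndSet(1) == 0) {
            fdWrites++;      // exactly one write per 0 -> 1 transition
        }
    }

    // Event loop woke from epoll_wait; sawEventFdEvent says whether the
    // eventFd fired in this batch of events.
    void afterEpollWait(boolean sawEventFdEvent) {
        if (sawEventFdEvent) {
            pendingWakeup = false;   // the outstanding write has landed
        } else if (wakenUp.get() == 1) {
            pendingWakeup = true;    // a write happened or is imminent
        }
        if (!pendingWakeup) {
            wakenUp.set(0);          // re-arm: new writes allowed again
        }
    }

    boolean safeToCloseEventFd() {
        return !pendingWakeup;       // no write in flight toward the fd
    }
}
```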
normanmaurer pushed a commit that referenced this issue Sep 5, 2019 (#9535)
@normanmaurer added this to the 4.1.40.Final milestone Sep 5, 2019
normanmaurer added a commit that referenced this issue Sep 20, 2019

Co-authored-by: Norman Maurer <norman_maurer@apple.com>
normanmaurer added a commit that referenced this issue Sep 23, 2019 (#9586)
njhill added a commit to njhill/netty that referenced this issue Sep 25, 2019 (netty#9586)
normanmaurer added a commit that referenced this issue Sep 27, 2019 (#9612)