Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

n-api: detect deadlocks in thread-safe function (take two) #32860

Conversation

gabrielschulhof
Copy link
Contributor

@gabrielschulhof gabrielschulhof commented Apr 15, 2020

uv_thread_self() works on Windows only for threads created using
uv_thread_start() because libuv does not use GetCurrentThreadId()
for threads that were created otherwise.

Thus we must use a thread-local static boolean which we set to true
only on JS threads to distinguish JS threads from secondary threads,
thereby preventing deadlocks.

Signed-off-by: Gabriel Schulhof gabriel.schulhof@intel.com

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • commit message follows commit guidelines

@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. node-api Issues and PRs related to the Node-API. labels Apr 15, 2020
@gabrielschulhof gabrielschulhof marked this pull request as draft April 15, 2020 04:04
@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

@gabrielschulhof gabrielschulhof marked this pull request as ready for review April 15, 2020 06:15
@gabrielschulhof
Copy link
Contributor Author

@bnoordhuis @addaleax @gireeshpunathil @himself65 this is an alternative to #32823 based on #32823 (comment).

Copy link
Member

@himself65 himself65 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM if CI passes

src/node_api.cc Outdated Show resolved Hide resolved
Copy link
Member

@bnoordhuis bnoordhuis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what I meant, yes. :-)

@addaleax
Copy link
Member

@gabrielschulhof Doesn’t this potentially do the wrong thing? We don’t want to know whether this is a JS thread, we want to know whether it’s the thread on which the loop runs to which this ThreadSafeFunction is associated with? I can see how it’s unlikely that that would pose a problem in practice, but still.

I definitely have a preference for #32823 here, and to me it seems like Ben’s original request for something that doesn’t create “invalid” uv_thread_ts has already been addressed.

@gabrielschulhof
Copy link
Contributor Author

@addaleax with this PR, the TSFN would be overly cautious. If you called a TSFN created on one JS thread from another JS thread, it would return napi_would_deadlock, which is incorrect, because one JS thread can definitely wait on another without deadlocking, however undesirable that may be. OTOH, if we allow one JS thread to wait on the other, you could get into a deadly embrace between two JS threads as they wait on one another. So, in a sense, we may want to return napi_would_deadlock when calling from any JS thread to any JS thread, because we cannot eliminate the probability, however unlikely, that a deadlock between at least two threads would ensue.

Of course, the correct solution is to not call napi_call_threadsafe_function() with napi_tsfn_blocking from any JS thread.

@gabrielschulhof
Copy link
Contributor Author

@addaleax also, in practice, as you also point out, the napi_would_deadlock is very unlikely to be returned for a JS-thread-to-another-JS-thread call. Most often by far (AFAICT) it will be returned for a JS-thread-to-the-same-JS-thread call. So, from a certain perspective, this modification is actually safer, because it detects the deadlock as reported in #32615 but it also prevents deadlocks resulting from JS-thread-to-another-JS-thread calls.

I will add this observation to the documentation surrounding napi_would_deadlock.

@nodejs-github-bot
Copy link
Collaborator

@addaleax
Copy link
Member

@gabrielschulhof Yeah, I guess my position is more based on the fact that thread_local variables usually are code smells.

What if an embedder decides to move running one Isolate/event loop combination from one thread to another? That probably not a common scenario, but it is for example something that was brought up in discussions that I have participated in.

Tbh, if you want to figure out whether you’re currently on a JS thread, I’d check Isolate::GetCurrent() != nullptr, similar to what we do in LowMemoryNotification() in util.h.

@gabrielschulhof
Copy link
Contributor Author

@addaleax checking v8::Isolate::GetCurrent() != nullptr sounds good. I'll make the change.

@gabrielschulhof
Copy link
Contributor Author

@addaleax ... until we can have a v8::Isolate on one thread, and the rest of the environment on another 😕 Well, hopefully not anytime soon 🤞

@gabrielschulhof
Copy link
Contributor Author

@addaleax if environments can migrate between threads as per the discussion in which you participated then, with #32823 behaviour would return to the pre-deadlock-check code whenever the environment was migrated. So, this PR would be the only solution.

@gabrielschulhof
Copy link
Contributor Author

Using v8::Isolate::GetCurrent() actually makes for the nicest diff of them all. The deadlock detection now uses @bnoordhuis' idea for a thread-local variable but via v8::Isolate, and is reduced to

$ git diff 96eceb7 -- src/node_api.cc

diff --git a/src/node_api.cc b/src/node_api.cc
index fad9cf72a9..89ec03b088 100644
--- a/src/node_api.cc
+++ b/src/node_api.cc
@@ -154,6 +154,14 @@ class ThreadSafeFunction : public node::AsyncResource {
         !is_closing) {
       if (mode == napi_tsfn_nonblocking) {
         return napi_queue_full;
+      } else if (v8::Isolate::GetCurrent() != nullptr) {
+        // Return `napi_would_deadlock` because waiting on the queue from a JS
+        // thread will prevent that same thread from removing items from the
+        // queue thereby causing deadlock. Deadlock can also be caused by one JS
+        // thread calling the thread-safe function of another JS thread because
+        // if two JS threads call each other's thread-safe functions blockingly
+        // when both their queues are full they will both deadlock.
+        return napi_would_deadlock;
       }
       cond->Wait(lock);
     }

where 96eceb7 is the commit before the deadlock detection was introduced.

@gabrielschulhof
Copy link
Contributor Author

@bnoordhuis @himself65 @addaleax please take another look!

@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

@gabrielschulhof
Copy link
Contributor Author

@addaleax I also updated the comment accompanying the return of napi_would_deadlock to explain the two kinds of deadlock we're seeking to avoid.

gabrielschulhof pushed a commit that referenced this pull request Apr 19, 2020
We introduce status `napi_would_deadlock` to be used as a return status
by `napi_call_threadsafe_function` if the call is made with
`napi_tsfn_blocking` on the main thread and the queue is full.

Fixes: #32615
Signed-off-by: Gabriel Schulhof <gabriel.schulhof@intel.com>
PR-URL: #32860
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>
Reviewed-By: Zeyu Yang <himself65@outlook.com>
@gabrielschulhof
Copy link
Contributor Author

Landed in d26ca06.

gabrielschulhof pushed a commit to gabrielschulhof/node that referenced this pull request Apr 20, 2020
We introduce status `napi_would_deadlock` to be used as a return status
by `napi_call_threadsafe_function` if the call is made with
`napi_tsfn_blocking` on the main thread and the queue is full.

Fixes: nodejs#32615
Signed-off-by: Gabriel Schulhof <gabriel.schulhof@intel.com>
PR-URL: nodejs#32860
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>
Reviewed-By: Zeyu Yang <himself65@outlook.com>
@gabrielschulhof gabrielschulhof deleted the tsfn-fix-win32-thread-self-thread_local branch April 21, 2020 02:58
BethGriggs pushed a commit that referenced this pull request Apr 27, 2020
We introduce status `napi_would_deadlock` to be used as a return status
by `napi_call_threadsafe_function` if the call is made with
`napi_tsfn_blocking` on the main thread and the queue is full.

Fixes: #32615
Signed-off-by: Gabriel Schulhof <gabriel.schulhof@intel.com>
PR-URL: #32860
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>
Reviewed-By: Zeyu Yang <himself65@outlook.com>
@BethGriggs BethGriggs mentioned this pull request Apr 27, 2020
targos pushed a commit that referenced this pull request Apr 28, 2020
We introduce status `napi_would_deadlock` to be used as a return status
by `napi_call_threadsafe_function` if the call is made with
`napi_tsfn_blocking` on the main thread and the queue is full.

Fixes: #32615
Signed-off-by: Gabriel Schulhof <gabriel.schulhof@intel.com>
PR-URL: #32860
Backport-PR-URL: #32948
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>
Reviewed-By: Zeyu Yang <himself65@outlook.com>
BethGriggs added a commit that referenced this pull request Apr 28, 2020
Notable changes:

- deps: upgrade openssl sources to 1.1.1g (Hassaan Pasha)
  [#32971](#32971)
- doc: add juanarbol as collaborator (Juan José Arboleda)
  [#32906](#32906)
- http: doc deprecate abort and improve docs (Robert Nagy)
  [#32807](#32807)
- module: do not warn when accessing `__esModule` of unfinished exports
  (Anna Henningsen) [#33048](#33048)
- n-api: detect deadlocks in thread-safe function (Gabriel Schulhof)
  [#32860](#32860)
- src: deprecate embedder APIs with replacements (Anna Henningsen)
  [#32858](#32858)
- stream:
  - don't emit end after close (Robert Nagy)
    [#33076](#33076)
  - don't wait for close on legacy streams (Robert Nagy)
    [#33058](#33058)
  - pipeline should only destroy un-finished streams (Robert Nagy)
    [#32968](#32968)

PR-URL: #33103
BethGriggs added a commit that referenced this pull request Apr 28, 2020
Notable changes:

- deps: upgrade openssl sources to 1.1.1g (Hassaan Pasha)
  [#32971](#32971)
- doc: add juanarbol as collaborator (Juan José Arboleda)
  [#32906](#32906)
- http: doc deprecate abort and improve docs (Robert Nagy)
  [#32807](#32807)
- module: do not warn when accessing `__esModule` of unfinished exports
  (Anna Henningsen) [#33048](#33048)
- n-api: detect deadlocks in thread-safe function (Gabriel Schulhof)
  [#32860](#32860)
- src: deprecate embedder APIs with replacements (Anna Henningsen)
  [#32858](#32858)
- stream:
  - don't emit end after close (Robert Nagy)
    [#33076](#33076)
  - don't wait for close on legacy streams (Robert Nagy)
    [#33058](#33058)
  - pipeline should only destroy un-finished streams (Robert Nagy)
    [#32968](#32968)
- vm: add importModuleDynamically option to compileFunction (Gus Caplan)
  [#32985](#32985)

PR-URL: #33103
BethGriggs added a commit that referenced this pull request Apr 29, 2020
Notable changes:

- deps: upgrade openssl sources to 1.1.1g (Hassaan Pasha)
  [#32971](#32971)
- doc: add juanarbol as collaborator (Juan José Arboleda)
  [#32906](#32906)
- http: doc deprecate abort and improve docs (Robert Nagy)
  [#32807](#32807)
- module: do not warn when accessing `__esModule` of unfinished exports
  (Anna Henningsen) [#33048](#33048)
- n-api: detect deadlocks in thread-safe function (Gabriel Schulhof)
  [#32860](#32860)
- src: deprecate embedder APIs with replacements (Anna Henningsen)
  [#32858](#32858)
- stream:
  - don't emit end after close (Robert Nagy)
    [#33076](#33076)
  - don't wait for close on legacy streams (Robert Nagy)
    [#33058](#33058)
  - pipeline should only destroy un-finished streams (Robert Nagy)
    [#32968](#32968)
- vm: add importModuleDynamically option to compileFunction (Gus Caplan)
  [#32985](#32985)

PR-URL: #33103
BethGriggs added a commit that referenced this pull request Apr 29, 2020
Notable changes:

- deps: upgrade openssl sources to 1.1.1g (Hassaan Pasha)
  [#32971](#32971)
- doc: add juanarbol as collaborator (Juan José Arboleda)
  [#32906](#32906)
- http: doc deprecate abort and improve docs (Robert Nagy)
  [#32807](#32807)
- module: do not warn when accessing `__esModule` of unfinished exports
  (Anna Henningsen) [#33048](#33048)
- n-api: detect deadlocks in thread-safe function (Gabriel Schulhof)
  [#32860](#32860)
- src: deprecate embedder APIs with replacements (Anna Henningsen)
  [#32858](#32858)
- stream:
  - don't emit end after close (Robert Nagy)
    [#33076](#33076)
  - don't wait for close on legacy streams (Robert Nagy)
    [#33058](#33058)
  - pipeline should only destroy un-finished streams (Robert Nagy)
    [#32968](#32968)
- vm: add importModuleDynamically option to compileFunction (Gus Caplan)
  [#32985](#32985)

PR-URL: #33103
targos pushed a commit that referenced this pull request Apr 30, 2020
We introduce status `napi_would_deadlock` to be used as a return status
by `napi_call_threadsafe_function` if the call is made with
`napi_tsfn_blocking` on the main thread and the queue is full.

Fixes: #32615
Signed-off-by: Gabriel Schulhof <gabriel.schulhof@intel.com>
PR-URL: #32860
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>
Reviewed-By: Zeyu Yang <himself65@outlook.com>
addaleax added a commit to addaleax/node that referenced this pull request May 18, 2020
This reverts commit d26ca06 because
it breaks running the tests in debug mode, as
`v8::Isolate::GetCurrent()` is not allowed if no `Isolate` is active
on the current thread.

Refs: nodejs#33276
Refs: nodejs#32860
BridgeAR pushed a commit to BridgeAR/node that referenced this pull request May 23, 2020
This reverts commit d26ca06 because
it breaks running the tests in debug mode, as
`v8::Isolate::GetCurrent()` is not allowed if no `Isolate` is active
on the current thread.

Refs: nodejs#33276
Refs: nodejs#32860

PR-URL: nodejs#33453
Reviewed-By: Daniel Bevenius <daniel.bevenius@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: David Carlier <devnexen@gmail.com>
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Gabriel Schulhof <gabriel.schulhof@intel.com>
codebytere pushed a commit that referenced this pull request Jun 18, 2020
This reverts commit d26ca06 because
it breaks running the tests in debug mode, as
`v8::Isolate::GetCurrent()` is not allowed if no `Isolate` is active
on the current thread.

Refs: #33276
Refs: #32860

PR-URL: #33453
Reviewed-By: Daniel Bevenius <daniel.bevenius@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: David Carlier <devnexen@gmail.com>
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Gabriel Schulhof <gabriel.schulhof@intel.com>
codebytere pushed a commit that referenced this pull request Jun 18, 2020
This reverts commit d26ca06 because
it breaks running the tests in debug mode, as
`v8::Isolate::GetCurrent()` is not allowed if no `Isolate` is active
on the current thread.

Refs: #33276
Refs: #32860

PR-URL: #33453
Reviewed-By: Daniel Bevenius <daniel.bevenius@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: David Carlier <devnexen@gmail.com>
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Gabriel Schulhof <gabriel.schulhof@intel.com>
codebytere pushed a commit that referenced this pull request Jun 30, 2020
This reverts commit d26ca06 because
it breaks running the tests in debug mode, as
`v8::Isolate::GetCurrent()` is not allowed if no `Isolate` is active
on the current thread.

Refs: #33276
Refs: #32860

PR-URL: #33453
Reviewed-By: Daniel Bevenius <daniel.bevenius@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: David Carlier <devnexen@gmail.com>
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Gabriel Schulhof <gabriel.schulhof@intel.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c++ Issues and PRs that require attention from people who are familiar with C++. node-api Issues and PRs related to the Node-API. semver-minor PRs that contain new features and should be released in the next minor version.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants