Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core(driver): handle promise rejections from sendCommand #6739

Merged
merged 8 commits into from
Dec 13, 2018

Conversation

exterkamp
Copy link
Member

Summary
As of #6347 driver's sendCommand can reject its Promise if the underlying protocol command times out. This new rejection path isn't handled in some invocations of sendCommand leading to unhandled promise rejections. This would present as us "missing" a PROTOCOL_TIMEOUT in LR, leading to the graceful resolve fix push; however, upon investigation, the cause of missing those timeouts was the timeouts causing an unhandled promise rejection which freaks out LR!
This should catch all unhandled promise rejections as of right now. I did a lot of testing to try to track down unhandled promise rejection, but I might have missed some, so if you find any lmk.

Related Issues/PRs

@@ -893,8 +894,8 @@ class Driver {
// These can 'race' and that's OK.
// We don't want to wait for Page.navigate's resolution, as it can now
// happen _after_ onload: https://crbug.com/768961
this.sendCommand('Page.enable');
this.sendCommand('Emulation.setScriptExecutionDisabled', {value: disableJS});
await this.sendCommand('Page.enable');
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it okay to await these? The comments above them talk about racing like this might be purposefully not-awaited.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug is marked as fixed now. If it truly is, can you also remove the comments?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the big bug was definitely about not awaiting Page.navigate, but why are we needing to await these? is it just about unhandled rejections?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could still timeout on any protocol command. what happens in the underlying connection dies? we should avoid any possible unhandled rejection.

But await each command isn't the only way to avoid unhandled rejections here. wdyt of just tacking a catch on Page.enable, or combining the two commands like this:

await Promise.all([
this.sendCommand('Page.enable'),
this.sendCommand('Emulation.setScriptExecutionDisabled', {value: disableJS}),
])

so that it's one less event loop iteration ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right I'm onboard with that, but I'm asking because there are then other things we might try instead of serialling awaiting like this._sendAndSilentlyIgnoreErrors or something :)

It looks like we used to be just fine waiting on these in sequence though, and the Page.navigate change just moved all of 'em to be together, so we don't need to go crazy avoiding the await #3413

I'm a tad spooked by this given that it was a real nasty bug to hunt down, but I think this makes sense if the bug really is fixed. +1 to removing the comments if we're making this change

Copy link
Collaborator

@connorjclark connorjclark Dec 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_innerSendCommand is specifically for when you don't want to timeout.

I'm not sure if _innerSendCommand can ever reject. You could try setting a breakpoint just before, then kill Chrome, then hit play, and see what happens ... But if I read the code correctly for how the connection works, it won't ever result in a rejected promise - just a stalled one.

Copy link
Collaborator

@patrickhulce patrickhulce Dec 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or I guess sendCommand is never the real problem it's always the PROTOCOL_TIMEOUT that's the real problem?

You guys beat me to the punch you're already on it 👍 😄

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe I'm not following the conversation correctly here, but _innerSendCommand can reject via connection:

return callback.resolve(Promise.resolve().then(_ => {
if (object.error) {
log.formatProtocol('method <= browser ERR', {method: callback.method}, 'error');
throw LHError.fromProtocolMessage(callback.method, object.error);
}

(throwing inside Promise.resolve().then(_ => {/* throw */}) rejects it, then callback.resolve() with a rejected promise rejects as well. We could probably make that a lot more understandable with an explicit callback.resolve or callback.reject, as well as using an async function for the inner part)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so do we want to add an empty catch to the this._innerSendCommand('Page.navigate', {url}); line? Or are we assuming it's so rare that it's ok that a unhandledRejection listener can catch it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it seems like this never happens, should we just let sleeping dragons lie?

@exterkamp exterkamp changed the title core(driver): Handle promise rejections from sendCommand core(driver): handle promise rejections from sendCommand Dec 6, 2018
@@ -893,8 +894,8 @@ class Driver {
// These can 'race' and that's OK.
// We don't want to wait for Page.navigate's resolution, as it can now
// happen _after_ onload: https://crbug.com/768961
this.sendCommand('Page.enable');
this.sendCommand('Emulation.setScriptExecutionDisabled', {value: disableJS});
await this.sendCommand('Page.enable');
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug is marked as fixed now. If it truly is, can you also remove the comments?

}

/** @type {(() => void)} */
let cancel = () => {
throw new Error('_waitForCPUIdle.cancel() called before it was defined');
};
const promise = new Promise((resolve, reject) => {
checkForQuiet(this, resolve);
checkForQuiet(this, resolve, reject);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happens if reject is called twice? is that possible here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC, once a promise has been resolved/rejected all further calls will be noops

lighthouse-core/gather/driver.js Outdated Show resolved Hide resolved
lighthouse-core/gather/driver.js Outdated Show resolved Hide resolved
@@ -893,8 +894,8 @@ class Driver {
// These can 'race' and that's OK.
// We don't want to wait for Page.navigate's resolution, as it can now
// happen _after_ onload: https://crbug.com/768961
this.sendCommand('Page.enable');
this.sendCommand('Emulation.setScriptExecutionDisabled', {value: disableJS});
await this.sendCommand('Page.enable');
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the big bug was definitely about not awaiting Page.navigate, but why are we needing to await these? is it just about unhandled rejections?

lighthouse-core/gather/driver.js Outdated Show resolved Hide resolved
@@ -893,8 +894,8 @@ class Driver {
// These can 'race' and that's OK.
// We don't want to wait for Page.navigate's resolution, as it can now
// happen _after_ onload: https://crbug.com/768961
this.sendCommand('Page.enable');
this.sendCommand('Emulation.setScriptExecutionDisabled', {value: disableJS});
await this.sendCommand('Page.enable');
Copy link
Collaborator

@patrickhulce patrickhulce Dec 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or I guess sendCommand is never the real problem it's always the PROTOCOL_TIMEOUT that's the real problem?

You guys beat me to the punch you're already on it 👍 😄

Copy link
Member

@brendankenny brendankenny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#3413 also provides a repro step. Updated for the current cli flags, that would be yarn start ----emulated-form-factor=none --throttling-method=provided http://example.com --verbose, run against this PR and not seeing a long timeout pause.

However the main solution from #3413 seems to just be not waiting on Page.navigate. This still doesn't do that, so that shouldn't be a problem even if the underlying crbug isn't actually fixed.

lighthouse-core/gather/driver.js Outdated Show resolved Hide resolved
@@ -893,8 +894,8 @@ class Driver {
// These can 'race' and that's OK.
// We don't want to wait for Page.navigate's resolution, as it can now
// happen _after_ onload: https://crbug.com/768961
this.sendCommand('Page.enable');
this.sendCommand('Emulation.setScriptExecutionDisabled', {value: disableJS});
await this.sendCommand('Page.enable');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so do we want to add an empty catch to the this._innerSendCommand('Page.navigate', {url}); line? Or are we assuming it's so rare that it's ok that a unhandledRejection listener can catch it?

lighthouse-core/gather/driver.js Outdated Show resolved Hide resolved
this.sendCommand('Page.enable');
this.sendCommand('Emulation.setScriptExecutionDisabled', {value: disableJS});
await this.sendCommand('Page.enable');
await this.sendCommand('Emulation.setScriptExecutionDisabled', {value: disableJS});
// No timeout needed for Page.navigate. See #6413.
this._innerSendCommand('Page.navigate', {url});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, after talking this through for a little while, we think we do want to incorporate the Page.navigate errors into the promise chain. It's a little ugly, so if anyone has any other structure ideas, please chime in :)

Basically it would be something like

const waitforPageNavigateCmd = this._innerSendCommand('Page.navigate', {url});
// ...
let pageWaiter = Promise.resolve();
if (waitForNavigated) {
  pageWaiter = this._waitForFrameNavigated();
} else if (waitForLoad) {
  // ...
  pageWaiter = this._waitForFullyLoaded(/* ... */);
}

await Promise.all([
  waitforPageNavigateCmd,
  pageWaiter,
]);
  • If Page.navigate has no problem, as before it doesn't block or interfere with observing the page load
  • If Page.navigate throws, it wins the Promise.all with a rejection, which leads to lighthouse exit.

The main issue I see (besides reduced readability) is that we don't clean up the _waitForFullyLoaded machinery. This is a fatal error so it doesn't matter in most clients, but are we going to have to actually clean up to support when LH is run as a module?

Copy link
Member

@brendankenny brendankenny Dec 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also during error testing we discovered that _waitForFrameNavigated should probably also have a timeout instead of blindly waiting on the Page.frameNavigated event.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we discovered that _waitForFrameNavigated should probably also have a timeout

WDYT about having an option to this.once that does the timeout bit?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is that we don't clean up the _waitForFullyLoaded machinery

why don't we just await waitForPageNavigateCmd as the ultimate very last thing? the only thing we want to happen is get the errors into the chain, right? if we do it as the very last thing, then it can't prevent other cleanup

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WDYT about having an option to this.once that does the timeout bit?

yeah, good idea. We have at least one or two places we will happily wait forever on an event.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

though the service worker one I'm thinking of only waits forever on about:blank, which adding an _waitForFrameNavigated timeout will fix :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why don't we just await waitForPageNavigateCmd as the ultimate very last thing? the only thing we want to happen is get the errors into the chain, right? if we do it as the very last thing, then it can't prevent other cleanup

This would definitely be easier to follow.

The ordering will be a little wonky (I'd imagine a Page.navigate failure would end up usually failing on another command within _waitForFullyLoaded or by hitting the 45 second timeout), but ordering is already an issue with Promise.all() (and without cancelling _waitForFullyLoaded), and we've already all agreed that Page.navigate rarely if ever fails :)

Copy link
Member

@brendankenny brendankenny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just the nit, but otherwise LGTM!

lighthouse-core/gather/driver.js Outdated Show resolved Hide resolved
}

if (waitForLoad) {
} else if (waitForLoad) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -918,6 +911,9 @@ class Driver {
maxWaitMs);
}

// Awaiting so that errors can be properly processed.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: while this comment makes total sense in the context of this PR, I can definitely see myself coming back in a year and being like "what?" maybe a link to this PR or a tad more explanation?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good call. Maybe even just something like // Bring any 'Page.navigate' errors back into the promise chain. and a link back here.

The comment above the Page.navigate command isn't terribly helpful either, but I don't have a great suggestion :)

Copy link
Collaborator

@patrickhulce patrickhulce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM % nit!

@@ -918,6 +911,9 @@ class Driver {
maxWaitMs);
}

// Awaiting so that errors can be properly processed.
await waitforPageNavigateCmd;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's nothing in _endNetworkStatusMonitoring we need, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's nothing in _endNetworkStatusMonitoring we need, right?

it returns the finalUrl after redirects, which is used for updating passContext (and eventually the artifact).

A little too convoluted but somewhat difficult to fix since networkRecords don't entirely exist by that point.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants