Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix send / remove timing bug in Dispatcher #6756

Closed
wants to merge 1 commit into from

Conversation

uforic
Copy link
Contributor

@uforic uforic commented May 29, 2018

Description

I believe there is a JavaScript worker timing bug in dispatcher.js. On our Flexport Sentry, we are seeing infrequent bugs like this:
Unable to get property 'send' of undefined or null reference, with a fuller stack trace:

TypeError: Unable to get property 'send' of undefined or null reference
  at e(/home/app/flexport/node_modules/mapbox-gl/dist/mapbox-gl.js:517:481)
  at t.prototype.loadTile(/home/app/flexport/node_modules/mapbox-gl/dist/mapbox-gl.js:231:2184)
  at t.prototype._loadTile(/home/app/flexport/node_modules/mapbox-gl/dist/mapbox-gl.js:223:2168)
  at h(/home/app/flexport/node_modules/mapbox-gl/dist/mapbox-gl.js:223:8194)
  at n(/home/app/flexport/node_modules/mapbox-gl/dist/mapbox-gl.js:223:7400)
  at o(/home/app/flexport/node_modules/mapbox-gl/dist/mapbox-gl.js:223:6566)
  at r(/home/app/flexport/node_modules/mapbox-gl/dist/mapbox-gl.js:223:795)
  at this(/home/app/flexport/node_modules/mapbox-gl/dist/mapbox-gl.js:521:839)
  at e(/home/app/flexport/node_modules/mapbox-gl/dist/mapbox-gl.js:521:1045)
  at o(/home/app/flexport/node_modules/mapbox-gl/dist/mapbox-gl.js:231:1210)
  at t(/home/app/flexport/node_modules/mapbox-gl/dist/mapbox-gl.js:207:438)
  at n.onload(/home/app/flexport/node_modules/mapbox-gl/dist/mapbox-gl.js:503:1103)

The TypeError is occurring on this line here.

It appears that the way this error is getting caused is:

  • In componentWillUnmount, the implementor calls "map.remove()"
  • A web worker running a task calls "Dispatcher.send", or a previously running broadcast call called asyncAll(Dispatcher.send(payload....
  • Dispatcher.send tries to access the 0th index of an empty array, because remove() has already been completed.
    TypeError.

Open to suggestions on how to improve this PR. One thing I'm curious about is:
Is it possible for Dispatcher.remove() to be called, but an async process then tries to call Dispatcher.send()?

It's totally possible that this is somehow user error, and that the user is somehow dispatching an action after having called map.remove(), but looking through the code (especially the Dispatcher broadcast function earlier in the file), it seems like like the answer is "yes".

Another question: Does anyone depend on the actor ID being returned? Will -1 have any negative downstream impact?

@jfirebaugh
Copy link
Contributor

Hi @uforic. Thanks for the contribution! It looks to me like the proposed change here, while it will silence the TypeError, isn't really addressing the root cause, which is that something is attempting to send a message to to the worker after map.remove() is called. One of the things that map.remove() is supposed to do is cancel any pending async work, so that this doesn't happen.

I don't think it's related broadcast per se, because broadcast calls actor.send for all actors synchronously. The asyncAll you see there is to handle the fact that actor.send is itself async, so broadcast needs to collect all the async results before calling cb.

Can you dig a little more into why something is attempting to send even after map.remove()? From the stack trace I suspect it's related to tile loading, but with uglified function names in the stack it's hard to tell more.

@uforic
Copy link
Contributor Author

uforic commented Jun 15, 2018

Howdy @jfirebaugh , thanks for getting back to us. Apologies for the delayed response. We were able to get some more information on this one.

Another raw stack with occasional function names coming through:

TypeError: Unable to get property 'send' of undefined or null reference
  at i.prototype.send(/packs/ClientApp.e9f7c8958d58adb5da75.js:33:654324)
  at t.prototype.loadTile(/packs/ClientApp.e9f7c8958d58adb5da75.js:33:370297)
  at t.prototype._loadTile(/packs/ClientApp.e9f7c8958d58adb5da75.js:33:350246)
  at t.prototype._addTile(/packs/ClientApp.e9f7c8958d58adb5da75.js:33:356189)
  at t.prototype._updateRetainedTiles(/packs/ClientApp.e9f7c8958d58adb5da75.js:33:355406)
  at t.prototype.update(/packs/ClientApp.e9f7c8958d58adb5da75.js:33:354762)
  at Anonymous function(/packs/ClientApp.e9f7c8958d58adb5da75.js:33:349075)
  at o.prototype.fire(/packs/ClientApp.e9f7c8958d58adb5da75.js:33:657088)
  at o.prototype.fire(/packs/ClientApp.e9f7c8958d58adb5da75.js:33:657292)
  at Anonymous function(/packs/ClientApp.e9f7c8958d58adb5da75.js:33:369255)
  at s(/packs/ClientApp.e9f7c8958d58adb5da75.js:33:339754)
  at n.onload(/packs/ClientApp.e9f7c8958d58adb5da75.js:33:650379)

I dug into a couple frames, and got their line nos in GitHub:

Bottom of stack:

data = JSON.parse(xhr.response);

Next up:

loadTileJSON(this._options, this.map._transformRequest, (err, tileJSON) => {

Totally possible I'm misdiagnosing things, but it seems like the onload to getJSON is getting called after map.remove has been called, which is then triggering send action via loadTile.

Let me know what you think! Perhaps those promises need to be cancellable? I notice that the return type of getJSON is never stored off anywhere.

@dmnd
Copy link

dmnd commented Jun 15, 2018

Here's another stacktrace from production that I painstakingly un-uglified:

raw stacktrace:

TypeError: this.actors[r] is undefined
  at this(/home/app/flexport/node_modules/mapbox-gl/dist/mapbox-gl.js:478:483)
  at this(/home/app/flexport/node_modules/mapbox-gl/dist/mapbox-gl.js:232:2342)
  at this(/home/app/flexport/node_modules/mapbox-gl/dist/mapbox-gl.js:224:2066)
  at n(/home/app/flexport/node_modules/mapbox-gl/dist/mapbox-gl.js:224:7032)
  at this(/home/app/flexport/node_modules/mapbox-gl/dist/mapbox-gl.js:224:6135)
  at r(/home/app/flexport/node_modules/mapbox-gl/dist/mapbox-gl.js:224:5335)
  at i(/home/app/flexport/node_modules/mapbox-gl/dist/mapbox-gl.js:224:715)
  at this(/home/app/flexport/node_modules/mapbox-gl/dist/mapbox-gl.js:482:829)
  at e(/home/app/flexport/node_modules/mapbox-gl/dist/mapbox-gl.js:482:1028)
  at o(/home/app/flexport/node_modules/mapbox-gl/dist/mapbox-gl.js:232:1158)
  at t(/home/app/flexport/node_modules/mapbox-gl/dist/mapbox-gl.js:212:438)
  at func(/home/app/flexport/node_modules/mapbox-gl/dist/mapbox-gl.js:464:1103)
  at wrapped(/home/app/flexport/node_modules/raven-js/src/raven.js:360:1)

un-uglified:

this.actors[targetID].send(type, data, callback);

tile.workerID = this.dispatcher.send('loadTile', params, done.bind(this));

return this._source.loadTile(tile, callback);

this._loadTile(tile, this._tileLoaded.bind(this, tile, tileID.key, tile.state));

let tile = this._addTile(tileID);

const retain = this._updateRetainedTiles(idealTileIDs, zoom);

this.update(this.transform);

listener.call(this, event);

this.fire(new Event('data', {dataType: 'source', sourceDataType: 'metadata'}));

callback(null, result);

callback(null, data);

As @uforic says, it looks like the map sends an XHR before it is removed. But the xhr isn't aborted, and so when it completes, the callbacks try to update retained tiles on the map, causing the tile to be loaded, but there are no actors and so ultimately we see this exception.

So either abort needs to be called on the xhr, or instead the xhr callbacks need to check if their map (or whatever) still exists before they run.

@jfirebaugh
Copy link
Contributor

Thanks, that was perfect detail. I opened a PR that should fix this: #6826.

@dmnd
Copy link

dmnd commented Jun 16, 2018

Excellent, thanks for the fix @jfirebaugh! 🙇‍♂️

@uforic
Copy link
Contributor Author

uforic commented Jun 18, 2018

Awesome, thanks!

@jfirebaugh jfirebaugh closed this Jun 18, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants