Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use two times less workers for better performance #3565

Merged
merged 1 commit into from
Nov 8, 2016

Conversation

mourner
Copy link
Member

@mourner mourner commented Nov 8, 2016

Based on our conversation with Chrome engineers, using anything more than num_cores / 2 actually makes things worse because those additional workers usually get heavily deprioritized to make room for other stuff (for Chrome and the system). You can see which workers are deprioritized based on their background color in the tracing profile, and also comparing the wall time to CPU time there.

This may explain our earlier observations where worker messages would suddenly get deprioritized and start lagging heavily, such as on the point drag example.

Per further recommendations from the Chrome engineers, we need to follow-up with a PR that sets the worker count to 1 for mobile devices, although we first need to come up with a good way to detect a mobile phone (preferably not involving user agent parsing).

cc @jfirebaugh @ansis @lucaswoj

@mourner mourner added the performance ⚡ Speed, stability, CPU usage, memory usage, or power usage label Nov 8, 2016
@jfirebaugh
Copy link
Contributor

I'd like to test out using a single worker everywhere. When I was testing startup time, using only a single worker was significantly faster (for startup performance) than using more than one. If the price of starting a worker is so high, and the marginal benefit even after it warms up is low, we may be better off with just one, even after startup. It would make stuff like cross-source label placement significantly easier.

@ansis
Copy link
Contributor

ansis commented Nov 8, 2016

Sounds good to me, but can we benchmark this before merging?

@mourner
Copy link
Member Author

mourner commented Nov 8, 2016

Updated the formula to use log2(num_cores), so that 8 cores = 3 workers, 4 cores = 2 workers.

@jfirebaugh I suggest reducing the number of workers for now (because they need to be reduced in any case), and following up with workerCount=1 experiments after, because this will need a dedicated non-trivial benchmark to measure reliably.

@mourner
Copy link
Member Author

mourner commented Nov 8, 2016

I performed a very basic test were I load the full-screen debug page and then zoom out with a button 4 times (each time waiting for tile loads). Here are the results (the first number is the total tile loading times):

workers: 1, 16379ms, 26 tiles, 629ms avg
workers: 2, 11683ms, 26 tiles, 449ms avg
workers: 3, 11041ms, 26 tiles, 424ms avg
workers: 4, 10983ms, 26 tiles, 422ms avg
workers: 7, 12365ms, 26 tiles, 475ms avg

You can see that a single worker performs significantly worse than other numbers in this scenario. Too much workers (7) is also apparently not helping. The best number seems to be around 3 or 4 on my machine.

I'll now try a more stressful test with a slow flyTo across US.

@mourner
Copy link
Member Author

mourner commented Nov 8, 2016

A 30s flyTo across the US:

workers: 1, 49517ms, 173 tiles, 286ms avg
workers: 2, 35241ms, 173 tiles, 203ms avg
workers: 3, 35527ms, 170 tiles, 208ms avg
workers: 4, 36199ms, 178 tiles, 203ms avg
workers: 7, 37491ms, 173 tiles, 216ms avg

Which seems to also support the conclusion that 1 worker is a bad idea (at least for the desktop), but there's not much sense in using more than 2-3.

@mourner
Copy link
Member Author

mourner commented Nov 8, 2016

Made a more reliable benchmark by enabling browser caching (to exclude tile loading times), flying in 20s for more stressful test, and counting loaded tiles and aborted tiles separately (with aborted not being counted towards total):

workers: 1, 25491ms loading times total, 110 tiles loaded, 59 aborted, 231ms avg
workers: 2, 25609ms loading times total, 158 tiles loaded, 22 aborted, 162ms avg
workers: 3, 24501ms loading times total, 165 tiles loaded, 18 aborted, 148ms avg
workers: 4, 22015ms loading times total, 162 tiles loaded, 21 aborted, 135ms avg
workers: 7, 24804ms loading times total, 162 tiles loaded, 22 aborted, 153ms avg

Notice that 1 worker result has 3x the amount of aborted tiles. Same conclusion — 2-4 is good (3 seems best), 1 is bad, 7 is unnecessary.

@jingsam
Copy link
Contributor

jingsam commented Nov 8, 2016

Nice tests! More tests on Firefox, safari would be more convincing.

@mourner
Copy link
Member Author

mourner commented Nov 8, 2016

Firefox:

1 worker:  35780ms loading times total, 76 tiles loaded, 79 aborted, 470ms avg
2 workers: 40201ms loading times total, 131 tiles loaded, 62 aborted, 306ms avg
3 workers: 32992ms loading times total, 143 tiles loaded, 64 aborted, 230ms avg
4 workers: 35392ms loading times total, 149 tiles loaded, 56 aborted, 237ms avg
7 workers: 29685ms loading times total, 127 tiles loaded, 71 aborted, 233ms avg

Safari:

1 worker:  35668ms loading times total, 122 tiles loaded, 46 aborted, 292ms avg
2 workers: 22300ms loading times total, 164 tiles loaded, 15 aborted, 135ms avg
3 workers: 22687ms loading times total, 181 tiles loaded, 22 aborted, 125ms avg
4 workers: 23202ms loading times total, 195 tiles loaded, 16 aborted, 118ms avg
7 workers: 24651ms loading times total, 183 tiles loaded, 34 aborted, 134ms avg

@mourner
Copy link
Member Author

mourner commented Nov 8, 2016

Looking at benchmarks in all browsers, settled on num_cores / 2. I think this is good to merge.

@mb12
Copy link

mb12 commented Nov 8, 2016

@mourner The average numbers reported here assume that worker spent 0ms for aborted tiles. Is this a valid assumption?

@mourner
Copy link
Member Author

mourner commented Nov 8, 2016

@mb12 the average numbers are calculated from non-aborted tile numbers + loading times. It doesn't make sense to calculate average for aborted and loaded tiles combined because lower average can be a consequence of more aborted tiles (which is the more important metric here).

@mb12
Copy link

mb12 commented Nov 8, 2016

Thanks for the clarification. I misunderstood how 'loading times' were computed.

@mourner mourner merged commit 9a2f5e7 into master Nov 8, 2016
@mourner mourner deleted the better-worker-count branch November 8, 2016 20:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance ⚡ Speed, stability, CPU usage, memory usage, or power usage
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants