
All node removal in LokiJS takes a very long time (affects page creation and already cached builds) #17935

Closed
me4502 opened this issue Sep 27, 2019 · 16 comments · Fixed by #23838
Assignees
Labels
type: bug An issue or pull request relating to a bug in Gatsby

Comments

@me4502
Contributor

me4502 commented Sep 27, 2019

Description

I've been experiencing random major slowdowns in the "source and transform nodes" step that don't appear to match the other posted issues. After managing to reproduce it with the debugger running, I've found that deleting stale nodes from LokiJS is the cause of the slowdown. When a large number of nodes become stale, this takes a very long time, roughly 120ms per node. With 20,000 stale nodes that easily adds up to ~2,400s, which is about how long I'm seeing this step take whenever stale nodes need cleaning up.

If there are a lot of stale nodes, would it make sense to just clear all of them and start again? In my case, quite literally over 99% of the nodes are stale.
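The numbers above suggest the cost is per-removal bookkeeping rather than the removal itself. A minimal sketch (not LokiJS's actual code) of the difference between removing nodes one at a time with an index re-sort after each removal, versus the "clear and start again" bulk approach suggested here:

```javascript
// Sketch, not LokiJS's actual code: removing one node at a time and
// re-sorting an index after each removal is O(removals * n log n),
// while a single filter pass plus one rebuild is O(n log n) total.
function removeOneByOne(data, staleIds) {
  for (const id of staleIds) {
    const i = data.findIndex((n) => n.id === id); // O(n) scan
    data.splice(i, 1);                            // O(n) shift
    data.sort((a, b) => a.id - b.id);             // O(n log n) re-sort per removal
  }
  return data;
}

function removeBulk(data, staleIds) {
  const stale = new Set(staleIds);
  const kept = data.filter((n) => !stale.has(n.id)); // single O(n) pass
  kept.sort((a, b) => a.id - b.id);                  // one O(n log n) rebuild
  return kept;
}

const nodes = Array.from({ length: 5000 }, (_, i) => ({ id: i }));
const staleIds = Array.from({ length: 4000 }, (_, i) => i); // >99% stale, as reported
const kept = removeBulk(nodes.slice(), staleIds);
console.log(kept.length); // 1000
```

Both produce the same surviving set; only the amount of index maintenance differs, which is why bulk clearing wins when most nodes are stale.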

Steps to reproduce

  • Use LokiJS
  • Have a lot of nodes
  • Do whatever it is that causes nodes to become stale

Expected result

I'd expect a large quantity of stale nodes not to impact build time any more than not using the cache at all.

Actual result

Stale node cleanup takes ~40 minutes.

Environment

  System:
    OS: macOS 10.15
    CPU: (12) x64 Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz
    Shell: 3.2.57 - /bin/bash
  Binaries:
    Node: 11.15.0 - ~/.nvm/versions/node/v11.15.0/bin/node
    Yarn: 1.17.3 - ~/.nvm/versions/node/v11.15.0/bin/yarn
    npm: 6.7.0 - ~/.nvm/versions/node/v11.15.0/bin/npm
  Languages:
    Python: 2.7.16 - /usr/local/bin/python
  Browsers:
    Chrome: 77.0.3865.90
    Safari: 13.0.1
  npmPackages:
    gatsby-plugin-svgr: ^2.0.2 => 2.0.2 

Here's a Chrome CPU profile that shows a tiny part of this - I stopped it after a bit to keep file size down, but the profile would just continue like this for 40 minutes. https://drive.google.com/file/d/1QmkUEt3dNUYsRRv4Z1Y6NpCiuS39mfdK/view?usp=sharing

@me4502
Contributor Author

me4502 commented Oct 2, 2019

#11747 - I'd say this fits into that umbrella issue

@me4502
Contributor Author

me4502 commented Oct 2, 2019

Just did a little bit of looking into it: the stale nodes it finds are the SitePage entries for every single page. They're flagged as stale because pages aren't created during the sourceNodes step.

@me4502 me4502 changed the title Stale node removal in LokiJS takes a very long time All node removal in LokiJS takes a very long time (affects page creation and already cached builds) Oct 4, 2019
@me4502
Contributor Author

me4502 commented Oct 4, 2019

After investigating why my computer kept shutting down due to the watchdog timeout, I found that for some reason Node appears to block every thread when creating pages / sourcing nodes with LokiJS. When running a profile, this slowdown appears to be the problem. Luckily the shutdown doesn't happen while profiling, as the profiler must allow the watchdog to tick.

It did appear, however, that 40 of the 55 seconds createPages took were spent removing old nodes from LokiJS. The main slowdown is that LokiJS entirely re-sorts the index whenever a node is removed from a collection.

I wonder if it'd make sense to have a "bulk changes" option in LokiJS, so that it can add and remove all of the stale / old nodes at once.
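A hypothetical sketch of what such a "bulk changes" mode could look like (this is not an actual LokiJS API; the class and method names here are made up): defer index maintenance while a batch of inserts/removals runs, then rebuild the index exactly once at the end.

```javascript
// Hypothetical sketch of a "bulk changes" mode, not an actual LokiJS API:
// index maintenance is skipped inside a batch and done once at the end.
class IndexedCollection {
  constructor() {
    this.data = [];
    this.sortedIndex = []; // kept sorted by id outside of batch mode
    this.inBatch = false;
  }
  insert(doc) {
    this.data.push(doc);
    if (!this.inBatch) this.rebuildIndex(); // per-op rebuild outside a batch
  }
  removeWhere(pred) {
    this.data = this.data.filter((d) => !pred(d));
    if (!this.inBatch) this.rebuildIndex();
  }
  rebuildIndex() {
    this.sortedIndex = this.data.map((d) => d.id).sort((a, b) => a - b);
  }
  batch(fn) {
    this.inBatch = true;
    try {
      fn(this);
    } finally {
      this.inBatch = false;
      this.rebuildIndex(); // single rebuild for the whole batch
    }
  }
}

const col = new IndexedCollection();
col.batch((c) => {
  for (let i = 0; i < 5; i++) c.insert({ id: 5 - i });
  c.removeWhere((d) => d.id > 3);
});
console.log(col.sortedIndex); // [1, 2, 3]
```

With n operations on a collection of n documents, this turns n index rebuilds into one, which is exactly the cost profile the stale-node cleanup needs.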

@LekoArts LekoArts added the type: question or discussion Issue discussing or asking a question about Gatsby label Oct 17, 2019
@gatsbot gatsbot bot added the stale? Issue that may be closed soon due to the original author not responding any more. label Nov 7, 2019
@gatsbot

gatsbot bot commented Nov 7, 2019

Hiya!

This issue has gone quiet. Spooky quiet. 👻

We get a lot of issues, so we currently close issues after 30 days of inactivity. It’s been at least 20 days since the last update here.

If we missed this issue or if you want to keep it open, please reply here. You can also add the label "not stale" to keep this issue open!

As a friendly reminder: the best way to see this issue, or any other, fixed is to open a Pull Request. Check out gatsby.dev/contribute for more information about opening PRs, triaging issues, and contributing!

Thanks for being a part of the Gatsby community! 💪💜

@me4502 me4502 added not stale and removed stale? Issue that may be closed soon due to the original author not responding any more. labels Nov 7, 2019
@me4502
Contributor Author

me4502 commented Nov 7, 2019

This is still an issue :)

@LekoArts LekoArts added type: bug An issue or pull request relating to a bug in Gatsby and removed type: question or discussion Issue discussing or asking a question about Gatsby labels Nov 13, 2019
@vladar
Contributor

vladar commented Dec 25, 2019

CC @pvdz

@pvdz
Contributor

pvdz commented Jan 9, 2020

The main slowdown here is that it entirely re-sorts the index whenever a node is removed from a collection.

Sounds like a big-O problem that scales really badly and wouldn't be too hard to fix or work around (on our side). Can you point to the source?

I'm very interested in this but currently knee deep into investigating a different scaling slowdown (in the run queries step).

@pvdz
Contributor

pvdz commented Feb 12, 2020

@me4502 while I still haven't been able to look into this (and I'm not sure when I will be), we are considering deprecating Loki and merging whatever is still relevant into master somehow. This is not a "tomorrow" thing, don't worry. The recent improvements to filter caching seem to take away a lot of the pain that Loki tried to address, and the dual system is becoming a burden to maintain.

From what I understand you have quite a bit of knowledge in this whole area, in particular regarding scaling. Can you tell me (from your POV) what Loki brings to the table at this point and how it helps at scale?

@me4502
Contributor Author

me4502 commented Feb 12, 2020

@pvdz So currently we've switched away from Loki due to this issue, but we still see noticeable declines in query performance. I'm not sure exactly why, but Loki was able to run queries roughly 8x faster than Sift can right now.

The reason we switched away is that this issue made using a cache impossible, and re-running sharp image optimisation on every build is a more significant slowdown than not using Loki.

@pvdz
Contributor

pvdz commented Feb 13, 2020

@me4502 interesting, can you share the filter(s) and/or sort(s) being used in this case? We intend to gather some general stats on how people use filters and sorts so we can implement better heuristics; I personally have no clue what the common cases are. Right now it's only building up "JIT" caches for flat filters.

@me4502
Contributor Author

me4502 commented Feb 13, 2020

@pvdz We've changed our site a fair bit since then, but the queries we use a lot are:

  • Filtering for list contents with in
  • Multi-value sorting
  • Skip/limit for pagination
  • A few elemMatch queries

And recently we switched to using eq: $id in a lot of places due to the other performance boost you did for lookups by node ID.
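The query shapes listed in this comment can be sketched as plain JS over an array of nodes (the data and field names here are made up for illustration; real Gatsby queries are GraphQL, and these are just the semantics of each operator):

```javascript
// The query shapes above as plain JS predicates (illustrative data only).
const nodes = [
  { id: "a", tags: ["js", "web"], date: 2, title: "B", items: [{ kind: "img" }] },
  { id: "b", tags: ["css"],       date: 2, title: "A", items: [{ kind: "txt" }] },
  { id: "c", tags: ["js"],        date: 1, title: "C", items: [{ kind: "img" }] },
];

// `in` filter: keep nodes whose tags intersect a given list
const wanted = ["js"];
const byTag = nodes.filter((n) => n.tags.some((t) => wanted.includes(t)));

// multi-value sort: by date descending, then title ascending
const sorted = [...byTag].sort(
  (a, b) => b.date - a.date || a.title.localeCompare(b.title)
);

// skip/limit pagination
const page = sorted.slice(0, 2); // skip: 0, limit: 2

// `elemMatch`: at least one array element matching a sub-filter
const withImages = nodes.filter((n) => n.items.some((i) => i.kind === "img"));

// `eq: $id`: a single-node lookup, the fast path this comment mentions
const byId = nodes.find((n) => n.id === "c");
console.log(page.map((n) => n.id), withImages.length, byId.id);
```

Each of these is a linear scan when unindexed, which is why repeated queries benefit so much from the cached fast paths discussed in this thread.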

@pvdz
Contributor

pvdz commented Feb 14, 2020

other performance boost you did for lookups by node ID.

This should now apply to any key as long as the filter is "flat" (one property per object of the filter) and an eq. We do plan to update this heuristic to support more operators (neq, in, etc.) as well as to extend it to filters that are not flat. Ultimately, I guess, the index should be built up for any filter that is used more than once, since the one-time overhead of building the index more than makes up for not having to iterate the entire set over and over again. Anyways, that's future work.
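The "JIT" cache idea described here can be sketched as follows (this is a minimal illustration of the technique, not Gatsby's actual implementation): on the first eq query against a flat key, build a Map once; later eq queries on that key become O(1) lookups instead of full scans.

```javascript
// Sketch of a JIT index for flat `eq` filters, not Gatsby's actual code.
const indexCache = new Map(); // key -> Map(value -> nodes[])

function runEqFilter(nodes, key, value) {
  let index = indexCache.get(key);
  if (!index) {
    // First query on this key: one O(n) pass to build the index.
    index = new Map();
    for (const node of nodes) {
      const v = node[key];
      if (!index.has(v)) index.set(v, []);
      index.get(v).push(node);
    }
    indexCache.set(key, index);
  }
  // Every later query on this key is an O(1) lookup.
  return index.get(value) || [];
}

const nodes = [
  { id: "1", type: "Post" },
  { id: "2", type: "Page" },
  { id: "3", type: "Post" },
];
console.log(runEqFilter(nodes, "type", "Post").length); // 2
console.log(runEqFilter(nodes, "id", "2")[0].type);     // "Page"
```

The build cost amortises exactly as described: it pays off for any key queried more than once, which is why extending the heuristic beyond eq is attractive.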

Anyways, that's a bit OT. I plan to take a look at Loki, no ETA.

@pvdz pvdz self-assigned this Feb 14, 2020
@pvdz
Contributor

pvdz commented Feb 20, 2020

A quick check shows that the time is being eaten by InsertionSort, called in turn from QuickSort.

I don't think Loki has ever been tested at this scale; I don't know what's going on.

This is the .sort() that's killing perf. Both index.values and this.data are the full set of nodes, an ever-growing set at 100k nodes. After an hour it had reached 40k nodes here, and "run to point" does not quickly reach the index.dirty line, so it's actually "stuck" in that sort. Intriguing.

I'll have a closer look later but it seems to me like this is the problem during the source and transform nodes step.

This is Collection.prototype.ensureIndex in node_modules/lokijs/src/lokijs.js.
The Comparators.lt in node_modules/gatsby/src/db/loki/custom-comparators.js isn't doing anything weird tbh, just a bunch of comparisons on numbers. It should be fast.
Assuming the sort is stable, a single sort of 40k elements shouldn't take anywhere near a minute.
But I need to look further into it.

[screenshots: profiler output and the offending sort in lokijs.js]

@pvdz
Contributor

pvdz commented Apr 22, 2020

FYI: We are planning to drop Loki entirely in the next few weeks. The fixes I have merged and am merging for filters resolve most of the issues that Loki aimed to work around. Beyond that, Loki doesn't scale well either, and it causes a dual path in our codebase with a high maintenance burden. Additionally, Loki usage is relatively low, because it was never formally supported or announced.

All in all, enough reasons to drop it.

If you're still highly depending on Loki please reach out to me directly, or anywhere else, so we can discuss the use case and triage why we can't solve the problem without Loki.

@me4502
Contributor Author

me4502 commented Apr 22, 2020

Thanks. We stopped using Loki temporarily due to this issue, but with improvements to the current query system we're seeing less of a need for it. Running queries isn't the slow point of our builds now, so it's not as necessary.

@pvdz
Contributor

pvdz commented Apr 23, 2020

@me4502 I'm very happy to hear that :)

From what I understand you have large sites. I'd be happy to triage some problems with you some time. No promises but I guess it can't hurt either. Ping me if you're interested in that.
