All node removal in LokiJS takes a very long time (affects page creation and already cached builds) #17935
#11747 - I'd say this fits into that umbrella issue
Just did a little bit of looking into it - the stale nodes it finds are the SitePage entries for every single page. They're stale because the pages aren't created during the sourceNodes step.
After investigating why my computer kept shutting down due to the watchdog timeout, I found that for some reason Node appears to block every thread when creating pages / sourcing nodes with LokiJS. When running a profile, this slowdown appears to be the problem. Luckily it doesn't happen while profiling, as the profiler must allow the watchdog to tick. It did appear, however, that 40 of the 55 seconds that createPages took were caused by removing old nodes from LokiJS. The main slowdown is that it re-sorts the entire index whenever a node is removed from a collection. I wonder if it'd make sense to have a "bulk changes" option in LokiJS, so that it can add and remove all of the stale / old nodes at once.
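The cost difference between per-removal re-sorting and a hypothetical "bulk changes" mode can be sketched in plain JavaScript. This is a simulation only: the `index` array and comparison counter below are stand-ins for LokiJS's index maintenance, not its real implementation.

```javascript
// Sketch: removing k nodes one at a time, re-sorting after each removal,
// versus dropping all stale nodes in one pass and sorting once.
let comparisons = 0;
const cmp = (a, b) => { comparisons++; return a - b; };

function makeIndex(n) {
  return Array.from({ length: n }, (_, i) => i);
}

// Observed behaviour: a full re-sort of the index per removal.
function removeOneByOne(index, staleSet) {
  for (const id of staleSet) {
    index.splice(index.indexOf(id), 1);
    index.sort(cmp); // full re-sort on every single removal
  }
  return index;
}

// Hypothetical batch mode: drop all stale entries, then sort once.
function removeBatch(index, staleSet) {
  const kept = index.filter((id) => !staleSet.has(id));
  kept.sort(cmp); // one re-sort for the whole batch
  return kept;
}

const stale = new Set(Array.from({ length: 500 }, (_, i) => i * 2));

comparisons = 0;
removeOneByOne(makeIndex(1000), stale);
const perRemoval = comparisons;

comparisons = 0;
removeBatch(makeIndex(1000), stale);
const batched = comparisons;

console.log({ perRemoval, batched });
```

Even on this toy input the per-removal strategy performs orders of magnitude more comparisons, which matches the ~120ms-per-node behaviour reported below when the collection holds tens of thousands of nodes.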
Hiya! This issue has gone quiet. Spooky quiet. 👻 We get a lot of issues, so we currently close issues after 30 days of inactivity. It’s been at least 20 days since the last update here. If we missed this issue or if you want to keep it open, please reply here. You can also add the label "not stale" to keep this issue open! As a friendly reminder: the best way to see this issue, or any other, fixed is to open a Pull Request. Check out gatsby.dev/contribute for more information about opening PRs, triaging issues, and contributing! Thanks for being a part of the Gatsby community! 💪💜
This is still an issue :)
CC @pvdz
Sounds like a big-O problem that scales really badly and would not be too hard to fix or work around (on our side). Can you point to the source? I'm very interested in this, but I'm currently knee deep in investigating a different scaling slowdown (in the run queries step).
@me4502 while I still haven't been able to look into this, and I'm not sure when I will, we are considering deprecating Loki and merging whatever is still relevant into master somehow. This is not a "tomorrow" thing, don't worry. The recent improvements to filter caching seem to take away a lot of the pain that Loki tried to address, and the dual system is becoming a bit of a burden to maintain. From what I understand you have quite a bit of knowledge in this whole area, in particular regarding scaling. Can you tell me (from your point of view) what Loki brings to the table at this point and how it helps at scale?
@pvdz So we've currently switched away from Loki due to this issue, but still see noticeable declines in query performance. I'm not sure exactly why, but Loki was able to run queries probably 8x faster than Sift can right now. The reason we switched away is that this issue made using a cache impossible, and image optimisation with sharp running on every build is a more significant slowdown than not using Loki.
@me4502 interesting, can you share the filter(s) and/or sort(s) being used in this case? We intend to gather some general stats on how people are using filters and sorts so we can implement better heuristics; I myself have no clue what the common cases are. So right now it's only building up "JIT" caches for flat filters.
@pvdz We've changed our site a fair bit since then, but the queries we use a lot are: Filtering for list contents with And recently we switched to using
This should now apply to any key as long as the filter is "flat" (one property per object of the filter) and an Anyways, that's a bit OT. I plan to take a look at Loki, no ETA.
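As a rough illustration of what "flat" means in the comment above (one property per object at every level of the filter), here is a small predicate. It is purely illustrative: this is not Gatsby's actual check, and the example filter shapes are made up.

```javascript
// Illustrative predicate for the "flat filter" shape: every object in the
// filter tree has exactly one property. Not Gatsby's real heuristic.
function isFlatFilter(filter) {
  const keys = Object.keys(filter);
  if (keys.length !== 1) return false; // more than one property: not flat
  const value = filter[keys[0]];
  if (value !== null && typeof value === "object" && !Array.isArray(value)) {
    return isFlatFilter(value); // recurse into nested filter objects
  }
  return true; // reached a leaf comparator operand (eq/in/etc.)
}

// Flat: a single property path, so a cache can key on it directly.
const flat = isFlatFilter({ frontmatter: { slug: { eq: "/about/" } } });
// Not flat: two properties inside one object.
const notFlat = isFlatFilter({
  frontmatter: { slug: { eq: "/" }, date: { ne: null } },
});
console.log(flat, notFlat); // true false
```

The point of the restriction is that a flat filter corresponds to a single property path, which makes it cheap to build a lookup cache keyed on that path.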
A quick check shows that it's being destroyed by InsertionSort, in turn QuickSort. I think Loki has never been tested at scale, I don't know what's going on. This is the I'll have a closer look later but it seems to me like this is the problem during the This is
FYI: We are planning to drop Loki entirely in the next few weeks. The fixes I have merged and am merging for filters resolve most of the issues that Loki aimed to work around. Beyond that, Loki doesn't scale well either, and it's causing a dual path in our codebase with a high maintenance burden. Additionally, Loki usage is relatively low, because it's never been formally supported or announced. All in all, enough reasons to drop it. If you're still heavily dependent on Loki, please reach out to me directly, or anywhere else, so we can discuss the use case and triage why we can't solve the problem without Loki.
Thanks, we stopped using Loki temporarily due to this issue; however, with improvements to the current query system we're seeing less of a need for it. Running queries isn't the slow point of our builds now, so it's not as necessary.
@me4502 I'm very happy to hear that :) From what I understand you have large sites. I'd be happy to triage some problems with you some time. No promises, but I guess it can't hurt either. Ping me if you're interested.
Description
I've been experiencing random major slowdowns of the "Source and transform nodes" step that don't appear to match the other posted issues. After managing to reproduce it with the debugger running, I've found that deleting stale nodes from LokiJS is the cause of the slowdown. When a large number of nodes become stale, this takes a very long time: about 120ms per node. With 20000 stale nodes, that easily adds up to ~2400s, which is roughly how long I'm seeing this step take whenever stale nodes need cleaning up.

If there are a lot of stale nodes, would it make sense to just clear all of them and start again? In this case it is quite literally over 99% of the nodes that are stale.
Steps to reproduce
Expected result
I'd expect a large quantity of stale nodes not to increase build time beyond that of a build without the cache.
Actual result
Stale node cleanup takes ~40 minutes.
Environment
Here's a Chrome CPU profile that shows a small part of this - I stopped it after a bit to keep the file size down, but the profile would just continue like this for 40 minutes. https://drive.google.com/file/d/1QmkUEt3dNUYsRRv4Z1Y6NpCiuS39mfdK/view?usp=sharing