High CPU usage #440
How much memory is used?
Thanks for helping out; I'll send across an export soon. Could you tell me what sort of summary you're after? I.e., just the same stuff as I opened the issue with, or some more things? Neofetch: […]
Just the one on your Ryot dashboard. Also, what is your architecture? I remember someone reporting high CPU usage on arm64.
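(A quick way to answer the architecture question, as a minimal host-side sketch; nothing here is Ryot-specific:)

```bash
# Print the machine architecture (e.g. x86_64 vs aarch64/arm64).
uname -m
# If Ryot runs under Docker, the daemon reports its architecture too.
docker info --format '{{.Architecture}}'
```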
Try updating Ryot to the latest version and see if it improves (I doubt it). What do you use to monitor your server? I don't have any monitoring on mine, so I don't know what an ideal CPU footprint should look like.
Will do. I'll give it a day at least after updating before making any judgement calls on it. Just emailed you the export, too. I'm using Grafana with cAdvisor and Prometheus, and then this dashboard template, along with pretty similar config to what they have in the readme (might have tweaked it a little bit for my setup).
Can you tell me the versions of Prometheus, node-exporter, cAdvisor, and Grafana? I suspect the dashboard template is outdated. It uses […]
Grafana: grafana-oss:9.1.1. I've also put my version of the dashboard (admittedly I've deleted some of the panels I never used) into a gist.
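(For spot-checking the container's CPU rate outside Grafana, a minimal sketch against the Prometheus HTTP API; the Prometheus address and the cAdvisor label value are assumptions:)

```bash
# Query the 5-minute CPU usage rate for the Ryot container.
# localhost:9090 and the container label name="ryot" are assumptions.
curl -s 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=rate(container_cpu_usage_seconds_total{name="ryot"}[5m])'
```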
Thanks!
Unfortunately the behavior hasn't changed since updating. I might be a bit slow to get back to this, but I'm thinking of spinning up a separate Ryot instance, importing specific sections at a time (e.g. movies, shows, books) and observing the results to see if I can narrow it down further.
Can you set the env variable […]
Mine was super quick! 23 seconds.
Then it is not the summary calculation that takes up CPU. I recommend you wait another day, observe the logs around the time the CPU spike happens, and see which job is causing them. You can also change the timezone (via […]).
Another possible cause: do you have a lot of media in the Watchlist and In Progress collections? Also, do you have a lot of explicitly monitored stuff (there's a filter for it on the list page)?
Not a huge number there. Similarly for Explicitly Monitored, where the only thing that came up was 7 TV shows. I've also imported my export into a separate v2.0.0 instance of Ryot, just to see if it behaves the same. I'll keep an eye on it and slowly bump the version on that second instance until I start seeing the behavior.
Cool, thanks!
Can you also attach the usage of the database container? Maybe something is blocking there.
I currently share the Postgres instance with a number of other services. I'll spin up a dedicated Postgres instance so I can observe it more accurately. I'll post some screenshots in a day or two once there's a good amount of data there.
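(A simple way to watch both containers side by side while the data accumulates, as a sketch; the container names are assumptions:)

```bash
# Live CPU/memory figures for just the Ryot and Postgres containers.
docker stats ryot ryot-postgres
```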
@thespad your usage pattern seems fine to me. Those are just the daily cron jobs running.
Running for 3-18 hours every day? That seems excessive.
I suggest you do this (#440 (comment)) too and report back the times.
This is now with v3.3.2
This is still going after 10 minutes; I'll let you know when it finally completes.
CPU usage (obviously) dropped to almost zero when I recreated the container to update the logging envs, and it has not returned to the previous levels while running any of these jobs, which suggests that however long they take, they are not (on their own) responsible for the high CPU usage.
Pretty sure it's not running, since it is just one HTTP call and one DB call. The logging must have been lost due to an error; I haven't really done good error reporting in the code. It would be great if you could associate the CPU spike with a specific job/event. Next time you observe a spike, just have a look at the time on the graph and then the corresponding logs of the Ryot container.
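(To correlate a spike window on the graph with the container logs, a minimal sketch; the container name and the timestamps are placeholders:)

```bash
# Pull only the log lines from the spike window observed on the graph.
docker logs ryot --since 2023-10-25T03:20:00 --until 2023-10-25T03:30:00
```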
Unfortunately the standard Linux DNS resolvers are bad and dumb and don't cache requests, so if you make 500 calls to a site, you're also making 500 DNS requests, and it soon adds up. My workaround in the past has been a cron job to fetch the DNS records and add them to the container's hosts file so that it doesn't have to do a lookup every time. Something like:

```bash
#!/bin/bash
cp /etc/hosts /tmp/hosts.new && sed -i '/api.themoviedb.org/d' /tmp/hosts.new
/usr/bin/dig +short api.themoviedb.org | while read -r line; do
  echo "$line api.themoviedb.org" >> /tmp/hosts.new  # one entry per returned address
done
cp /tmp/hosts.new /etc/hosts  # install the refreshed hosts file
echo "Last Updated $(date +'%Y-%m-%d %T')" > /config/dnsupdate.log
```
Hmm, this looks interesting, @thespad. Do you think I should include this in the docker image?
It's tricky because it's the kind of thing people have Very Strong Opinions about, but personally my view is that if you know you're going to be making a ton of connections to a given domain, it's probably wise to try and reduce the network load it generates. The TTL on the api.themoviedb.org record is the (IMO) very silly AWS default of 60 seconds, but in practice you could run the cron job every ~15 minutes and you'd probably be safe. Even if you ran it every minute, you'd still only generate 1440 DNS queries a day instead of potentially millions.
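(A concrete sketch of that schedule as a crontab entry; the script path /config/dnsupdate.sh is an assumption for wherever the script above lives:)

```bash
# Run the hosts-file refresh every 15 minutes, per the suggestion above.
*/15 * * * * /config/dnsupdate.sh
```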
Yep, I'm pretty sure I know which query is hogging the remaining CPU. I should get around to it by mid-Feb.
@ellsclytn Could you upgrade to the latest version and see if this has been fixed?
Try setting the env var […]
I realised I've been running an older version of Postgres. I've upgraded to Postgres 16. I have […]
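(A quick check that the upgrade took effect, as a sketch; the connection details are assumptions:)

```bash
# Confirm the running server version after the upgrade.
psql -h localhost -U ryot -d ryot -c 'SELECT version();'
```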
Unfortunately, it has continued even after the Postgres upgrade. Once the midnight jobs started, CPU returned to ~10% and stayed there. Then when the next midnight rolled around, it roughly doubled CPU usage. I'd post the logs, except that they're quite big: 120 MB for the debug logs in total, ~500 KB if I exclude any lines mentioning "Progress update". I'll wait for #640 to release and try a few things with it.
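(The filtering mentioned above can be reproduced with a simple pipeline, as a sketch; the container name is an assumption:)

```bash
# Drop the noisy "Progress update" lines before sharing the debug logs.
docker logs ryot 2>&1 | grep -v "Progress update" > ryot-filtered.log
```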
@ellsclytn I released it as […]
I tried reducing the number of monitored shows. There were more than I realised, but not many; it's now down to three. That's certainly helped, but it does appear the problem persists (and gets consistently worse with a higher monitoring count). The CPU usage seems to "stack" each time the midnight cron job fires off. I was able to confirm that setting […] As for logs, I haven't seen a great deal that jumped out at me. The only thing that seemed a little different now was that some shows have quite a large log output, which I think coincided with some more recent changes where the webhook path changed for the Jellyfin integration. Example of large output: […]
I suspect this is unrelated, though, as the issue has been going on longer than this sort of data has been present. For each of the monitored shows, they all have […] I also don't have any notification systems configured (intentionally). Would I be right in saying Ryot will avoid trying to send any notifications if there is no notification system configured?
Yep, you're right. I believe it is not the actual metadata update that is causing the problem, but the associated […] I haven't been able to come up with a satisfactory way to deal with this problem, but I will be focusing on it in the coming months. Sending notifications is just a […]
I wonder, would it be a complex change to allow the […]
@ellsclytn I think I finally got it. The idle CPU usage is still a bit high (~8-15%) for my liking, but the spike no longer seems to be happening. Can you confirm on your instance too?
Sure thing, I've just switched to […]
I think this can finally be closed. Thanks to everyone who helped me debug it! If anyone is interested in knowing what the problem was, I wrote a small write-up on my blog.
I've noticed that my Ryot container enters a state of high CPU usage daily.
The above shows CPU usage over a 48h period. It seems to recover temporarily around 00:00, which leads me to believe it's something happening within one of the daily background jobs, but I'm just not sure which one yet.
The biggest spike seems to happen around 03:23-03:26.
I looked at the logs (where I've set `RUST_LOG=ryot=trace,sea_orm=debug`), but there doesn't seem to be much to look at beyond a lot of SQL queries. This is the last few lines of logs between 2023-10-25T03:23:00 and 2023-10-25T04:26:00. I'm wondering if anyone is experiencing or has experienced something similar, or might have some further leads I could explore.
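(A minimal sketch of running a throwaway instance with the same verbose logging; the image name, tag, and port mapping are assumptions, while the RUST_LOG value is taken from the issue text:)

```bash
# Spin up a debug instance with trace-level logging enabled.
docker run -d --name ryot-debug -p 8000:8000 \
  -e RUST_LOG="ryot=trace,sea_orm=debug" \
  ignisda/ryot:latest
```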