Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gitea on FreeBSD enters a CPU busy loop (sched_yield()) before ultimately crashing #14815

Closed
2 of 6 tasks
rtyler opened this issue Feb 27, 2021 · 23 comments
Closed
2 of 6 tasks
Labels
issue/needs-feedback For bugs, we need more details. For features, the feature must be described in more detail

Comments

@rtyler
Copy link

rtyler commented Feb 27, 2021

  • Gitea version (or commit ref): 1.13.2
  • Git version: 2.30.1
  • Operating system: FreeBSD 12.2 with gitea from pkgs
  • Database (use [x]):
    • PostgreSQL
    • MySQL
    • MSSQL
    • SQLite
  • Can you reproduce the bug at https://try.gitea.io:
    • Yes (provide example URL)
    • No
  • Log gist:

Description

I had migrated the sqlite database and repositories from a Linux machine to a FreeBSD machine (both amd64 of course) and have found that the Gitea daemon will be up and running for 10-20 minutes before it goes into consuming all CPU. I attached truss (like strace) and found that the daemon was just spinning and spinning on sched_yield().

I cannot find any reproduction case, but the web UI goes completely unresponsive. Eventually, usually after I go to bed 😄 , the daemon seems to crash entirely

I'm honestly not sure how to make this a better report, if there's any useful debugging or flags I can fiddle with I'm happy to try.

Screenshots

@rtyler
Copy link
Author

rtyler commented Mar 1, 2021

I dumped from sqlite to postgresql, and the time before gitea spun into a busy loop seemed to take a lot longer, but still eventually fell over.

@zeripath
Copy link
Contributor

zeripath commented Mar 1, 2021

Any logs? You've not given us anything to work on - I don't know if it's potentially a scheduled task or something like a leak in go's waits.

Does pprof yield anything?

What version of go are you building with? Are you building with the absolute latest?

Are you running bleve code indexing?

Do you have mirrors?

But seriously give us some logs and ensure you're using the latest go version.

@rtyler
Copy link
Author

rtyler commented Mar 1, 2021

Go 1.15.8, as far as log goes there's nothing that indicates any problem.

2021/03/01 06:00:55 ...m.io/xorm/core/db.go:286:afterProcess() [I] [SQL] SELECT "id", "lower_name", "name", "full_name", "email", "keep_email_private", "email_notifications_preference", "passwd", "passwd_hash_algo", "must_change_password", "login_type", "login_source", "login_name", "type", "location", "website", "rands", "salt", "language", "description", "created_unix", "updated_unix", "last_login_unix", "last_repo_visibility", "max_repo_creation", "is_active", "is_admin", "is_restricted", "allow_git_hook", "allow_import_local", "allow_create_organization", "prohibit_login", "avatar", "avatar_email", "use_custom_avatar", "num_followers", "num_following", "num_stars", "num_repos", "num_teams", "num_members", "visibility", "repo_admin_change_team_access", "diff_view_style", "theme", "keep_activity_private" FROM "public"."user" WHERE "lower_name"=$1 LIMIT 1 [rtyler] - 750.404_s
2021/03/01 06:00:55 ...m.io/xorm/core/db.go:286:afterProcess() [I] [SQL] SELECT "id", "owner_id", "owner_name", "lower_name", "name", "description", "website", "original_service_type", "original_url", "default_branch", "num_watches", "num_stars", "num_forks", "num_issues", "num_closed_issues", "num_pulls", "num_closed_pulls", "num_milestones", "num_closed_milestones", "num_projects", "num_closed_projects", "is_private", "is_empty", "is_archived", "is_mirror", "status", "is_fork", "fork_id", "is_template", "template_id", "size", "is_fsck_enabled", "close_issues_via_commit_in_any_branch", "topics", "trust_model", "avatar", "created_unix", "updated_unix" FROM "public"."repository" WHERE "owner_id"=$1 AND "lower_name"=$2 LIMIT 1 [1 twitterbot] - 773.491_s
2021/03/01 06:00:55 ...m.io/xorm/core/db.go:286:afterProcess() [I] [SQL] SELECT "id", "lower_name", "name", "full_name", "email", "keep_email_private", "email_notifications_preference", "passwd", "passwd_hash_algo", "must_change_password", "login_type", "login_source", "login_name", "type", "location", "website", "rands", "salt", "language", "description", "created_unix", "updated_unix", "last_login_unix", "last_repo_visibility", "max_repo_creation", "is_active", "is_admin", "is_restricted", "allow_git_hook", "allow_import_local", "allow_create_organization", "prohibit_login", "avatar", "avatar_email", "use_custom_avatar", "num_followers", "num_following", "num_stars", "num_repos", "num_teams", "num_members", "visibility", "repo_admin_change_team_access", "diff_view_style", "theme", "keep_activity_private" FROM "public"."user" WHERE "id"=$1 LIMIT 1 [1] - 650.144_s

That's the last entry from the log after this process crashed last night. I do have code indexing and mirrored repositories in this environment.

@zeripath
Copy link
Contributor

zeripath commented Mar 1, 2021

Those queries are taking a very very long time 750.404_s !

Can you please read https://docs.gitea.io/en-us/logging-configuration/#debugging-problems and adjust your logging to give us a bit more detail. In your case the SQL logs would be actually helpful.

I'll need a bit more context than three lines though - clearly all of those are very wrong so we need to see the OK lines first.

(One other thing is consider changing your default password algorithm as argon2 might be the problem if you're running out of memory.) I would also check if you have an appropriate robots.txt in case you're being crawled and that is what the issue is.

@zeripath zeripath added the issue/needs-feedback For bugs, we need more details. For features, the feature must be described in more detail label Mar 12, 2021
@rtyler
Copy link
Author

rtyler commented Mar 13, 2021

I hadn't noticed this comment @zeripath, thanks for some pointers! I just noticed this evening that the gitea process looks like it's leaking gigabytes of memory before crashing, so will definitely try to do some tinkering this weekend

@lunny
Copy link
Member

lunny commented Mar 15, 2021

It seems the version on pkg of Freebsd is broken because of missing assets.

@rtyler
Copy link
Author

rtyler commented Mar 20, 2021

2021/03/20 15:13:42 ...m.io/xorm/core/db.go:286:afterProcess() [I] [SQL] SELECT "id", "repo_id", "type", "config", "created_unix" FROM "public"."repo_unit" WHERE (repo_id = $1) [256] - 254.005_s
2021/03/20 15:13:42 ...m.io/xorm/core/db.go:286:afterProcess() [I] [SQL] SELECT "id", "repo_id", "interval", "enable_prune", "updated_unix", "next_update_unix" FROM "public"."mirror" WHERE "repo_id"=$1 LIMIT 1 [256] - 239.612_s
2021/03/20 15:13:42 ...m.io/xorm/core/db.go:286:afterProcess() [I] [SQL] SELECT "id", "owner_id", "owner_name", "lower_name", "name", "description", "website", "original_service_type", "original_url", "default_branch", "num_watches", "num_stars", "num_forks", "num_issues", "num_closed_issues", "num_pulls", "num_closed_pulls", "num_milestones", "num_closed_milestones", "num_projects", "num_closed_projects", "is_private", "is_empty", "is_archived", "is_mirror", "status", "is_fork", "fork_id", "is_template", "template_id", "size", "is_fsck_enabled", "close_issues_via_commit_in_any_branch", "topics", "trust_model", "avatar", "created_unix", "updated_unix" FROM "public"."repository" WHERE "id"=$1 LIMIT 1 [256] - 367.187_s
2021/03/20 15:13:42 ...m.io/xorm/core/db.go:286:afterProcess() [I] [SQL] SELECT count(*) FROM "public"."release" WHERE repo_id=$1 AND is_draft=$2 [256 false] - 296.864_s
2021/03/20 15:13:43 ...m.io/xorm/core/db.go:286:afterProcess() [I] [SQL] SELECT topic.* FROM "public"."topic" INNER JOIN "public"."repo_topic" ON repo_topic.topic_id = topic.id WHERE repo_topic.repo_id=$1 ORDER BY "topic"."repo_count" DESC [256] - 542.493_s
2021/03/20 15:13:43 ...m.io/xorm/core/db.go:286:afterProcess() [I] [SQL] SELECT "id", "repo_id", "commit_id", "is_primary", "language", "size", "created_unix" FROM "public"."language_stat" WHERE ("repo_id" = $1) ORDER BY "size" DESC [256] - 416.704_s
2021/03/20 15:13:43 ...m.io/xorm/core/db.go:286:afterProcess() [I] [SQL] SELECT "id", "repo_id", "owner_id", "path", "created" FROM "public"."lfs_lock" WHERE "repo_id"=$1 [256] - 279.596_s
2021/03/20 15:13:43 ...s/charset/charset.go:122:DetectEncoding() [D] Detected encoding: utf-8 (fast)
2021/03/20 15:13:43 ...s/charset/charset.go:122:DetectEncoding() [D] Detected encoding: utf-8 (fast)
2021/03/20 15:13:43 ...s/context/context.go:139:HTML() [D] Template: repo/home
2021/03/20 15:13:43 ...m.io/xorm/core/db.go:286:afterProcess() [I] [SQL] SELECT team.lower_name FROM "public"."team_repo" INNER JOIN "public"."team" ON team.id = team_repo.team_id WHERE (team_repo.repo_id = $1) ORDER BY team.lower_name [256] - 506.988_s
2021/03/20 15:13:47 ...m.io/xorm/core/db.go:286:afterProcess() [I] [SQL] SELECT user_id, count(*) AS count FROM notification WHERE user_id IN (SELECT user_id FROM notification WHERE updated_unix >= $1 AND updated_unix < $2) AND status = $3 GROUP BY user_id [1616278415 1616278425 1] - 665.573_s
2021/03/20 15:13:48 ...s/context/context.go:332:func1() [D] Session ID: bfbf4a03d048c43c
2021/03/20 15:13:48 ...s/context/context.go:333:func1() [D] CSRF Token: aWTG8h9M8iEHPnYe0PH0TEgTzC06MTYxNjI3ODQyODE4MjU2Nzg3NQ
2021/03/20 15:13:48 ...m.io/xorm/core/db.go:286:afterProcess() [I] [SQL] SELECT "id", "lower_name", "name", "full_name", "email", "keep_email_private", "email_notifications_preference", "passwd", "passwd_hash_algo", "must_change_password", "login_type", "login_source", "login_name", "type", "location", "website", "rands", "salt", "language", "description", "created_unix", "updated_unix", "last_login_unix", "last_repo_visibility", "max_repo_creation", "is_active", "is_admin", "is_restricted", "allow_git_hook", "allow_import_local", "allow_create_organization", "prohibit_login", "avatar", "avatar_email", "use_custom_avatar", "num_followers", "num_following", "num_stars", "num_repos", "num_teams", "num_members", "visibility", "repo_admin_change_team_access", "diff_view_style", "theme", "keep_activity_private" FROM "public"."user" WHERE "lower_name"=$1 LIMIT 1 [scribd] - 637.003_s
2021/03/20 15:13:48 ...m.io/xorm/core/db.go:286:afterProcess() [I] [SQL] SELECT "id", "owner_id", "owner_name", "lower_name", "name", "description", "website", "original_service_type", "original_url", "default_branch", "num_watches", "num_stars", "num_forks", "num_issues", "num_closed_issues", "num_pulls", "num_closed_pulls", "num_milestones", "num_closed_milestones", "num_projects", "num_closed_projects", "is_private", "is_empty", "is_archived", "is_mirror", "status", "is_fork", "fork_id", "is_template", "template_id", "size", "is_fsck_enabled", "close_issues_via_commit_in_any_branch", "topics", "trust_model", "avatar", "created_unix", "updated_unix" FROM "public"."repository" WHERE "owner_id"=$1 AND "lower_name"=$2 LIMIT 1 [13 scribd.github.io] - 646.81_s
2021/03/20 15:13:48 ...m.io/xorm/core/db.go:286:afterProcess() [I] [SQL] SELECT "id", "repo_id", "type", "config", "created_unix" FROM "public"."repo_unit" WHERE (repo_id = $1) [255] - 708.15_s
2021/03/20 15:13:48 ...m.io/xorm/core/db.go:286:afterProcess() [I] [SQL] SELECT "id", "repo_id", "interval", "enable_prune", "updated_unix", "next_update_unix" FROM "public"."mirror" WHERE "repo_id"=$1 LIMIT 1 [255] - 791.583_s
2021/03/20 15:13:48 ...m.io/xorm/core/db.go:286:afterProcess() [I] [SQL] SELECT "id", "owner_id", "owner_name", "lower_name", "name", "description", "website", "original_service_type", "original_url", "default_branch", "num_watches", "num_stars", "num_forks", "num_issues", "num_closed_issues", "num_pulls", "num_closed_pulls", "num_milestones", "num_closed_milestones", "num_projects", "num_closed_projects", "is_private", "is_empty", "is_archived", "is_mirror", "status", "is_fork", "fork_id", "is_template", "template_id", "size", "is_fsck_enabled", "close_issues_via_commit_in_any_branch", "topics", "trust_model", "avatar", "created_unix", "updated_unix" FROM "public"."repository" WHERE "id"=$1 LIMIT 1 [255] - 1.217322ms
2021/03/20 15:13:48 ...m.io/xorm/core/db.go:286:afterProcess() [I] [SQL] SELECT count(*) FROM "public"."release" WHERE repo_id=$1 AND is_draft=$2 [255 false] - 817.945_s
2021/03/20 15:13:48 ...m.io/xorm/core/db.go:286:afterProcess() [I] [SQL] SELECT max( id ) as id FROM "public"."commit_status" WHERE (repo_id = $1) AND (sha = $2) GROUP BY context_hash ORDER BY max( id ) desc LIMIT 10 [255 10401f67986d98be568b1d85a93d40f0ee1d4fc3] - 542.583_s
2021/03/20 15:13:48 ...s/charset/charset.go:122:DetectEncoding() [D] Detected encoding: utf-8 (fast)
2021/03/20 15:13:48 ...s/charset/charset.go:122:DetectEncoding() [D] Detected encoding: utf-8 (fast)
2021/03/20 15:13:48 ...s/charset/charset.go:122:DetectEncoding() [D] Detected encoding: utf-8 (fast)
2021/03/20 15:13:48 ...m.io/xorm/core/db.go:286:afterProcess() [I] [SQL] SELECT "id", "lower_name", "name", "full_name", "email", "keep_email_private", "email_notifications_preference", "passwd", "passwd_hash_algo", "must_change_password", "login_type", "login_source", "login_name", "type", "location", "website", "rands", "salt", "language", "description", "created_unix", "updated_unix", "last_login_unix", "last_repo_visibility", "max_repo_creation", "is_active", "is_admin", "is_restricted", "allow_git_hook", "allow_import_local", "allow_create_organization", "prohibit_login", "avatar", "avatar_email", "use_custom_avatar", "num_followers", "num_following", "num_stars", "num_repos", "num_teams", "num_members", "visibility", "repo_admin_change_team_access", "diff_view_style", "theme", "keep_activity_private" FROM "public"."user" WHERE "email"=$1 LIMIT 1 [rtyler@brokenco.de] - 725.053_s
2021/03/20 15:13:48 ...m.io/xorm/core/db.go:286:afterProcess() [I] [SQL] SELECT "id", "owner_id", "key_id", "primary_key_id", "content", "created_unix", "expired_unix", "added_unix", "emails", "can_sign", "can_encrypt_comms", "can_encrypt_storage", "can_certify" FROM "public"."gpg_key" WHERE (key_id=$1) [E5C92681BEF6CEA2] - 425.546_s
2021/03/20 15:13:48 ...m.io/xorm/core/db.go:286:afterProcess() [I] [SQL] SELECT "id", "owner_id", "key_id", "primary_key_id", "content", "created_unix", "expired_unix", "added_unix", "emails", "can_sign", "can_encrypt_comms", "can_encrypt_storage", "can_certify" FROM "public"."gpg_key" WHERE (owner_id=$1 AND primary_key_id='') [1] - 334.136_s

Last snippets of the log now that I've been running with more debugging. Unfortunately nothing is jumping out to me, but I can upload an even longer log if that's helpful.

@zeripath
Copy link
Contributor

This looks like a problem with sqlite - look at the time those queries are taking. I don't know what is happening but they shouldn't be taking that long.

It may be worth examining your sqlite database - perhaps it needs vacuuming perhaps there's additional information that can be gleaned from it but the problem doesn't seem to be at the go layer.

But really we need more logs - none of those values are normal. I need to see when it normal and when it stopped being normal. A good marker for the point of change will be the timestamp at that end of a line containing xorm being listed in _s rather than _ms.

@rtyler
Copy link
Author

rtyler commented Mar 21, 2021

@zeripath The backing database is PostgreSQL 13 in this case on the same host. I've gone and run a vacuum anyways and we'll see what happens.

Because there's PII in this full log file, I would prefer not to share it with the whole wide word, I'm happy to email it along to you @zeripath however (i'm rtyler at brokenco de)

@zeripath
Copy link
Contributor

Sorry it's taken me so long to get back to you - I've sent you an email.

@zeripath
Copy link
Contributor

OK so it turns out the _s are actually µs so those are normal!

What was the configuration of your logging you had for those logs? I think LOG_SQL can now be set to false.

I don't see any GET or POST requests logs in your logs but yet I do see template 404 - so I'm not seeing the router logs. I need to see those too. It might be in the router.log file.

Could you also update to 1.13.6 - there's a really stupid bug in 1.13.2 that was fixed in 1.13.3 in #14598 I wonder if that might be the issue.

@rtyler
Copy link
Author

rtyler commented Mar 28, 2021

My logging configuration is:

[log]
ROOT_PATH = /var/log/gitea
MODE      = file
LEVEL     = debug
ROUTER    = console

This is with 1.13.6, but I'm not sure I'm doing this right since the logs don't look too impressive to me (sent via email)

@zeripath
Copy link
Contributor

Please see:

https://docs.gitea.io/en-us/logging-configuration

I need the router logs too. If you want to use files rather than redirecting the console do:

[log]
MODE = file
LEVEL = debug 
ROUTER = ,
COLORIZE = false

If you're able to redirect the console then:

[log]
MODE = console
LEVEL = debug
ROUTER = console
COLORIZE = false

@rtyler
Copy link
Author

rtyler commented Mar 31, 2021

Even with the following, I don't believe i'm getting router logs out of this daemon 😕

[log]
ROOT_PATH = /var/log/gitea
MODE      = file
LEVEL     = debug
ROUTER_LOG_LEVEL = debug
ROUTER = ,

Would it be possible that some build flag might have caused this to be functionality to be disabled?

@zeripath
Copy link
Contributor

Is it possible you have [log] DISABLE_ROUTER_LOG=true ? It might be helpful to explicitly set that to false DISABLE_ROUTER_LOG=false

ROUTER_LOG_LEVEL does not mean anything.

What is the contents of your app.ini as relates to [log] and [log.*]?

@rtyler
Copy link
Author

rtyler commented Apr 11, 2021

My log section is here:

[log]
ROOT_PATH = /var/log/gitea
MODE      = file
LEVEL     = debug
ROUTER = ,

And ROUTER_LOG_LEVEL is mentioned in the documentation so that may be a doc bug if that's not really a thing anymore.

Honestly I'm quite perplexed as to why no router logs are showing up.

@zeripath
Copy link
Contributor

I think this is #15826

@rtyler
Copy link
Author

rtyler commented May 12, 2021

Oooo, I'll keep an eye on that 😄

@rtyler
Copy link
Author

rtyler commented Jun 23, 2021

Just jumped to 0.14.3 and still see the issue, but it looks like one of the related fixes is slated for 0.15 🤞

@zeripath
Copy link
Contributor

which one?

@rtyler
Copy link
Author

rtyler commented Jun 23, 2021

#15831 was linked as "still being needed" in the close of #15826

@zeripath
Copy link
Contributor

1.14.3 should contain that and another fix for it.

So it's not that unfortunately.

We still need to see those routers logs.

It may be that you find that 1.15 fixes this though.

@wxiaoguang
Copy link
Contributor

Outdated, please open a new issue if there is still any problem

@go-gitea go-gitea locked and limited conversation to collaborators May 3, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
issue/needs-feedback For bugs, we need more details. For features, the feature must be described in more detail
Projects
None yet
Development

No branches or pull requests

4 participants