
Trace: KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call? #2346

Closed
2 tasks done
matthewcoopergrt opened this issue Nov 23, 2022 · 32 comments · Fixed by #3515
Labels
bug Something isn't working wontfix This will not be worked on

Comments

@matthewcoopergrt

⚠️ Please verify that this bug has NOT been raised before.

  • I checked and didn't find similar issue

🛡️ Security Policy

Description

Logged in this evening to find no monitors and the following error displayed:

Trace: KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?

Full startup log below. Is this a known issue?

Matt

👟 Reproduction steps

  • Login to Kuma
  • No monitors or status pages displayed
  • Error message appears on screen
  • Error logged

👀 Expected behavior

Login works normally and monitors/status pages etc. are displayed.

😓 Actual Behavior

  • No monitors or status pages displayed
  • Error message appears on screen
  • Error logged

🐻 Uptime-Kuma Version

1.18.5

💻 Operating System and Arch

louislam/uptime-kuma Container Image

🌐 Browser

107.0.5304.110

🐋 Docker Version

Amazon Fargate LATEST(1.4.0)

🟩 NodeJS Version

No response

📝 Relevant log output

Trace: KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?
at Client_SQLite3.acquireConnection (/app/node_modules/knex/lib/client.js:305:26)
at runNextTicks (node:internal/process/task_queues:61:5)
at listOnTimeout (node:internal/timers:528:9)
at processTimers (node:internal/timers:502:7)
at async Runner.ensureConnection (/app/node_modules/knex/lib/execution/runner.js:259:28)
at async Runner.run (/app/node_modules/knex/lib/execution/runner.js:30:19)
at async RedBeanNode.normalizeRaw (/app/node_modules/redbean-node/dist/redbean-node.js:588:22)
at async RedBeanNode.getRow (/app/node_modules/redbean-node/dist/redbean-node.js:574:22)
at async RedBeanNode.getCell (/app/node_modules/redbean-node/dist/redbean-node.js:609:19)
2022-11-23T21:04:59.523Z [MONITOR] ERROR: Caught error
2022-11-23T21:04:59.523Z [MONITOR] ERROR: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?
If you keep encountering errors, please report to https://github.com/louislam/uptime-kuma/issues
Trace: KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?
at Client_SQLite3.acquireConnection (/app/node_modules/knex/lib/client.js:305:26)
at async Runner.ensureConnection (/app/node_modules/knex/lib/execution/runner.js:259:28)
at async Runner.run (/app/node_modules/knex/lib/execution/runner.js:30:19)
at async RedBeanNode.normalizeRaw (/app/node_modules/redbean-node/dist/redbean-node.js:588:22)
at async RedBeanNode.getRow (/app/node_modules/redbean-node/dist/redbean-node.js:574:22)
at async Function.calcUptime (/app/server/model/monitor.js:826:22)
at async Function.sendUptime (/app/server/model/monitor.js:889:24)
at async Function.sendStats (/app/server/model/monitor.js:768:13) {
sql: '\n' +
' SELECT\n' +
' -- SUM all duration, also trim off the beat out of time window\n' +
' SUM(\n' +
' CASE\n' +
' WHEN (JULIANDAY(time) - JULIANDAY(?)) * 86400 < duration\n' +
' THEN (JULIANDAY(time) - JULIANDAY(?)) * 86400\n' +
' ELSE duration\n' +
' END\n' +
' ) AS total_duration,\n' +
'\n' +
' -- SUM all uptime duration, also trim off the beat out of time window\n' +
' SUM(\n' +
' CASE\n' +
' WHEN (status = 1)\n' +
' THEN\n' +
' CASE\n' +
' WHEN (JULIANDAY(time) - JULIANDAY(?)) * 86400 < duration\n' +
' THEN (JULIANDAY(time) - JULIANDAY(?)) * 86400\n' +
' ELSE duration\n' +
' END\n' +
' END\n' +
' ) AS uptime_duration\n' +
' FROM heartbeat\n' +
' WHERE time > ?\n' +
' AND monitor_id = ?\n' +
' ',
bindings: [
'2022-10-24 21:03:59',
'2022-10-24 21:03:59',
'2022-10-24 21:03:59',
'2022-10-24 21:03:59',
'2022-10-24 21:03:59',
27
]
}
at process.<anonymous> (/app/server/server.js:1728:13)
at process.emit (node:events:513:28)
at emit (node:internal/process/promises:140:20)
at processPromiseRejections (node:internal/process/promises:274:27)
at processTicksAndRejections (node:internal/process/task_queues:97:32)
2022-11-23T21:04:59.514Z [MONITOR] ERROR: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?
2022-11-23T21:04:59.514Z [MONITOR] ERROR: Caught error
2022-11-23T21:04:59.482Z [MONITOR] ERROR: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?
2022-11-23T21:04:59.482Z [MONITOR] ERROR: Caught error
2022-11-23T21:03:59.649Z [MONITOR] WARN: Monitor #49 '': Failing: Request failed with status code 401 | Interval: 60 seconds | Type: http | Down Count: 0 | Resend Interval: 0
2022-11-23T21:03:41.599Z [AUTH] INFO: Successfully logged in user . IP=
2022-11-23T21:03:41.431Z [AUTH] INFO: Username from JWT:
2022-11-23T21:03:41.428Z [AUTH] INFO: Login by token. IP=
2022-11-23T21:03:23.632Z [SERVER] INFO: Listening on 3001
2022-11-23T21:03:23.623Z [SERVER] INFO: Adding socket handler
2022-11-23T21:03:23.623Z [SERVER] INFO: Init the server
2022-11-23T21:03:23.588Z [SERVER] INFO: Adding route
2022-11-23T21:03:23.550Z [SERVER] INFO: Load JWT secret from database.
2022-11-23T21:03:23.398Z [DB] INFO: Your database version: 10
2022-11-23T21:03:23.398Z [DB] INFO: Latest database version: 10
2022-11-23T21:03:23.398Z [DB] INFO: Database patch not needed
2022-11-23T21:03:23.398Z [DB] INFO: Database Patch 2.0 Process
2022-11-23T21:03:23.384Z [DB] INFO: SQLite Version: 3.38.3
2022-11-23T21:03:23.385Z [SERVER] INFO: Connected
[ { cache_size: -12000 } ]
[ { journal_mode: 'wal' } ]
2022-11-23T21:03:23.377Z [DB] INFO: SQLite config:
2022-11-23T21:03:23.046Z [SERVER] INFO: Connecting to the Database
2022-11-23T21:03:23.044Z [DB] INFO: Data Dir: ./data/
2022-11-23T21:03:22.966Z [SERVER] INFO: Version: 1.18.5
2022-11-23T21:03:22.900Z [NOTIFICATION] INFO: Prepare Notification Providers
2022-11-23T21:03:22.816Z [SERVER] INFO: Importing this project modules
2022-11-23T21:03:22.813Z [SERVER] INFO: Server Type: HTTP
2022-11-23T21:03:22.812Z [SERVER] INFO: Creating express and socket.io instance
2022-11-23T21:03:22.065Z [SERVER] INFO: Importing 3rd-party libraries
2022-11-23T21:03:22.064Z [SERVER] INFO: Welcome to Uptime Kuma
2022-11-23T21:03:22.064Z [SERVER] INFO: Node Env: production
2022-11-23T21:03:22.064Z [SERVER] INFO: Importing Node libraries
Your Node.js version: 16
Welcome to Uptime Kuma
==> Starting application with user 0 group 0
==> Performing startup jobs and maintenance tasks
@matthewcoopergrt matthewcoopergrt added the bug Something isn't working label Nov 23, 2022
@louislam louislam added the wontfix This will not be worked on label Nov 24, 2022
@louislam
Owner

Amazon Fargate LATEST(1.4.0)

Please use a normal VPS with official docker.

@brhahlen

@louislam
I'm getting this on Docker:

 $ docker version
Client: Docker Engine - Community
 Version:           20.10.17
 API version:       1.41
 Go version:        go1.17.11
 Git commit:        100c701
 Built:             Mon Jun  6 23:03:42 2022
 OS/Arch:           linux/arm
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.17
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.17.11
  Git commit:       a89b842
  Built:            Mon Jun  6 23:01:46 2022
  OS/Arch:          linux/arm
  Experimental:     false
 containerd:
  Version:          1.6.6
  GitCommit:        10c12954828e7c7c9b6e0ea9b0c02b01407d3ae1
 runc:
  Version:          1.1.2
  GitCommit:        v1.1.2-0-ga916309
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Logs:

uptime-kuma  | 2022-11-28T14:11:42.536Z [MONITOR] ERROR: Please report to https://github.com/louislam/uptime-kuma/issues
uptime-kuma  | Trace: KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?
uptime-kuma  |     at Client_SQLite3.acquireConnection (/app/node_modules/knex/lib/client.js:305:26)
uptime-kuma  |     at async Runner.ensureConnection (/app/node_modules/knex/lib/execution/runner.js:259:28)
uptime-kuma  |     at async Runner.run (/app/node_modules/knex/lib/execution/runner.js:30:19)
uptime-kuma  |     at async RedBeanNode.storeCore (/app/node_modules/redbean-node/dist/redbean-node.js:166:26)
uptime-kuma  |     at async RedBeanNode.store (/app/node_modules/redbean-node/dist/redbean-node.js:126:20)
uptime-kuma  |     at async beat (/app/server/model/monitor.js:639:13)
uptime-kuma  |     at async Timeout.safeBeat [as _onTimeout] (/app/server/model/monitor.js:658:17) {
uptime-kuma  |   sql: undefined,
uptime-kuma  |   bindings: undefined
uptime-kuma  | }
uptime-kuma  |     at Timeout.safeBeat [as _onTimeout] (/app/server/model/monitor.js:660:25)

[...]

uptime-kuma  | 2022-11-28T14:12:37.183Z [MONITOR] WARN: Monitor #24 'PiHole - Backup': Failing: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call? | Interval: 60 seconds | Type: docker | Down Count: 0 | Resend Interval: 0
uptime-kuma  | 2022-11-28T14:12:37.730Z [MONITOR] WARN: Monitor #23 'Adminer': Failing: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call? | Interval: 60 seconds | Type: docker | Down Count: 0 | Resend Interval: 0

These are connections to two separate Docker servers.

@brhahlen

I'm running on a Raspberry Pi 2 Model B, which I think may be a bit underpowered for Uptime-Kuma?

@Buri

Buri commented Dec 8, 2022

I have the same issue, and it's not due to an underpowered machine.

$ docker version
Client:
 Version:           20.10.17-ce
 API version:       1.41
 Go version:        go1.17.13
 Git commit:        a89b84221c85
 Built:             Wed Jun 29 12:00:00 2022
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server:
 Engine:
  Version:          20.10.17-ce
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.17.13
  Git commit:       a89b84221c85
  Built:            Wed Jun 29 12:00:00 2022
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          v1.6.6
  GitCommit:        10c12954828e7c7c9b6e0ea9b0c02b01407d3ae1
 runc:
  Version:          1.1.4
  GitCommit:        v1.1.4-0-ga916309fff0f
 docker-init:
  Version:          0.1.5_catatonit
  GitCommit:        
$ inxi -F
System:    Kernel: 5.14.21-150400.22-default x86_64 bits: 64 Console: pty pts/2 Distro: openSUSE Leap 15.4
CPU:       Info: 8-Core model: AMD Ryzen 7 2700 bits: 64 type: MT MCP cache: L2: 4 MiB
Info:      Processes: 1450 Uptime: 29d 0h 34m Memory: 62.69 GiB used: 41.1 GiB (65.6%) Init: systemd runlevel: 3 Shell: Zsh
           inxi: 3.3.07
$ docker image ls | grep uptime
louislam/uptime-kuma                                                1                   930a7e08142f   8 weeks ago     350MB

@derekoharrow

I get the same - a restart of the container fixes it.

Is there any way of monitoring for this - some kind of API that can be polled to trigger a restart if no monitors are found? (A rough watchdog sketch follows the log below.)

2022-12-11T15:16:04.790Z [MONITOR] ERROR: Please report to https://github.com/louislam/uptime-kuma/issues
Trace: KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?
    at Client_SQLite3.acquireConnection (/app/node_modules/knex/lib/client.js:305:26)
    at runMicrotasks (<anonymous>)
    at runNextTicks (node:internal/process/task_queues:61:5)
    at listOnTimeout (node:internal/timers:528:9)
    at processTimers (node:internal/timers:502:7)
    at async Runner.ensureConnection (/app/node_modules/knex/lib/execution/runner.js:259:28)
    at async Runner.run (/app/node_modules/knex/lib/execution/runner.js:30:19)
    at async RedBeanNode.normalizeRaw (/app/node_modules/redbean-node/dist/redbean-node.js:588:22)
    at async RedBeanNode.getRow (/app/node_modules/redbean-node/dist/redbean-node.js:574:22)
    at async RedBeanNode.getCell (/app/node_modules/redbean-node/dist/redbean-node.js:609:19) {
  sql: '\n' +
    '            SELECT AVG(ping)\n' +
    '            FROM heartbeat\n' +
    "            WHERE time > DATETIME('now', ? || ' hours')\n" +
    '            AND ping IS NOT NULL\n' +
    '            AND monitor_id = ?  limit ?',
  bindings: [ -24, 31, 1 ]
}
    at process.<anonymous> (/app/server/server.js:1728:13)
    at process.emit (node:events:513:28)
    at emit (node:internal/process/promises:140:20)
    at processPromiseRejections (node:internal/process/promises:274:27)
    at processTicksAndRejections (node:internal/process/task_queues:97:32)
    at runNextTicks (node:internal/process/task_queues:65:3)
    at listOnTimeout (node:internal/timers:528:9)
    at processTimers (node:internal/timers:502:7)
If you keep encountering errors, please report to https://github.com/louislam/uptime-kuma/issues
Trace: KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?
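
A rough sketch of such a watchdog, assuming Node 18+ on the Docker host; the URL, container name, and timeout are placeholders, and rather than relying on a specific Uptime Kuma API it simply restarts the container when the dashboard stops answering:

// watchdog.js - run periodically from cron or a systemd timer on the Docker host.
const { execSync } = require("child_process");

const KUMA_URL = "http://localhost:3001"; // placeholder: your Uptime Kuma address
const CONTAINER = "uptime-kuma";          // placeholder: your container name

async function check() {
    try {
        // Node 18+ ships fetch and AbortSignal.timeout() built in.
        const res = await fetch(KUMA_URL, { signal: AbortSignal.timeout(10000) });
        if (!res.ok) {
            throw new Error(`HTTP ${res.status}`);
        }
        console.log("Uptime Kuma is responding");
    } catch (err) {
        console.error(`Uptime Kuma not responding (${err.message}), restarting container`);
        execSync(`docker restart ${CONTAINER}`, { stdio: "inherit" });
    }
}

check();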

@dipakparmar

I am getting the same running on Kubernetes. Found this while looking in the Knex repo: knex/knex#2820

@derekoharrow

This is now getting more serious for me - Uptime Kuma is non-responsive more often than not now. I have to do daily restarts of the Uptime Kuma Docker container, and even then it doesn't always become responsive again. I've just had to restart the container about 10 times before it kicked back into life, and I've set up a monitor to detect when it stops responding (having it call a Home Assistant webhook as a form of reverse heartbeat).

Anyone got any ideas - is there a way to resolve this?

@matthewcoopergrt
Author

From a very brief skim of Google, some people are pointing towards SQLite being a limitation. @louislam is support for MySQL still not likely to be considered?

I know it has been mentioned before that Kuma isn't a production-ready monitoring tool, but in reality it's not far off. Bar the above issues, we have found it very useful.

@Computroniks
Contributor

From a very brief skim of Google, some people are pointing towards SQLite being a limitation. @louislam is support for MySQL still not likely to be considered?

I can't remember which issue it was, but there was a suggestion about splitting up the config and results into two separate databases, something that would make sense. I think for the results database, a time series one would be an appropriate choice, then we could just stick to sqlite for config

@louislam
Owner

From a very brief skim of Google, some people are pointing towards SQLite being a limitation. @louislam is support for MySQL still not likely to be considered?

I know it has been mentioned before that Kuma isn't a production-ready monitoring tool, but in reality it's not far off. Bar the above issues, we have found it very useful.

It's in my 2.0 roadmap.
https://github.com/users/louislam/projects/4

@metadan

metadan commented Jan 31, 2023

@louislam is there an estimate of the timescale for 2.0?

@metadan

metadan commented Jan 31, 2023

For reference, I currently seem to have ameliorated this issue by changing the connection pool settings to:

                min: 0,
                max: 10,
                reapIntervalMillis: 1000,
                createRetryIntervalMillis: 200,
                idleTimeoutMillis: 30 * 1000,
                propagateCreateError: false,
                acquireTimeoutMillis: acquireConnectionTimeout,
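
For context, here is a minimal sketch of where such pool options sit in a Knex configuration. The option names are standard Knex/tarn.js pool settings; everything else (the surrounding object, the database filename, the 120-second timeout value) is an illustrative assumption rather than Uptime Kuma's exact setup:

const knex = require("knex");

// Illustrative sketch only, not Uptime Kuma's actual database setup code.
const acquireConnectionTimeout = 120 * 1000; // placeholder value

const db = knex({
    client: "sqlite3",
    connection: { filename: "./data/kuma.db" }, // data dir seen in the startup logs above
    useNullAsDefault: true,
    acquireConnectionTimeout,
    pool: {
        min: 0,
        max: 10,
        reapIntervalMillis: 1000,
        createRetryIntervalMillis: 200,
        idleTimeoutMillis: 30 * 1000,
        propagateCreateError: false,
        acquireTimeoutMillis: acquireConnectionTimeout,
    },
});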

@chakflying
Collaborator

chakflying commented Jan 31, 2023

// This ensures that an operating system crash or power failure will not corrupt the database.
// FULL synchronous is very safe, but it is also slower.
// Read more: https://sqlite.org/pragma.html#pragma_synchronous
await R.exec("PRAGMA synchronous = FULL");

Currently we are using synchronous = FULL, but the SQLite documentation explains that NORMAL should be enough in WAL mode. I think people with performance issues and less strict data-integrity requirements should experiment with setting it to NORMAL or OFF. Just keep a backup of your database (the /data folder) so you don't risk losing your configs.
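
A minimal sketch of that experiment, assuming you edit the pragma line quoted above (and keep a backup of the /data folder first):

// In WAL mode, NORMAL may lose the most recent transactions on a power failure,
// but should not corrupt the database, and it avoids an fsync on every transaction.
await R.exec("PRAGMA synchronous = NORMAL");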

I was also thinking about a split database, where the configs are stored with synchronous = FULL, and the heartbeats are stored with synchronous = NORMAL. That way we can keep using SQLite, requiring minimal changes. But I'm not sure if the benefits are significant.

@remyd1

remyd1 commented Feb 14, 2023

I had the same issue. As soon as I try to delete a specific monitor (which has a lot of events associated with it), I get:

Trace: KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?
    at Client_SQLite3.acquireConnection (/app/node_modules/knex/lib/client.js:312:26)
    at runNextTicks (node:internal/process/task_queues:61:5)
    at processTimers (node:internal/timers:499:9)
    at async Runner.ensureConnection (/app/node_modules/knex/lib/execution/runner.js:287:28)
    at async Runner.run (/app/node_modules/knex/lib/execution/runner.js:30:19)
    at async RedBeanNode.find (/app/node_modules/redbean-node/dist/redbean-node.js:464:24)
    at async Timeout.generateMaintenanceTimeslots [as _onTimeout] (/app/server/uptime-kuma-server.js:275:20) {
  sql: undefined,
  bindings: undefined
}
    at process.<anonymous> (/app/server/server.js:1794:13)
    at process.emit (node:events:513:28)
    at emit (node:internal/process/promises:140:20)
    at processPromiseRejections (node:internal/process/promises:274:27)
    at processTicksAndRejections (node:internal/process/task_queues:97:32)
    at runNextTicks (node:internal/process/task_queues:65:3)
    at processTimers (node:internal/timers:499:9)
If you keep encountering errors, please report to https://github.com/louislam/uptime-kuma/issues
2023-02-14T10:51:52+01:00 [MONITOR] WARN: Monitor #1 'mescal-anr': Pending: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call? | Max retries: 3 | Retry: 1 | Retry Interval: 60 seconds | Type: http
Trace: KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?
    at Client_SQLite3.acquireConnection (/app/node_modules/knex/lib/client.js:312:26)
    at async Runner.ensureConnection (/app/node_modules/knex/lib/execution/runner.js:287:28)
    at async Runner.run (/app/node_modules/knex/lib/execution/runner.js:30:19)
    at async RedBeanNode.normalizeRaw (/app/node_modules/redbean-node/dist/redbean-node.js:569:22)
    at async RedBeanNode.getRow (/app/node_modules/redbean-node/dist/redbean-node.js:555:22)
    at async RedBeanNode.getCell (/app/node_modules/redbean-node/dist/redbean-node.js:590:19)
    at async Function.get (/app/server/settings.js:54:21)
    at async exports.setting (/app/server/util-server.js:416:12)
    at async /app/server/server.js:186:13 {
  sql: 'SELECT `value` FROM setting WHERE `key` = ?  limit ?',
  bindings: [ 'trustProxy', 1 ]
}
    at process.<anonymous> (/app/server/server.js:1794:13)
    at process.emit (node:events:513:28)
    at emit (node:internal/process/promises:140:20)
    at processPromiseRejections (node:internal/process/promises:274:27)
    at processTicksAndRejections (node:internal/process/task_queues:97:32)
If you keep encountering errors, please report to https://github.com/louislam/uptime-kuma/issues

Then the DB is corrupted and I have to (force) stop the container and restore an old DB to get Uptime Kuma working. Deleting other monitors worked.

I think it is a timeout somewhere related to a big SQL query. I have no performance issues (it is a big VM).

docker version:

docker version
Client:
 Version:      17.05.0-ce
 API version:  1.29
 Go version:   go1.7.5
 Git commit:   89658be
 Built:        Thu May  4 22:10:54 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.05.0-ce
 API version:  1.29 (minimum version 1.12)
 Go version:   go1.7.5
 Git commit:   89658be
 Built:        Thu May  4 22:10:54 2017
 OS/Arch:      linux/amd64
 Experimental: false

Uptime Kuma is on the latest version.

Thanks,

EDIT: after stopping the container, waiting for it to stop (...a very long time), removing it, and starting it back up (another long wait before it became healthy and available again), it worked.

@fuomag9

fuomag9 commented Feb 28, 2023

I had the same issue. As soon as I try to delete a specific monitor (which has a lot of events associated with it), I get:

Happening to me as well; editing a monitor kills Uptime Kuma.

@toontoet

toontoet commented Mar 7, 2023

Currently we are using synchronous = FULL, but the SQLite documentation explains that NORMAL should be enough in WAL mode. I think people with performance issues and less strict data-integrity requirements should experiment with setting it to NORMAL or OFF.

Is it perhaps possible to set the desired value by means of an environment variable? Maybe for some deployments NORMAL or OFF is good enough.
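
Nothing like this exists today as far as I know, but here is a sketch of what it could look like; UPTIME_KUMA_DB_SYNCHRONOUS is a purely hypothetical variable name:

// Hypothetical: choose the pragma level from an environment variable, defaulting to FULL.
const allowedLevels = ["OFF", "NORMAL", "FULL", "EXTRA"];
const level = (process.env.UPTIME_KUMA_DB_SYNCHRONOUS || "FULL").toUpperCase();

if (allowedLevels.includes(level)) {
    await R.exec(`PRAGMA synchronous = ${level}`);
} else {
    console.warn(`Ignoring invalid UPTIME_KUMA_DB_SYNCHRONOUS value "${level}", keeping FULL`);
    await R.exec("PRAGMA synchronous = FULL");
}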

@strarsis

I have this error from time to time, causing a burst of downtime notifications that are quickly resolved. It would be nice to get rid of those false positives.

@yoerin

yoerin commented Apr 26, 2023

I am having this error quite often lately; is there any resolution on the horizon?
I'm working with Docker and the latest versions of Docker and Uptime Kuma.

@yoerin

yoerin commented May 1, 2023

I am having this error quite often lately; is there any resolution on the horizon? I'm working with Docker and the latest versions of Docker and Uptime Kuma.

I have discovered the problem, for me at least. I am running Uptime Kuma inside Docker on a NAS. When disk activity was high, I would get this error message.

Once I addressed the continuous high disk read/write activity, the messages stayed away.

Hopefully someone else can benefit from this response.

@ofifoto

ofifoto commented May 1, 2023

Seeing the same; my smaller installation with 25 (2 paused) monitors seems a-ok, but the one with 54 (19 paused) monitors has quite fallen over lately (even after clearing stats and shrinking the DB as much as possible), to the point that I've just ignored it for now.

[screenshot attached]

@00ihsan

00ihsan commented Jul 17, 2023

Same error, and it happens every night at a specific time (3:17 AM).

@ninthwalker

Same error, and it happens every night at a specific time (3:17 AM).

Same... not sure what Uptime Kuma does at that time, but multiple monitors go offline with this error at 3:14 for me, then come back online about 4 minutes later. Maybe it's some DB cleanup process that hammers the DB and causes it, I suppose.

@slurdge

slurdge commented Aug 14, 2023

Same here, also at night, around 2AM - 4AM.

@chakflying
Collaborator

chakflying commented Aug 14, 2023

Users are strongly encouraged to update to 1.23 before reporting related issues. You can either try the beta now or wait for an official release soon.

The server runs the task that clears monitor history data beyond the defined retention period at 03:14 am each day (server time). 1.23 includes PR #2800 and #3380, which improve database write-to-disk behavior and how deletes are handled. Database operations are still blocking, but it should now take less time to process them.

If you are still having issues, pressing the "Settings" -> "Monitor History" -> "Shrink Database" button should also help in the short term (the description previously written there is not entirely accurate). Finally, disk performance is important: if your server has poor IO performance and/or you are running a large number of monitors, the chance of this error occurring will increase.

@toineenzo

Users are strongly encouraged to update to 1.23 before reporting related issues. You can either try the beta now or wait for an official release soon.

The server runs the task that clears monitor history data beyond the defined retention period at 03:14 am each day (server time). 1.23 includes PR #2800 and #3380, which improve database write-to-disk behavior and how deletes are handled. Database operations are still blocking, but it should now take less time to process them.

If you are still having issues, pressing the "Settings" -> "Monitor History" -> "Shrink Database" button should also help in the short term (the description previously written there is not entirely accurate). Finally, disk performance is important: if your server has poor IO performance and/or you are running a large number of monitors, the chance of this error occurring will increase.

Awesome! Will try it out

@Uthpal-p

@toineenzo Did upgrading to 1.23 work? I'm still facing this error after the upgrade.

@00ihsan

00ihsan commented Sep 10, 2023 via email

I got it one or two times. At least not daily.

@Ashkaan

Ashkaan commented Sep 18, 2023

I constantly get this. Just happened again on Version: 1.23.2 today.

I'm running bare metal on a quad-core Xeon with the latest Docker, 16 GB RAM, and 12x RAID 6. I don't think this system is underpowered.

Any new ideas?

@gvkhna

gvkhna commented Sep 18, 2023

What fixed this for me was Settings -> Monitor History -> Clear all Statistics, then changing "Keep monitor history" to 7 days.

This is likely not a CPU power issue but an issue of having too much data in SQLite, which makes queries take longer (and ultimately time out). I believe the old default for "Keep monitor history" was 0 (keep forever); that default should be changed to something like 7 or 14. I probably had a year's worth of data, which is also pretty useless, but since I cleared everything I haven't had any issues.
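
For anyone who prefers doing that pruning by hand, a rough sketch of an equivalent cleanup run through the same RedBean instance (R) used in the snippets above; the table and column names come from the queries in the logs, and you should back up the /data folder first:

// Delete heartbeats older than 7 days, then reclaim the freed disk space.
await R.exec("DELETE FROM heartbeat WHERE time < DATETIME('now', '-7 days')");
await R.exec("VACUUM");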

@toineenzo

I got it one or two times. At least not daily.

Forgot to answer, but yes! The latest updates fixed it. Now I rarely get this error, and only when my NAS CPU/RAM usage is really high. So it seems like it's fixed. At least I don't get spammed on my Telegram webhook with this error.

@mmospanenko

Having the same issue on a powerful enough server (8 cores, ARM-based, 16 GB RAM), but it seems Kuma needs a proper database. It would be great to have Postgres and/or Redis to fix these limitations.

I see ping shows ~10 seconds; I can't say that it is accurate - it looks like there is a queue.

@CommanderStorm
Collaborator

@mmospanenko the current architecture will not use more than one core. But that is not the limiting factor of the current architecture in any sense; IO throughput and latency are. See #4500 for ways to mitigate this until the v2 release.
