Bitnami stack in unrecoverable state after a node termination #36

Closed
@solsson

We run ephemeral nodes with termination handlers and have survived thousands of node terminations with v2.1.0 of this repo. On two or three occasions recovery has required manual intervention because we were unlucky enough to lose two out of three pods concurrently. A simple method of recovery has been to scale down to zero and back up to X>=3 again.
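For anyone wanting to script that scale-to-zero-and-back recovery, here is a rough sketch. It assumes the cluster runs as a Kubernetes StatefulSet and that `kubectl` is on the PATH; the function names, the polling interval and the timeout are made up for illustration, and the StatefulSet name and namespace have to be supplied by the caller.

```python
import subprocess
import time


def scale_commands(statefulset, namespace, replicas=3):
    """Build the two kubectl invocations: scale to zero, then back up."""
    if replicas < 3:
        raise ValueError("the recovery described above scales back to at least 3")
    base = ["kubectl", "--namespace", namespace,
            "scale", "statefulset", statefulset]
    return [base + ["--replicas=0"], base + [f"--replicas={replicas}"]]


def current_replicas(statefulset, namespace):
    """Read .status.replicas of the StatefulSet (empty output counts as 0)."""
    out = subprocess.run(
        ["kubectl", "--namespace", namespace, "get", "statefulset",
         statefulset, "-o", "jsonpath={.status.replicas}"],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    return int(out) if out else 0


def restart_cluster(statefulset, namespace, replicas=3, timeout_s=300):
    """Scale down, wait until no pods remain, then scale up again."""
    down, up = scale_commands(statefulset, namespace, replicas)
    subprocess.run(down, check=True)
    deadline = time.monotonic() + timeout_s
    # Assumption: polling every 5 seconds is good enough for a manual recovery.
    while current_replicas(statefulset, namespace) > 0:
        if time.monotonic() > deadline:
            raise TimeoutError("pods did not terminate in time")
        time.sleep(5)
    subprocess.run(up, check=True)
```

Calling `restart_cluster("<statefulset-name>", "<namespace>")` runs the same two steps we did by hand. Waiting for the pod count to reach zero before scaling up is a design choice of this sketch, not something the recovery strictly requires.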

With the Bitnami stack (#35) we ended up in an unrecoverable state after only about one node termination.

One newly started pod would join the Galera cluster but fail to complete a state snapshot transfer (SST):

[Warning] WSREP: Member 1.0 (ystack-mariadb-galera-0) requested state transfer from '*any*', but it is impossible to select State Transfer donor: Resource temporarily unavailable

While the other pods appeared Ready, the failing pod kept restarting into this state:

2021-06-24 16:02:11 2 [Note] WSREP: Server status change joiner -> initializing
2021-06-24 16:02:11 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2021-06-24 16:02:11 0 [Note] mysqld: Aria engine: starting recovery
recovered pages: 0% 10% 20% 41% 53% 65% 80% 92% 100% (0.0 seconds); tables to flush: 2 1 0
 (0.0 seconds); 
2021-06-24 16:02:11 0 [Note] mysqld: Aria engine: recovery done
2021-06-24 16:02:11 0 [Warning] The parameter innodb_file_format is deprecated and has no effect. It may be removed in future releases. See https://mariadb.com/kb/en/library/xtradbinnodb-file-format/
2021-06-24 16:02:11 0 [Warning] The parameter innodb_log_files_in_group is deprecated and has no effect.
2021-06-24 16:02:11 0 [Note] InnoDB: Uses event mutexes
2021-06-24 16:02:11 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2021-06-24 16:02:11 0 [Note] InnoDB: Number of pools: 1
2021-06-24 16:02:11 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
2021-06-24 16:02:11 0 [Note] mysqld: O_TMPFILE is not supported on /opt/bitnami/mariadb/tmp (disabling future attempts)
2021-06-24 16:02:11 0 [Note] InnoDB: Using Linux native AIO
2021-06-24 16:02:11 0 [Note] InnoDB: Initializing buffer pool, total size = 2147483648, chunk size = 134217728
2021-06-24 16:02:11 0 [Note] InnoDB: Completed initialization of buffer pool
2021-06-24 16:02:11 0 [Note] InnoDB: Setting log file ./ib_logfile101 size to 134217728 bytes
2021-06-24 16:02:12 0 [Note] InnoDB: Renaming log file ./ib_logfile101 to ./ib_logfile0
2021-06-24 16:02:12 0 [Note] InnoDB: New log file created, LSN=151017
2021-06-24 16:02:12 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 1 row operations to undo
2021-06-24 16:02:12 0 [Note] InnoDB: Trx id counter is 235104
2021-06-24 16:02:12 0 [Note] InnoDB: 128 rollback segments are active.
2021-06-24 16:02:12 0 [Note] InnoDB: Starting in background the rollback of recovered transactions
2021-06-24 16:02:12 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
2021-06-24 16:02:12 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2021-06-24 16:02:12 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2021-06-24 16:02:12 0 [ERROR] InnoDB: preallocating 12582912 bytes for file ./ibtmp1 failed with error 28
2021-06-24 16:02:12 0 [ERROR] InnoDB: Could not set the file size of './ibtmp1'. Probably out of disk space
2021-06-24 16:02:12 0 [ERROR] InnoDB: Unable to create the shared innodb_temporary
2021-06-24 16:02:12 0 [ERROR] InnoDB: Plugin initialization aborted with error Generic error
210624 16:02:12 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see https://mariadb.com/kb/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.

Server version: 10.5.10-MariaDB-log
key_buffer_size=33554432
read_buffer_size=131072
max_used_connections=0
max_threads=502
thread_count=2
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1137879 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
2021-06-24 16:02:12 0 [Note] InnoDB: Rolled back recovered transaction 235103
2021-06-24 16:02:12 0 [Note] InnoDB: Rollback of non-prepared transactions completed
stack_bottom = 0x0 thread_stack 0x49000
/opt/bitnami/mariadb/sbin/mysqld(my_print_stacktrace+0x2e)[0x5646a8de15fe]
/opt/bitnami/mariadb/sbin/mysqld(handle_fatal_signal+0x485)[0x5646a889e735]
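As a side note on the `failed with error 28` line in the log above: on Linux, errno 28 is `ENOSPC` ("No space left on device"), which matches the "Probably out of disk space" hint from InnoDB. A small sketch for decoding the number and checking whether a data directory has room for the 12582912-byte `ibtmp1` file. The helper names are made up, and the path to check is whatever volume the pod mounts for its data directory, which the log does not show.

```python
import errno
import os
import shutil

# Size InnoDB tried to preallocate for ./ibtmp1, taken from the log line.
IBTMP1_BYTES = 12582912


def explain_errno(code):
    """Return the symbolic name and message for an OS error number."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


def has_room_for(path, needed_bytes=IBTMP1_BYTES):
    """True if the filesystem holding `path` has at least `needed_bytes` free."""
    return shutil.disk_usage(path).free >= needed_bytes
```

Running `explain_errno(28)` inside the container, or `has_room_for(...)` against the data volume, is a quick way to confirm that the crash comes from a full volume rather than from a corrupt binary as the generic crash text suggests.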

I suppose there are recovery tools for this state, but we're reverting to maintaining our own stack.
