PostgreSQL pod fails to restart properly if DB initialization didn't finish properly in previous pod run #309
JFTR, here's the log of the pod when the DB initialization finished successfully:
Compared to the previous / above one, it's visible that the pod got restarted before it had a chance to sync the necessary DB init configuration to the disk / PVC.
Looking at the code, the problem seems to be this test. Since there are more steps performed within the initialization than that single check, the pod can be restarted between them. E.g. what if the pod was restarted somewhere around this step? Since in the next run of the pod the check would already pass, the remaining steps would never be performed. The quick (but untested) approach might be to make the DB initialization procedure an atomic one (either perform all parts of it, or retry on the next run). Will submit an untested patch in a bit.
This sounds like we'd have to try to initdb, and if that failed -- we'd have to remove the leftovers and try again. We need to be very careful here. We can probably touch "$PGDATA/../initdb_in_progress", but it's not 100% clear that the container user can write anywhere else than "$PGDATA" nowadays.
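A minimal sketch of that sentinel-file idea, assuming the container user can write to the parent of $PGDATA; the file name initdb_in_progress comes from the comment above, and the wipe-and-retry logic is purely illustrative:

```bash
#!/bin/bash
# Sketch: mark initdb as in progress; if the marker survives a restart,
# the previous initialization did not finish and must be redone.
set -e

SENTINEL="$PGDATA/../initdb_in_progress"

if [ -f "$SENTINEL" ]; then
    # Previous run died mid-initialization: drop the half-initialized
    # contents of the data directory and start over.
    rm -rf "${PGDATA:?}"/*
fi

if [ ! -f "$PGDATA/PG_VERSION" ]; then
    touch "$SENTINEL"
    initdb -D "$PGDATA"
    # ... any further one-time setup steps go here ...
    rm -f "$SENTINEL"
fi
```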
@praiskup Thanks for looking (sorry, mtgs in the morning). Are we able to tell what a proper DB initialization should look like? (re: This sounds like we'd have to try to initdb, and if that failed -- we'd have to remove the leftovers and try again.) From what I have tried on Fedora, if the DB is already initialized, calling another DB init won't do anything. But it won't solve the case where the DB wasn't completely initialized in the previous run.
@praiskup Was originally thinking about enhancing the original check.
Ok, fair enough. So what about this? -- instead of trying to identify the leftover content that should be removed if the initdb failed in a previous run, we would rather change the $PGDATA directory to point to some temporary directory and run the initialization there. If it failed in the first pod run, next time it would be generated again into another temporary directory (no harm done). If it succeeded, it succeeded as a whole. Could this work? (A rough sketch follows below.)
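A rough sketch of that approach, assuming the temporary directory lives on the same volume as the final data directory (so the final mv is a same-filesystem rename); the path /var/lib/pgsql/data/userdata and the other names are illustrative, not what the image necessarily uses:

```bash
#!/bin/bash
# Sketch: run the one-time initialization into a throwaway directory on
# the same volume, and move it to the final location only after every
# step has succeeded.
set -e

FINAL_PGDATA=/var/lib/pgsql/data/userdata   # illustrative final location
TMP_PGDATA=$(mktemp -d /var/lib/pgsql/data/initdb-XXXXXX)

if [ ! -d "$FINAL_PGDATA" ]; then
    initdb -D "$TMP_PGDATA"
    # ... remaining one-time configuration steps against $TMP_PGDATA ...

    # A rename within one filesystem either happens or it doesn't, so a
    # crash before this line only leaves an unused temporary directory.
    mv "$TMP_PGDATA" "$FINAL_PGDATA"
fi
```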
Reproducer (one way to simulate a slow PVC, and maybe there are more, is as follows):
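The original snippet did not survive here; one plausible way to do it, assuming the PV is served over NFS and that you can run tc on the NFS server (the interface name eth0 and the 500ms delay are illustrative):

```bash
# On the NFS server: inject latency into all outgoing traffic, so every
# read/write the pod does against the NFS-backed PVC becomes slow.
sudo tc qdisc add dev eth0 root netem delay 500ms

# Remove the artificial delay again once done:
#   sudo tc qdisc del dev eth0 root netem
```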
and restart the NFS server:
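On a systemd-based NFS server that would typically be (the unit name can differ between distributions):

```bash
sudo systemctl restart nfs-server
```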
I have used a simple scenario where the OpenShift node that the PostgreSQL pod is to be scheduled on is / was also the NFS server. If you are using a different setup, you will need to modify the scenario above appropriately.
Current result:

Expected result:
If the pod is artificially slow, one should probably ensure that the initial liveness probe delay is long enough, I'm afraid.
@praiskup JFTR, the reproducer above was just to emulate a scenario that happens in real clusters hosted in the cloud (network latencies etc.), not to artificially invent some unrealistic one. It is just a helper to see what happens when the PVC is slow. The issue is that in these kinds of environments the pod deployer often doesn't have control over / the ability to modify the PVC configuration (they can change the deployment config of PostgreSQL, yes, but that's impractical when you want to have one universal template that works for every environment). Besides that, images intended for OpenShift should be designed to be stateless (since OpenShift can bounce the pod at any time for some cluster-internal / specific reason that isn't clear beforehand). Using persistent storage & having the chance to launch the pod only once (because on the 2nd run initdb will fail due to the leftover data) goes against that.
Hmpfs, let's skip the "stateless vs. databases" topic. OpenShift seems to have only static limits for liveness/readiness/start-up... (there's no way to ask OpenShift how fast it is when setting up the limits), so we have to pick some value which is sane for general use. And those who are running a very slow OpenShift instance, or have a non-standard use-case, need to pick different values for the timeouts. Yes, the initdb shouldn't end up in an inconsistent state (that's what your PRs are about). But that is just corner-case handling; you are dealing with a stack that is unusable for PG, with very low limits, and just making the "initdb" part atomic won't help you (even if it was atomic, OpenShift would keep retrying && failing till it fails entirely anyway...).
Since we have no real deadlocks to be detected in livenessProbe for now, let's have it (effectively) disabled for now. The liveness probe used to cause issues before, namely because it:

- killed initdb on rather slow storage
- would kill pg_upgrade for a rather large data directory
- killed the pod when PostgreSQL was in crash recovery

Fixes: #313, #309
Relates: #316
This should now be fixed by #320.
Customize "timeoutSeconds" and "initialDelaySeconds" to prevent the readiness / liveness probes from ending the DB pod lifecycle prematurely (before the DB server has had a chance to properly initialize).

Customize:
* "timeoutSeconds" to 10 and
* "initialDelaySeconds" to 90

and also specify:
* "successThreshold" to 1 and
* "failureThreshold" to 3

on readiness and liveness probes in persistent templates, to avoid the readiness / liveness probe bailing out of the DB pod lifecycle prematurely due to:
* an "Inappropriate ioctl for device" event (case of MySQL probes), or
* issues like sclorg/postgresql-container#309 (case of PostgreSQL probes)

Signed-off-by: Jan Lieskovsky <jlieskov@redhat.com>
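For reference, the values above map directly onto the standard Kubernetes probe fields; a sketch of what the resulting template fragment could look like (the exec command shown is only a placeholder, not necessarily what the real templates use):

```yaml
readinessProbe:
  exec:
    command: ["/usr/libexec/check-container"]  # placeholder probe command
  initialDelaySeconds: 90
  timeoutSeconds: 10
  successThreshold: 1
  failureThreshold: 3
livenessProbe:
  exec:
    command: ["/usr/libexec/check-container", "--live"]  # placeholder
  initialDelaySeconds: 90
  timeoutSeconds: 10
  successThreshold: 1
  failureThreshold: 3
```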
When the OpenShift cluster is under higher load (hundreds of tests running sequentially, 3-5 tests running in parallel), it might take a longer time for the PostgreSQL pod to start. If the database initialization of the pod started in the first run but didn't finish correctly, e.g.:
And the readiness / liveness probe "decided" the pod in question needs to be restarted, a subsequent pod will end up in the CrashLoopBackOff state with a message like the following one in the pod log:
And this scenario (trying to restart the PostgreSQL pod, which subsequently fails with

psql: FATAL: database "postgres" does not exist

) is then retried a couple of times, till the default deploymentconfig timeout (600 seconds IIRC) is reached. JFTR, the aforementioned behaviour was observed with the rhscl/postgresql-95-rhel7:latest image (but looking at the code, different image versions might be prone to the very same issue).
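To observe this state in a cluster, something like the following (standard oc commands; the pod name is illustrative) shows the CrashLoopBackOff status and the failing messages:

```bash
# Watch the pod status; the restarted pod cycles into CrashLoopBackOff.
oc get pods -w

# Inspect the log of the previous, failed container of the pod
# (pod name is illustrative); it should contain the failure messages.
oc logs postgresql-1-abcde --previous
```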