Game server container crash before Ready, should restart, not move to Unhealthy #956
Comments
I also ran into this issue; see the version table below:
BTW, it seems that our gameserver/fleet CRD does not maintain the Pod replica count: if I find that a Pod is unhealthy and manually delete it, things break, and the gameserver/fleet never tries to recreate a new Pod. Thanks.
We have an e2e test for this, so in theory this shouldn't happen, but if you are finding this to be an issue, please file a new bug with detailed replication instructions so that we can reproduce and fix it. Thank you! 👍
I was trying to develop such an E2E test. It seems that we need a separate
I think we could make it a command line argument for the simple-udp startup script, and then overwrite the entrypoint in the configuration (rather than building a whole new image).
I have created a branch which contains a unit test covering part of the health checks described at https://agones.dev/site/docs/guides/health-checking/#health-failure-strategy
This provides the CRASH command, to make GameServer crash testing easier. This also includes a new param to disable automatically moving to `Ready` on startup, to enable testing crash events before and after a `Ready` state has been achieved. Work on googleforgames#956
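As a rough illustration of the commit message above, here is a minimal sketch of how a test server could handle a `CRASH` command and an opt-out of automatic `Ready`. It is not the actual simple-udp code: the `server` type, the `newServer` and `handle` names, the `READY` command, and the `ACK:` reply format are all assumptions made for this example.

```go
package main

import "strings"

// server is a hypothetical stand-in for the test game server's state.
type server struct {
	// ready records whether the server has reported itself as Ready.
	ready bool
}

// newServer builds a server. When automaticReady is false the server does
// not move to Ready on startup, so a crash can be triggered beforehand.
func newServer(automaticReady bool) *server {
	return &server{ready: automaticReady}
}

// handle processes one text payload and reports what the caller should do:
// the reply to send back, and whether to exit (and with which exit code).
// Returning the exit request instead of exiting here keeps this testable.
func (s *server) handle(payload string) (reply string, exitCode int, exit bool) {
	cmd := strings.TrimSpace(payload)
	switch cmd {
	case "CRASH":
		// Simulate a process crash: ask the caller for a non-zero exit.
		return "", 1, true
	case "READY":
		// Assumed command: move to Ready on request rather than at startup.
		s.ready = true
	}
	return "ACK: " + cmd, 0, false
}
```

In a real server loop, the caller would pass the returned code to `os.Exit` when `exit` is true, which is what lets an e2e test trigger a crash either before or after the `READY` step.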
Also moves udp-simple from 0.14=>0.15 More preparation for googleforgames#956
FYI - I think I have a handle on this. It's a bit fun and tricky, but slowly taking it apart.
* E2E test for Unhealthy GameServer on process crash. More preparation for #956
* Include logic to test LastTerminationState, since it looks like an update event can skip the State and move directly to LastTerminationState.
This brings our implementation in line with what our health checking documentation states that we do.
To solve this, we store the containerID of the currently running GameServer container (which is unique to that running instance) as an annotation on the GameServer Pod.
When the annotation is not there, we know the Pod is not yet Ready, so we can ignore it when our Unhealthy check occurs, and restarts of the GameServer container can happen as per usual.
When the annotation is there, we know to check for failure. To avoid ContainerStatus.LastTerminationState polluting the result, since that crash/failure may have happened before the GameServer was Ready, we compare the current ContainerID to the one stored in the annotation. When they are equal, we can skip looking at the LastTerminationState, as that termination happened before the GameServer was marked as Ready.
Lots of unit and e2e test updates to go with this to test it out.
Closes googleforgames#956
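The annotation check described in this commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the PR: the annotation key, the simplified `containerStatus` and `pod` types (stand-ins for the Kubernetes Pod and ContainerStatus objects), and the `shouldMarkUnhealthy` name are all hypothetical.

```go
package main

// readyContainerIDAnnotation is a hypothetical annotation key; the real
// key used by the project may differ.
const readyContainerIDAnnotation = "example.dev/ready-container-id"

// containerStatus is a simplified stand-in for a Kubernetes ContainerStatus.
type containerStatus struct {
	// ContainerID identifies the currently running container instance.
	ContainerID string
	// Terminated stands in for State.Terminated being set.
	Terminated bool
	// LastTerminated stands in for LastTerminationState.Terminated being set.
	LastTerminated bool
}

// pod is a simplified stand-in for the GameServer Pod.
type pod struct {
	Annotations map[string]string
	GameServer  containerStatus
}

// shouldMarkUnhealthy reports whether a container failure should move the
// GameServer to Unhealthy instead of leaving the restart to Kubernetes.
func shouldMarkUnhealthy(p pod) bool {
	readyID, ok := p.Annotations[readyContainerIDAnnotation]
	if !ok {
		// No annotation: the GameServer was never Ready, so restarts
		// can happen as usual.
		return false
	}
	if p.GameServer.Terminated {
		// The container has terminated after Ready was recorded.
		return true
	}
	if p.GameServer.ContainerID == readyID {
		// Same container instance that became Ready: any previous
		// termination happened before Ready, so ignore it.
		return false
	}
	// A different container is running than the one that became Ready;
	// a recorded previous termination means it failed after Ready.
	return p.GameServer.LastTerminated
}
```

Comparing container IDs, rather than looking at timestamps, relies on the point made in the message that the ID is unique to each running instance of the container.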
This brings our implementation in line with what our health checking documentation states that we do. This is done by implementing extra checks in the HealthController to determine whether it's appropriate to move to Unhealthy rather than allow a restart to occur. Replaced PR googleforgames#1069 Closes googleforgames#956
…ter (#1099) This brings our implementation in line with what our health checking documentation states that we do. This is done by implementing extra checks in the HealthController to determine whether it's appropriate to move to Unhealthy rather than allow a restart to occur. Replaced PR #1069 Closes #956
What happened:
If the game server container crashes at any stage, the GameServer moves to Unhealthy straight away.
What you expected to happen:
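According to https://agones.dev/site/docs/guides/health-checking/, a game server container that crashes before Ready should be restarted, rather than the GameServer moving to Unhealthy.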
How to reproduce it (as minimally and precisely as possible):
We will need an e2e test that has a CRASH command on udp-simple that does a `sdk.Exit(1)`, and then test what happens when a crash occurs before `Ready` and after `Ready`.
Anything else we need to know?:
Logic can be found here:
https://github.com/googleforgames/agones/blob/master/pkg/gameservers/health.go#L87
Environment:
Kubernetes version (use `kubectl version`): 1.12