Game Server health checking #15

Closed
markmandel opened this issue Dec 8, 2017 · 0 comments
Labels
  • area/user-experience: Pertaining to developers trying to use Agones, e.g. SDK, installation, etc
  • kind/design: Proposal discussing new features / fixes and how they should be implemented
  • kind/feature: New features for Agones

markmandel commented Dec 8, 2017

Design

Health Configuration

apiVersion: "stable.agon.io/v1alpha1"
kind: GameServer
metadata:
  name: "simple-udp"
spec:
  portPolicy: "static"
  containerPort: 7654
  hostPort: 7777
  # new health section
  health:
    # defaults to false, but can be set to true
    disabled: false
    # If the `Health()` function doesn't get called at least once every `periodSeconds` seconds, then
    # the game server is not healthy. Defaults to 5
    periodSeconds: 3
    # Minimum consecutive failures for the probe to be considered failed after having succeeded. Defaults to 3. Minimum value is 1.
    failureThreshold: 3
    # Number of seconds after the container has started before health check is initiated. Defaults to 5 seconds
    initialDelaySeconds: 5
  template:
    spec:
      containers:
      - name: simple-udp
        image: gcr.io/agon-images/udp-server:0.1
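
The defaults and limits stated in the comments above can be sketched as a small config object. This is only an illustration in Python, not Agones code: the `HealthConfig` class and the `from_spec` helper are hypothetical names, while the field names and default values are taken from the YAML comments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HealthConfig:
    """Hypothetical mirror of the proposed `health` section and its stated defaults."""
    disabled: bool = False
    period_seconds: int = 5
    failure_threshold: int = 3
    initial_delay_seconds: int = 5

    def __post_init__(self):
        # The proposal states that the minimum failureThreshold is 1.
        if self.failure_threshold < 1:
            raise ValueError("failureThreshold must be at least 1")


def from_spec(health: dict) -> HealthConfig:
    """Build a HealthConfig from a parsed `health` mapping, filling in defaults."""
    defaults = HealthConfig()
    return HealthConfig(
        disabled=health.get("disabled", defaults.disabled),
        period_seconds=health.get("periodSeconds", defaults.period_seconds),
        failure_threshold=health.get("failureThreshold", defaults.failure_threshold),
        initial_delay_seconds=health.get(
            "initialDelaySeconds", defaults.initial_delay_seconds
        ),
    )
```

With the example above, `from_spec({"periodSeconds": 3})` would keep the other three fields at their defaults.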

SDK API

SDK.Health()

The Health() function on the SDK object will need to be called regularly, at intervals shorter than the timeout threshold, for the game server to be considered healthy.
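
As a rough sketch of what "called regularly" could look like inside a game server process, the loop below pings on a fixed interval from a background thread. Everything here is an assumption for illustration: `send_health` stands in for the SDK's `Health()` call, `run_heartbeat` is a hypothetical helper, and pinging at half the configured period is one possible choice, not something the proposal prescribes.

```python
import threading


def run_heartbeat(send_health, period_seconds, stop_event):
    """Call send_health() repeatedly until stop_event is set.

    Pinging at half the configured period is an assumption made here to
    leave headroom below the timeout.
    """
    interval = period_seconds / 2
    while not stop_event.is_set():
        send_health()
        # wait() returns early once stop_event is set, so shutdown is prompt.
        stop_event.wait(interval)
```

A game server could start this with `threading.Thread(target=run_heartbeat, args=(sdk_health, 3, stop), daemon=True).start()`, where `sdk_health` is whatever callable wraps the real SDK call.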

Failure

  1. If any of the backing Pod containers fails for any reason before the GameServer moves to Ready, then it should restart as per the restartPolicy (which defaults to "Always").
  2. If the GameServer Pod fails (for any reason) after the Ready state, then it doesn't restart, but moves the GameServer to an Unhealthy state - and then it's up to the managing code to determine what to do at that point.
  3. If the SDK sidecar fails, then it should always be restarted.
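
The three failure rules above can be read as a small decision function. This is a sketch of the intended behaviour only: the `on_failure` name, the component labels, and the returned action strings are all hypothetical and do not correspond to any Agones or Kubernetes API.

```python
def on_failure(component: str, game_server_ready: bool) -> str:
    """Return the action for a failed component under the three rules above.

    component is "sidecar" or "gameserver" (hypothetical labels).
    """
    if component == "sidecar":
        # Rule 3: the SDK sidecar is always restarted.
        return "restart"
    if component == "gameserver":
        if not game_server_ready:
            # Rule 1: before Ready, restart as per the restartPolicy.
            return "restart_per_policy"
        # Rule 2: after Ready, do not restart; flag the GameServer instead.
        return "mark_unhealthy"
    raise ValueError(f"unknown component: {component!r}")
```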

Outside scope

  • The GameServer Pod fails to be scheduled for any reason, such as lack of resources (cpu, memory) or being allocated an unavailable port - this will be managed in a separate ticket

Implementation

gRPC

Health is a unidirectional stream from the gameserver client -> the sidecar. The sidecar will update the State to Unhealthy if it doesn't receive a healthcheck event within the allotted time.
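
One way the sidecar's bookkeeping could work is sketched below, leaving out the gRPC transport entirely. This is an interpretation, not the proposed implementation: the `HealthTracker` class, its method names, and the "Healthy" label are hypothetical, and it assumes that `check()` is invoked once per `periodSeconds`, that a period with no `Health()` message counts as one failure, and that Unhealthy is not reversed once set.

```python
class HealthTracker:
    """Hypothetical sidecar-side tracker for Health() stream messages."""

    def __init__(self, failure_threshold=3, initial_delay_seconds=5, started_at=0.0):
        self.failure_threshold = failure_threshold
        # No checks are counted until the initial delay has passed.
        self.check_after = started_at + initial_delay_seconds
        self.pinged = False
        self.failures = 0
        self.state = "Healthy"  # assumed label for the non-failed state

    def on_health(self):
        """Record that a Health() message arrived on the stream."""
        self.pinged = True

    def check(self, now):
        """Run once per period; return the resulting state."""
        if self.state == "Unhealthy" or now < self.check_after:
            return self.state
        # A period with at least one message resets the consecutive count.
        self.failures = 0 if self.pinged else self.failures + 1
        self.pinged = False
        if self.failures >= self.failure_threshold:
            self.state = "Unhealthy"
        return self.state
```

Passing the current time into `check()` rather than reading a clock inside it keeps the sketch easy to test.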

Kubernetes

(Theory - need investigation)

  1. The sidecar will proxy the liveness check for the GameServer container through a gshealthz url endpoint. It will track Health() messages and, if they drop below the set threshold, return a 500.
  2. Once the GameServer is Ready, this will always return 200 - which (in theory) should mean that Kubernetes will never restart the GameServer container.
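
Since this part is still marked as theory, the following is only a sketch of the two rules above, written in Python with the standard library's `http.server`. The `/gshealthz` path, the `gshealthz_status` and `make_handler` names, and the two callbacks are assumptions made for illustration.

```python
from http.server import BaseHTTPRequestHandler


def gshealthz_status(game_server_ready: bool, health_ok: bool) -> int:
    """Pick the liveness response code under the two rules above."""
    if game_server_ready:
        # Rule 2: once Ready, always report success so the container is not restarted.
        return 200
    # Rule 1: before Ready, report failure when Health() messages have lapsed.
    return 200 if health_ok else 500


def make_handler(get_ready, get_health_ok):
    """Build a request handler class bound to two state callbacks (hypothetical)."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/gshealthz":  # assumed path for the endpoint
                self.send_error(404)
                return
            self.send_response(gshealthz_status(get_ready(), get_health_ok()))
            self.end_headers()

    return Handler
```

The handler class can be passed to `http.server.HTTPServer` in the sidecar, with the container's liveness probe pointed at that port.
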
@markmandel markmandel added area/user-experience Pertaining to developers trying to use Agones, e.g. SDK, installation, etc kind/design Proposal discussing new features / fixes and how they should be implemented kind/feature New features for Agones labels Dec 8, 2017
@markmandel markmandel self-assigned this Jan 11, 2018
@markmandel markmandel added this to the 0.1 milestone Jan 17, 2018
markmandel added a commit that referenced this issue Feb 3, 2018
This includes updates to the SDK, and
controller code to manage the health
lifecycle of a GameServer.

Examples have also been updated, as well
as some basic documentation.

Closes #15
markmandel added a commit that referenced this issue Feb 4, 2018