Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add healthz #722

Merged
merged 1 commit into from
Jun 6, 2017
Merged

add healthz #722

merged 1 commit into from
Jun 6, 2017

Conversation

andyxning
Copy link
Contributor

Fix #616

This PR will add a healthz endpoint in http for flanneld by default in 0.0.0.0:8285.

Note: This PR is based on #632 .

/cc @eyakubovich @tomdee @zbwright

@andyxning andyxning force-pushed the add_healthz branch 3 times, most recently from d74069e to d66bf59 Compare May 17, 2017 09:01
@andyxning
Copy link
Contributor Author

@eyakubovich @tomdee @zbwright Please take a look at the CI error. It seems nothing is related with the change. Seems there are more than one flannel instance is running simultaneous in ./dist/functional-test.sh.

@tomdee
Copy link
Contributor

tomdee commented May 17, 2017

@andyxning I'm not sure I see the benefit of this PR - what's it actually checking (except maybe that the Go runtime hasn't crashed)

@andyxning
Copy link
Contributor Author

andyxning commented May 18, 2017

@tomdee Currently, flannel listen on udp port 8285 by default, we have no methods to do a health check about the status of flanneld daemon, i.e., whether it is running or stopped accidentally. We have encountered a situation where flanneld has been stopped accidentally for more than two days without any notice, and when we found this and restart it, the newly allocated subnet is different from the original one cause we have not renew the lease about the original subnet. This makes it possible that we have two nodes with the same subnet which is not acceptable.

So, the original thought is we need a way to check whether flanneld is running or not. No much other logic. :)

Maybe you want to add more real checks for the health check endpoint, such as where flanneld can connect to etcd successfully. This is more good. But, for now, we just want to check whether flanneld daemon is running or stopped.

@andyxning
Copy link
Contributor Author

/ping @tomdee

@tomdee
Copy link
Contributor

tomdee commented May 19, 2017

I'm not totally against this, but before I can merge it you need to add some documentation for it and ensure the tests pass. I also think it should be an optional feature - and probably off by default.

@andyxning
Copy link
Contributor Author

@tomdee Will Do.

@andyxning andyxning force-pushed the add_healthz branch 3 times, most recently from 1c53100 to 4a78a4d Compare May 19, 2017 02:58
@andyxning
Copy link
Contributor Author

@tomdee Done. PTAL.

@andyxning
Copy link
Contributor Author

/ping @tomdee

@andyxning
Copy link
Contributor Author

Friendly ping @tomdee

@andyxning
Copy link
Contributor Author

@tomdee This health check is disable by default. I have not test whether it can pass all the tests when enabled by default. :)

@andyxning
Copy link
Contributor Author

Friendly Ping @tomdee :)

Copy link
Contributor

@tomdee tomdee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @andyxning I have a comment and question.


## Health Check

Flannel provides an health check http endpoint `healthz`. Currently this endpoint will blindly
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: should be "a health check"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.


if err := http.ListenAndServe(address, nil); err != nil {
log.Errorf("Start healthz server error. %v", err)
panic(err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why panic here? Is it standard practice to use panic with http.ListenAndServe?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope. panic because this method is called mustRunHealthz. According to golang convention, we should panic when we can not do some thing that must be done.

The reason for why we use mustRunHealthz is because is think we should must start the health check server once we enable it. If not, we should panic or stop flannel to give user more clear result.

@tomdee WDYT.

@tomdee tomdee merged commit 402fdde into flannel-io:master Jun 6, 2017
@andyxning andyxning deleted the add_healthz branch June 6, 2017 17:07
@andyxning
Copy link
Contributor Author

@tomdee Many thanks. 👏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants