[bitnami/etcd] redirect output to new_member_envs #1642
Conversation
Hi @sboschman
Thanks for the PR! As I mentioned at #1640, the issue isn't related to this redirection, but to a problem in the regex used to check whether the member was removed from the cluster.
This Pull Request has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thank you for your contribution.
@sboschman I'm curious to know whether this change actually resolved the issue. I wonder if the pipe operation followed by the redirect somehow causes the problem. Could you test it with this instead: (etcdctl $AUTH_OPTIONS member add "$HOSTNAME" --peer-urls="{{ $etcdPeerProtocol }}://${HOSTNAME}.{{ $etcdHeadlessServiceName }}.{{ .Release.Namespace }}.svc.{{ $clusterDomain }}:{{ $peerPort }}" | grep "^ETCD_" > "$ETCD_DATA_DIR/new_member_envs") 1>&3 2>&4
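The grouping suggested above can be reproduced stand-alone. This is a minimal sketch, not the chart's actual script: the `printer` function is a hypothetical stand-in for the `etcdctl member add` output, and fds 3/4 mimic the saved stdout/stderr the chart's logging setup uses.

```shell
#!/bin/sh
# Save the current stdout/stderr on fds 3 and 4, as the chart's scripts do
# before redirecting output elsewhere.
exec 3>&1 4>&2

# Hypothetical stand-in for the output of `etcdctl member add`.
printer() {
  echo "ETCD_NAME=etcd-3"
  echo "ETCD_INITIAL_CLUSTER_STATE=existing"
  echo "some other line"
}

# Group the pipeline in a subshell so grep's matches land in the file,
# while anything else the group writes goes to the saved descriptors.
(printer | grep "^ETCD_" > new_member_envs) 1>&3 2>&4

# The file should now contain only the two ETCD_ lines.
cat new_member_envs
```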
Since #1650 it has become harder to fix my etcd clusters. What goes wrong depends on which member you are trying to get up and running again, but setup.sh gets itself into a state it can't recover from. Getting it running again involves removing the data dirs from etcd-1 and/or etcd-2, running etcdctl to remove an unstarted member, and multiple hacks in setup.sh.

One thing is getting the new_member_envs file actually filled (before #1650 this seemed to be the only change needed). For example, at https://github.com/bitnami/charts/blob/master/bitnami/etcd/templates/scripts-configmap.yaml#L136 the script tries to save the member id of etcd-2, but etcd-2 is not a member yet, so member_id is empty and something later fails on the missing member_id. So I added https://github.com/bitnami/charts/blob/master/bitnami/etcd/templates/scripts-configmap.yaml#L165, so I can rejoin etcd-2 to an existing cluster.

Getting etcd-0 and etcd-1 up and running is tricky as well. Start etcd-0; it will not complete startup because it tries to find etcd-1 and etcd-2. While etcd-0 is looping, and before k8s terminates it, you have to start etcd-1 with a cleaned data dir. If you are lucky they will find each other and etcd-0 will complete startup, followed by etcd-1. But the pod crashes faster than the logging is written out, so I don't have the root cause; all I see is the error from restart > 1, at which point it has already put itself into a state it can't recover from.
@sameersbn there are several known issues with the current logic to "automagically" recover the etcd cluster in each of the different scenarios. I already created a task to investigate and fix this in depth, but we haven't had the chance to work on it yet. More info at: #1513
Saving the env vars to file combined with stdout/stderr redirection does not seem to work. The new_member_envs file stays empty and ETCD_INITIAL_CLUSTER_STATE is always "new", triggering the wrong path in the setup script.

Fixes #1640