[bitnami/etcd] redirect output to new_member_envs #1642

Closed

Conversation

sboschman

Saving the env vars to a file combined with stdout/stderr redirection does not seem to work. The new_member_envs file stays empty and ETCD_INITIAL_CLUSTER_STATE is always NEW, which triggers the wrong path in the setup script.
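
To illustrate why an empty file matters, here is a minimal sketch. It assumes the consumer loads `KEY="value"` lines from the file and otherwise falls back to a default of `new`. The function `load_cluster_state` and the file names are hypothetical and are not taken from the chart's setup script.

```shell
#!/bin/sh
# Hypothetical helper: NOT the chart's actual code.
# Assumption: the state defaults to "new" and is only overridden
# when the envs file has content to load.
load_cluster_state() {
    envs_file=$1
    ETCD_INITIAL_CLUSTER_STATE="new"
    if [ -s "$envs_file" ]; then
        # Load KEY="value" lines into the current shell.
        . "$envs_file"
    fi
    echo "$ETCD_INITIAL_CLUSTER_STATE"
}

workdir=$(mktemp -d)
: > "$workdir/empty_envs"                                             # what the bug report describes
echo 'ETCD_INITIAL_CLUSTER_STATE="existing"' > "$workdir/filled_envs" # what a rejoin needs

state_empty=$(load_cluster_state "$workdir/empty_envs")
state_filled=$(load_cluster_state "$workdir/filled_envs")
echo "empty file  -> $state_empty"
echo "filled file -> $state_filled"
```

Under these assumptions, the empty file leaves the default in place, so a member that should rejoin an existing cluster would be treated as a brand new one.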

fixes #1640

@sboschman sboschman changed the title [bitami/etcd] redirect output to new_member_envs [bitnami/etcd] redirect output to new_member_envs Nov 21, 2019

@juan131 juan131 left a comment


Hi @sboschman

Thanks for the PR! As I mentioned in #1640, the issue isn't related to this redirection but to a problem in the regex used to check whether the member was removed from the cluster.


stale bot commented Dec 7, 2019

This Pull Request has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thank you for your contribution.

@stale stale bot added the stale 15 days without activity label Dec 7, 2019
@sameersbn

@sboschman I'm curious to know whether this change actually resolved the issue. I wonder if the pipe operation followed by the redirect somehow causes it. Would you be able to test it out with this instead:

(etcdctl $AUTH_OPTIONS member add "$HOSTNAME" --peer-urls="{{ $etcdPeerProtocol }}://${HOSTNAME}.{{ $etcdHeadlessServiceName }}.{{ .Release.Namespace }}.svc.{{ $clusterDomain }}:{{ $peerPort }}" | grep "^ETCD_" > "$ETCD_DATA_DIR/new_member_envs") 1>&3 2>&4
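
The grouping in that command can be tried without a cluster. The sketch below is only an illustration: `fake_member_add` is a hypothetical stand-in that imitates the general shape of `etcdctl member add` output, and fds 3 and 4 are pointed at log files here purely so the result can be inspected (in the real script they are assumed to be set up elsewhere).

```shell
#!/bin/sh
# Hypothetical stand-in for `etcdctl member add`; the real command needs a live cluster.
fake_member_add() {
    echo 'Member 1234 added to cluster abcd'
    echo ''
    echo 'ETCD_NAME="etcd-2"'
    echo 'ETCD_INITIAL_CLUSTER_STATE="existing"'
    echo 'warning: sample stderr line' >&2
}

workdir=$(mktemp -d)

# Assumption: fds 3 and 4 are already open in the real script.
# Here they go to files so we can see what lands where.
exec 3>"$workdir/stdout.log" 4>"$workdir/stderr.log"

# The subshell's stdout/stderr go to fds 3/4, but grep's own stdout is
# redirected to the file first, so the ETCD_ lines still end up there.
(fake_member_add | grep "^ETCD_" > "$workdir/new_member_envs") 1>&3 2>&4

exec 3>&- 4>&-

echo "envs file:";  cat "$workdir/new_member_envs"
echo "stderr log:"; cat "$workdir/stderr.log"
```

With this grouping the `> file` redirect applies only to `grep`, while anything else written inside the parentheses (such as the stand-in's stderr line) follows the outer `1>&3 2>&4`.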

@stale stale bot removed the stale 15 days without activity label Dec 10, 2019

sboschman commented Dec 11, 2019

Since #1650 it has become harder to fix my etcd clusters. What goes wrong depends on which member you are trying to get up and running again, but in each case setup.sh gets itself into a state it can't recover from.

Steps to get it running again involve removing the data dirs from etcd-1 and/or etcd-2, running etcdctl to remove an unstarted member and multiple hacks in setup.sh. One thing is getting the new_member_envs file actually filled (before #1650 this seemed to be the only change needed).

E.g. at https://github.com/bitnami/charts/blob/master/bitnami/etcd/templates/scripts-configmap.yaml#L136 it tries to save the member id of etcd-2, but etcd-2 is not a member yet, so member_id is empty and later on something fails on this missing member_id. So I added https://github.com/bitnami/charts/blob/master/bitnami/etcd/templates/scripts-configmap.yaml#L165 to it, so I can rejoin etcd-2 to an existing cluster.
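
A guard for the empty member_id case could look roughly like the sketch below. It is not the chart's code: `fake_member_list` is a hypothetical stand-in returning sample rows in a comma-separated "ID, status, name, peer URL, client URL" shape, and `lookup_member_id` is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical stand-in for `etcdctl member list`; sample rows only.
fake_member_list() {
    echo '8e9e05c52164694d, started, etcd-0, http://etcd-0:2380, http://etcd-0:2379'
    echo '91bc3c398fb3c146, started, etcd-1, http://etcd-1:2380, http://etcd-1:2379'
}

# Print the ID of the member whose name equals $1, or nothing if it is absent.
lookup_member_id() {
    fake_member_list | awk -F', ' -v name="$1" '$3 == name { print $1 }'
}

member_id=$(lookup_member_id "etcd-2")
if [ -z "$member_id" ]; then
    # etcd-2 is not in the list yet, so there is no ID to save;
    # a real script would add the member here instead of persisting an empty value.
    state="not-a-member"
else
    state="member"
fi
echo "etcd-2: $state"

known_id=$(lookup_member_id "etcd-1")
echo "etcd-1: $known_id"
```

Checking for an empty result before saving it avoids carrying a blank member_id into later steps that expect a real ID.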

Getting etcd-0 and etcd-1 up and running is tricky as well. Start etcd-0; it will not complete startup because it tries to find etcd-1 and etcd-2. While etcd-0 is looping, and before k8s terminates it, you have to start etcd-1 with a cleaned data dir. If you are lucky they will find each other and etcd-0 will complete startup, followed by etcd-1. But the pod crashes faster than the logging is written out, so I don't have the root issue; all I see is the error from restart > 1, where it has already put itself into a state it can't recover from.


juan131 commented Dec 13, 2019

@sameersbn there are several known issues with the current logic to "automagically" recover the etcd cluster for each of the different scenarios.

I already created a task to investigate and fix this issue in depth, but we haven't had the chance to work on it yet.

More info at: #1513
