Shared plugin: etcd starting up functions. #1611
Conversation
etcdPeerPort = ":2380"
etcdClusterToken = "vsphere-shared-etcd-cluster"
etcdListenURL = "0.0.0.0"
etcdScheme = "http://"
etcdProtocol? Or etcdTransport?
"scheme" is the corresponding member name of URL struct: https://golang.org/pkg/net/url/#URL
// InitEtcd start or join ETCD cluster depending on the role of the node
func InitEtcd() error {
	// create new docker client
I was adding similar things to shared_plugin.go. Should we move the docker Client to the VolumeDriver structure and the docker info part to a function in shared_plugin.go?
We might want to call docker info multiple times at various places, so a better place would perhaps be shared_plugin.go. A function getsDockerInfo() could return a structure similar to this:
// Docker swarm information
type SwarmMetadata struct {
	isManager bool
	isLeader  bool
	selfIP    string
	managers  []string
}
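A minimal sketch of what the proposed getsDockerInfo() helper could look like, assuming the Docker Go client ("context" and "github.com/docker/docker/client"); the struct is the one sketched above and all names are illustrative, not part of this PR:

func getsDockerInfo(ctx context.Context, cli *client.Client) (*SwarmMetadata, error) {
	info, err := cli.Info(ctx)
	if err != nil {
		return nil, err
	}
	meta := &SwarmMetadata{
		isManager: info.Swarm.ControlAvailable,
		selfIP:    info.Swarm.NodeAddr,
	}
	// Manager addresses are listed in the swarm info.
	for _, mgr := range info.Swarm.RemoteManagers {
		meta.managers = append(meta.managers, mgr.Addr)
	}
	// Leader status is only visible by inspecting the node itself.
	if info.Swarm.ControlAvailable {
		node, _, err := cli.NodeInspectWithRaw(ctx, info.Swarm.NodeID)
		if err != nil {
			return nil, err
		}
		if node.ManagerStatus != nil {
			meta.isLeader = node.ManagerStatus.Leader
		}
	}
	return meta, nil
}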
It will depend on how and where we will use the docker client.
I am fine with putting it into the VolumeDriver structure later when it's used in other situations.
I am not sure about the getsDockerInfo() function though.
The returned info already contains that information, so why set it into another struct?
And that information changes, so we cannot reuse it...
node, _, err := cli.NodeInspectWithRaw(ctx, nodeID)
if err != nil {
	log.WithFields(log.Fields{"nodeID": nodeID,
		"error": err}).Error("Failted to inspect node ")
Failed. Typo.
	return err
}

// get the IP address of current node
info.Swarm.NodeAddr can give you this node's IP address.
BTW this is docker CLI code. Might be helpful.
Yes, here we can get NodeAddr from Info.Swarm.NodeAddr.
However, later when we need to get the address of the leader from a non-leader manager,
we cannot do "docker info" for a remote node, but can only do "docker node inspect".
The struct returned by NodeInspectWithRaw is the swarm.Node structure, instead of types.Info.
So we have to go through swarm.Node.ManagerStatus.Addr to get the addresses...
and that is why I decided to reuse that code :)
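For illustration, a rough sketch of how the leader's address can be found from any manager via NodeList and ManagerStatus, assuming the same Docker client and context as in the diff (the example address is made up):

nodes, err := cli.NodeList(ctx, types.NodeListOptions{})
if err != nil {
	return err
}
leaderAddr := ""
for _, n := range nodes {
	// ManagerStatus is only set for manager nodes; Leader marks the swarm leader.
	if n.ManagerStatus != nil && n.ManagerStatus.Leader {
		leaderAddr = n.ManagerStatus.Addr // e.g. "192.168.0.10:2377"
		break
	}
}
log.Infof("Swarm leader address: %s", leaderAddr)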
nodes, err := cli.NodeList(ctx, types.NodeListOptions{})
if err != nil {
	log.WithFields(log.Fields{"nodeID": nodeID,
		"error": err}).Error("Failted get NodeList from swarm manager")
Typo. Failed.
	return err
}

peerAddr := etcdScheme + nodeAddr + etcdPeerPort
Can peerAddr be renamed to nodeURL or something? This address is of the local node; the word "peer" is confusing.
peerAddr corresponds to the peerURLs field inside the etcd member structure:
https://coreos.com/etcd/docs/latest/v2/members_api.html
which means the addresses for peer traffic.
And it matches the etcd MemberAdd function:
https://github.com/coreos/etcd/blob/d51d381eca406156fd7e44118d4b0dab03589354/clientv3/cluster.go#L36
Since we are dealing with etcd here, I think it's better to keep the names consistent with etcd structures and functions.
initCluster := ""
if !existing {
	peerAddrs := []string{peerAddr}
I am guessing member add() accepts a list of strings, so we converted it to []string. But "peer" is confusing. If peerAddrs holds the addr of this node, can the name nodeURLs be picked?
https://github.com/coreos/etcd/blob/d51d381eca406156fd7e44118d4b0dab03589354/clientv3/cluster.go#L36
It's the right name in the ETCD world...
It specifically means the URL addresses for listening for peer traffic.
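For context, a short sketch of how this peer URL feeds into MemberAdd in the etcd clientv3 API; etcdCli is assumed to be an already-connected clientv3 client and the address is illustrative:

peerAddr := etcdScheme + nodeAddr + etcdPeerPort // e.g. "http://192.168.0.11:2380"
peerAddrs := []string{peerAddr}
resp, err := etcdCli.MemberAdd(ctx, peerAddrs)
if err != nil {
	return err
}
log.Infof("Added etcd member %x with peer URLs %v", resp.Member.ID, peerAddrs)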
What I am looking for is a function which returns:
- Who am I? Worker/Manager
- My IP address
- Manager's IP address
As this information is needed in most workflows, and since you're already fetching it, if you can put it in the form of a function that returns the same, we can reuse that in many workflows.
Also, in my opinion, code interacting with docker should be moved to shared_plugin.go
Other than that this LGTM.
Since we anyway need to call docker info every time and there is no extra processing needed, IMO there is no need to add an extra function wrapper... How it looks with the function wrapper:
How it looks without the function wrapper:
So if we compare the above, the function wrapper is not saving any code reuse. BTW it can add an extra loop in order to get the manager IP addresses. So I would prefer not using a wrapper here.
@luomiao - I have a few questions, added with
	return nil
}

func etcdRoutine(cmd []string) {
The name feels wrong. etcdService() or etcdLoop() or even etcdDaemon() are closer to what's really going on.
Looks good.
A few string/err messages need improvement but overall the logic is clear. I did not check for releasing objects like ctx or client (I assume you checked if it's needed).
One request - please take a look at either flattening out the nesting a bit, or commenting all nodes of the decision tree.
Other than that, ready to merge.
if info.Swarm.ControlAvailable == false {
	log.WithFields(
		log.Fields{"nodeID": nodeID},
	).Info("Swarm node role: worker, return from InitEtcd ")
This info message is confusing... maybe "Swarm node role: worker. Action: return from InitEtcd"
(same comment applies to the other tracking Info messages)
nodes, err := cli.NodeList(ctx, types.NodeListOptions{})
if err != nil {
	log.WithFields(log.Fields{"nodeID": nodeID,
		"error": err}).Error("Failed get NodeList from swarm manager")
'failed to get...'
peerAddr := etcdScheme + nodeAddr + etcdPeerPort
existing := false
for _, member := range lresp.Members {
	if member.PeerURLs[0] == peerAddr {
There should be a better way to find it. Have you checked existing packages?
Etcd clientv3 only has 4 member API functions:
https://github.com/coreos/etcd/blob/d51d381eca406156fd7e44118d4b0dab03589354/clientv3/cluster.go#L31
And there is no API function to locate a certain member...
}
}

log.Errorf("Failed to create etcd client according to manager info")
Need to print extra info here; if we got here, it's likely there was no debug print yet.
}

func addrToEtcdClient(addr string) (*etcdClient.Client, error) {
	s := strings.Split(addr, ":")
Comment the function please. Especially the string parsing part...
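A hypothetical, commented sketch of what addrToEtcdClient might do; the etcdClientPort constant and the exact behavior are assumptions for illustration, not necessarily the PR's implementation:

// addrToEtcdClient converts a swarm manager address of the form "host:port"
// (e.g. the Addr from ManagerStatus, "192.168.0.10:2377") into an etcd client
// that talks to the etcd instance on that host.
func addrToEtcdClient(addr string) (*etcdClient.Client, error) {
	// Keep only the host part; the swarm port is not the etcd client port.
	s := strings.Split(addr, ":")
	endpoint := etcdScheme + s[0] + etcdClientPort // etcdClientPort assumed, e.g. ":2379"
	cli, err := etcdClient.New(etcdClient.Config{
		Endpoints:   []string{endpoint},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		return nil, err
	}
	return cli, nil
}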
return err
}
}
break
Why break? Generally, this loop nesting looks too deep, so it either needs to be flattened out or (more likely) more comments added at each node of the decision tree here.
Because one IP address can only join once, there is no need to continue the check after a match.
I will add some more comments here.
Design: Swarm leader node starts an etcd cluster when the shared volume driver is initialized. Swarm manager nodes join the etcd cluster through the leader node. Swarm worker nodes do nothing. The etcd cluster resides inside the plugin runc container. Thus when the plugin is stopped, the etcd member inside the container dies. When the plugin/VM is restarted, if the plugin finds there is already a started member in the etcd cluster with the same peer URL, this entry should be removed and re-joined. Due to an etcd limitation, users should start the swarm cluster with the number of managers (including the leader) as an odd number, in order to avoid the situation of quorum loss. For now the etcd cluster start/join only happens when the plugin is initialized. Thus if a swarm worker node has the plugin installed first and is promoted to manager later, it won't join the etcd cluster. Before this is resolved, users should reinstall the plugin on a promoted node.
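A hedged sketch of the remove-and-rejoin step described above, using the etcd clientv3 cluster API (variable names are illustrative and error handling is abbreviated):

lresp, err := etcdCli.MemberList(ctx)
if err != nil {
	return err
}
for _, member := range lresp.Members {
	if member.PeerURLs[0] == peerAddr {
		// A stale member with our peer URL survived the plugin restart:
		// remove it so this node can join the cluster again cleanly.
		if _, err := etcdCli.MemberRemove(ctx, member.ID); err != nil {
			return err
		}
		break
	}
}
if _, err := etcdCli.MemberAdd(ctx, []string{peerAddr}); err != nil {
	return err
}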
Design:
Swarm leader node starts an etcd cluster when the shared volume driver is initialized.
Swarm manager nodes join the etcd cluster through the leader node.
Swarm worker nodes do nothing.
The etcd cluster resides inside the plugin runc container.
Thus when the plugin is stopped, the etcd member inside the container dies.
When the plugin/VM is restarted, if the plugin finds there is already a started member
in the etcd cluster with the same peer URL, this entry is removed by the current plugin, which then continues to re-join.
msterin> Who should do it? Is it a part of this PR or of future PRs?
miao> This PR includes the remove and re-join already. I modified the statement above to clarify.

Note 1:
Due to an etcd limitation, users should make sure the swarm cluster has a number of managers
(including the leader) = 1 or >2, in order to avoid the situation of quorum loss.
msterin> What happens if they don't? Do we print a log message with a recommendation? What happens if they lose quorum?
miao> I updated the statement above to be more accurate. The quorum loss in our case mostly happens when 1 member is added into a 1-node cluster and then leaves; this member cannot rejoin due to quorum loss. But if 2 members join a 1-node cluster (making it a 3-node cluster) and 1 member leaves, the quorum still exists (etcd-io/etcd#6420). When quorum loss happens, the etcd cluster can still work, but in unhealthy mode, which means it cannot add/remove members. We throw an error in the log if users try to start a new master plugin and it fails. This is not recoverable for now, but etcd has an open issue in progress on it. I think it's ok for us to add a recommendation in the Readme and simply fail the new manager plugin when quorum loss happens, until ETCD makes progress on this quorum loss situation.

Note 2:
For now the etcd cluster start/join only happens when the plugin is initialized.
Thus if a swarm worker node has the plugin installed first and is promoted to manager later,
it won't join the etcd cluster. Before this is resolved, users should reinstall the plugin
on a promoted node.
msterin> Is there an issue for this (or a work item on internal tracking)? Also, can we print a message if this happens? We can do a periodic (every 5 min?) poll to check the node status and the etcd node status. Unless the issue is planned to be fixed soon.
miao> This needs more investigation and I will create a work item on this issue. We can do a periodic poll to trigger etcd member operations if the plugin cannot get a notification when a node is promoted.

Note 3:
We are using the default docker swarm network (IP address from docker info) for now. As a result, the new plugin configuration requires Network type = host.
msterin> Same comment - we need either an issue or an internal work item to track, and in any case please write up the specific commands to configure the plugin.
miao> Yes, we have a work item on it. There is no command that needs to be run by users, though. The config.json of the plugin build already has it by default, included with this PR.

Dockerfile change:
Create another Dockerfile template so we have two Dockerfile templates for the two plugins separately. This is because the new shared plugin requires etcd to be installed. The difference is too big to use the same template.
Test:
e2e test passed.