Shared plugin: etcd starting up functions. #1611
Conversation
etcdPeerPort = ":2380"
etcdClusterToken = "vsphere-shared-etcd-cluster"
etcdListenURL = "0.0.0.0"
etcdScheme = "http://"
etcdProtocol? Or etcdTransport?
"scheme" is the corresponding member name of URL struct: https://golang.org/pkg/net/url/#URL
// InitEtcd start or join ETCD cluster depending on the role of the node
func InitEtcd() error {
	// create new docker client
I was adding similar things to shared_plugin.go. Should we move the docker Client to the VolumeDriver structure and the docker info part to a function in shared_plugin.go?
We might want to call docker info multiple times at various places, so a better place would perhaps be shared_plugin.go. A function getsDockerInfo() could return a structure similar to this:
// Docker swarm information
type SwarmMetadata struct {
	isManager bool
	isLeader  bool
	selfIP    string
	managers  []string
}
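A minimal sketch of what the proposed getsDockerInfo() helper could look like, assuming the Docker Go client ("context" and "github.com/docker/docker/client"); the struct is the one sketched above and all names are illustrative, not part of this PR:

func getsDockerInfo(ctx context.Context, cli *client.Client) (*SwarmMetadata, error) {
	info, err := cli.Info(ctx)
	if err != nil {
		return nil, err
	}
	meta := &SwarmMetadata{
		isManager: info.Swarm.ControlAvailable,
		selfIP:    info.Swarm.NodeAddr,
	}
	// Manager addresses are listed in the swarm info.
	for _, mgr := range info.Swarm.RemoteManagers {
		meta.managers = append(meta.managers, mgr.Addr)
	}
	// Leader status is only visible by inspecting the node itself.
	if info.Swarm.ControlAvailable {
		node, _, err := cli.NodeInspectWithRaw(ctx, info.Swarm.NodeID)
		if err != nil {
			return nil, err
		}
		if node.ManagerStatus != nil {
			meta.isLeader = node.ManagerStatus.Leader
		}
	}
	return meta, nil
}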
It will depend on how and where we will use the docker client.
I am fine with putting it into the VolumeDriver structure later when it's used in other situations.
I am not sure about the getsDockerInfo() function though.
The returned info already contains that information, so why set it into another struct?
And that information changes, so we cannot reuse it...
node, _, err := cli.NodeInspectWithRaw(ctx, nodeID)
if err != nil {
	log.WithFields(log.Fields{"nodeID": nodeID,
		"error": err}).Error("Failted to inspect node ")
Failed. Typo.
	return err
}

// get the IP address of current node
info.Swarm.NodeAddr can give you this node's IP address.
BTW this is docker CLI code. Might be helpful.
Yes, here we can get NodeAddr from Info.Swarm.NodeAddr.
However, later when we need to get the address of the leader from a non-leader manager,
we cannot do "docker info" for a remote node, but can only do "docker node inspect".
The struct returned by NodeInspectWithRaw is the swarm.Node structure, instead of types.Info.
So we have to go through swarm.Node.ManagerStatus.Addr to get the addresses...
and that is why I decided to reuse that code :)
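For illustration, a rough sketch of how the leader's address can be found from any manager via NodeList and ManagerStatus, assuming the same Docker client and context as in the diff (the example address is made up):

nodes, err := cli.NodeList(ctx, types.NodeListOptions{})
if err != nil {
	return err
}
leaderAddr := ""
for _, n := range nodes {
	// ManagerStatus is only set for manager nodes; Leader marks the swarm leader.
	if n.ManagerStatus != nil && n.ManagerStatus.Leader {
		leaderAddr = n.ManagerStatus.Addr // e.g. "192.168.0.10:2377"
		break
	}
}
log.Infof("Swarm leader address: %s", leaderAddr)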
nodes, err := cli.NodeList(ctx, types.NodeListOptions{})
if err != nil {
	log.WithFields(log.Fields{"nodeID": nodeID,
		"error": err}).Error("Failted get NodeList from swarm manager")
Typo. Failed.
	return err
}

peerAddr := etcdScheme + nodeAddr + etcdPeerPort
Can peerAddr be renamed to nodeURL or something? This address is of the local node; the word "peer" is confusing.
peerAddr corresponds to the peerURLs field inside the etcd member structure:
https://coreos.com/etcd/docs/latest/v2/members_api.html
which means the addresses for peer traffic.
And it matches the etcd MemberAdd function:
https://github.com/coreos/etcd/blob/d51d381eca406156fd7e44118d4b0dab03589354/clientv3/cluster.go#L36
Since we are dealing with etcd here, I think it's better to keep the names consistent with etcd structures and functions.
initCluster := ""
if !existing {
	peerAddrs := []string{peerAddr}
I am guessing member add() accepts a list of strings, so we converted it to []string. But "peer" is confusing. If peerAddrs holds the addr of this node, can the name nodeURLs be picked?
https://github.com/coreos/etcd/blob/d51d381eca406156fd7e44118d4b0dab03589354/clientv3/cluster.go#L36
It's the right name in the ETCD world...
It specifically means the URL addresses for listening for peer traffic.
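For context, a short sketch of how this peer URL feeds into MemberAdd in the etcd clientv3 API; etcdCli is assumed to be an already-connected clientv3 client and the address is illustrative:

peerAddr := etcdScheme + nodeAddr + etcdPeerPort // e.g. "http://192.168.0.11:2380"
peerAddrs := []string{peerAddr}
resp, err := etcdCli.MemberAdd(ctx, peerAddrs)
if err != nil {
	return err
}
log.Infof("Added etcd member %x with peer URLs %v", resp.Member.ID, peerAddrs)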
What I am looking for is a function which returns:
- Who am I? Worker/Manager
- My IP address
- Manager's IP address
As this information is needed in most workflows, and since you're already fetching it, if you can put it in the form of a function that returns the same, we can reuse that in many workflows.
Also, in my opinion, code interacting with docker should be moved to shared_plugin.go
Other than that this LGTM.
Since we anyway need to call docker info every time and there is no extra processing needed, IMO there is no need to add an extra function wrapper... How it looks with the function wrapper:
How it looks without the function wrapper:
So if we compare the above, the function wrapper is not saving any code reuse. BTW it can add an extra loop in order to get the manager IP addresses. So I would prefer not using a wrapper here.
@luomiao - I have a few questions, added with
	return nil
}

func etcdRoutine(cmd []string) {
The name feels wrong. etcdService() or etcdLoop() or even etcdDaemon() are closer to what's really going on.
Looks good.
A few string/err messages need improvement but overall the logic is clear. I did not check for releasing objects like ctx or client (I assume you checked if it's needed).
One request - please take a look at either flattening out the nesting a bit, or commenting all nodes of the decision tree.
Other than that, ready to merge.
if info.Swarm.ControlAvailable == false {
	log.WithFields(
		log.Fields{"nodeID": nodeID},
	).Info("Swarm node role: worker, return from InitEtcd ")
This info message is confusing... maybe "Swarm node role: worker. Action: return from InitEtcd"
(same comment applies to the other tracking Info messages)
nodes, err := cli.NodeList(ctx, types.NodeListOptions{})
if err != nil {
	log.WithFields(log.Fields{"nodeID": nodeID,
		"error": err}).Error("Failed get NodeList from swarm manager")
'failed to get...'
peerAddr := etcdScheme + nodeAddr + etcdPeerPort
existing := false
for _, member := range lresp.Members {
	if member.PeerURLs[0] == peerAddr {
There should be a better way to find it. Have you checked existing packages?
Etcd clientv3 only has 4 member API functions:
https://github.com/coreos/etcd/blob/d51d381eca406156fd7e44118d4b0dab03589354/clientv3/cluster.go#L31
And there is no API function to locate a certain member...
}
}

log.Errorf("Failed to create etcd client according to manager info")
Need to print extra info here; if we got here, it's likely there was no debug print yet.
}

func addrToEtcdClient(addr string) (*etcdClient.Client, error) {
	s := strings.Split(addr, ":")
Comment the function please. Especially the string parsing part...
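A hypothetical, commented sketch of what addrToEtcdClient might do; the etcdClientPort constant and the exact behavior are assumptions for illustration, not necessarily the PR's implementation:

// addrToEtcdClient converts a swarm manager address of the form "host:port"
// (e.g. the Addr from ManagerStatus, "192.168.0.10:2377") into an etcd client
// that talks to the etcd instance on that host.
func addrToEtcdClient(addr string) (*etcdClient.Client, error) {
	// Keep only the host part; the swarm port is not the etcd client port.
	s := strings.Split(addr, ":")
	endpoint := etcdScheme + s[0] + etcdClientPort // etcdClientPort assumed, e.g. ":2379"
	cli, err := etcdClient.New(etcdClient.Config{
		Endpoints:   []string{endpoint},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		return nil, err
	}
	return cli, nil
}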
return err
}
}
break
Why break? Generally, this loop nesting looks too deep, so it either needs to be flattened out or (more likely) more comments added at each node of the decision tree here.
Because one IP address can only join once, there is no need to continue the check after a match.
I will add some more comments here.
Design: Swarm leader node starts an etcd cluster when the shared volume driver is initialized. Swarm manager nodes join the etcd cluster through the leader node. Swarm worker nodes do nothing. The etcd cluster resides inside the plugin runc container. Thus when the plugin is stopped, the etcd member inside the container dies. When the plugin/VM is restarted, if the plugin finds there is already a started member in the etcd cluster with the same peer URL, this entry should be removed and re-joined. Due to an etcd limitation, users should start the swarm cluster with the number of managers (including the leader) as an odd number, in order to avoid the situation of quorum loss. For now the etcd cluster start/join only happens when the plugin is initialized. Thus if a swarm worker node has the plugin installed first and is promoted to manager later, it won't join the etcd cluster. Before this is resolved, users should reinstall the plugin on a promoted node.
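A hedged sketch of the remove-and-rejoin step described above, using the etcd clientv3 cluster API (variable names are illustrative and error handling is abbreviated):

lresp, err := etcdCli.MemberList(ctx)
if err != nil {
	return err
}
for _, member := range lresp.Members {
	if member.PeerURLs[0] == peerAddr {
		// A stale member with our peer URL survived the plugin restart:
		// remove it so this node can join the cluster again cleanly.
		if _, err := etcdCli.MemberRemove(ctx, member.ID); err != nil {
			return err
		}
		break
	}
}
if _, err := etcdCli.MemberAdd(ctx, []string{peerAddr}); err != nil {
	return err
}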
Design:
Swarm leader node starts an etcd cluster when the shared volume driver is initialized.
Swarm manager nodes join the etcd cluster through the leader node.
Swarm worker nodes do nothing.
The etcd cluster resides inside the plugin runc container.
Thus when the plugin is stopped, the etcd member inside the container dies.
When the plugin/VM is restarted, if the plugin finds there is already a started member
in the etcd cluster with the same peer URL, this entry is removed by the current plugin, which then continues to re-join.
msterin> Who should do it? Is it a part of this PR or of future PRs?
miao> This PR includes the remove and re-join already. I modified the statement above to clarify.

Note 1:
Due to an etcd limitation, users should make sure the swarm cluster has a number of managers
(including the leader) = 1 or >2, in order to avoid the situation of quorum loss.
msterin> What happens if they don't? Do we print a log message with a recommendation? What happens if they lose quorum?
miao> I updated the statement above to be more accurate. The quorum loss in our case mostly happens when 1 member is added into a 1-node cluster and then leaves; this member cannot rejoin due to quorum loss. But if 2 members join a 1-node cluster (making it a 3-node cluster) and 1 member leaves, the quorum still exists (etcd-io/etcd#6420). When quorum loss happens, the etcd cluster can still work, but in unhealthy mode, which means it cannot add/remove members. We throw an error in the log if users try to start a new master plugin and it fails. This is not recoverable for now, but etcd has an open issue in progress on it. I think it's ok for us to add a recommendation in the Readme and simply fail the new manager plugin when quorum loss happens, until ETCD makes progress on this quorum loss situation.

Note 2:
For now the etcd cluster start/join only happens when the plugin is initialized.
Thus if a swarm worker node has the plugin installed first and is promoted to manager later,
it won't join the etcd cluster. Before this is resolved, users should reinstall the plugin
on a promoted node.
msterin> Is there an issue for this (or a work item on internal tracking)? Also, can we print a message if this happens? We can do a periodic (every 5 min?) poll to check the node status and the etcd node status. Unless the issue is planned to be fixed soon.
miao> This needs more investigation and I will create a work item on this issue. We can do a periodic poll to trigger etcd member operations if the plugin cannot get a notification when a node is promoted.

Note 3:
We are using the default docker swarm network (IP address from docker info) for now. As a result, the new plugin configuration requires Network type = host.
msterin> Same comment - we need either an issue or an internal work item to track, and in any case please write up the specific commands to configure the plugin.
miao> Yes, we have a work item on it. There is no command that needs to be run by users, though. The config.json of the plugin build already has it by default, included with this PR.

Dockerfile change:
Create another Dockerfile template so we have two Dockerfile templates for the two plugins separately. This is because the new shared plugin requires etcd to be installed. The difference is too big to use the same template.
Test:
e2e test passed.