Allow EtcdRestore to restore cluster without an existing EtcdCluster #2047
…resource

Changes;
- `EtcdRestore.spec.etcdCluster` is now `ClusterSpec` instead of `EtcdClusterRef`
- added `Name` property to `ClusterSpec`, to make it compatible with `EtcdClusterRef`
- Now, `etcd-restore-operator` checks for `EtcdCluster` resource existence
  - If exists; current behavior works (snapshot spec, delete old one, create new one)
  - If not; uses `EtcdRestore.spec.etcdCluster` as cluster spec
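The branching described in the change list can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from this PR: `ClusterSpec` and `EtcdRestore` here are trimmed-down stand-ins, and `clusterGetter`, `errNotFound`, and `resolveSpec` are hypothetical names introduced only for the example. A real operator would look the resource up through the Kubernetes API rather than an in-memory map.

```go
package main

import (
	"errors"
	"fmt"
)

// ClusterSpec is a hypothetical, trimmed-down stand-in for the operator's
// cluster spec. Name is the property the change list says was added.
type ClusterSpec struct {
	Name    string
	Size    int
	Version string
}

// EtcdRestore is a hypothetical stand-in for the restore CR, reduced to the
// one field discussed here (spec.etcdCluster).
type EtcdRestore struct {
	EtcdCluster ClusterSpec
}

// errNotFound is an assumed sentinel for "no EtcdCluster with that name".
var errNotFound = errors.New("EtcdCluster not found")

// clusterGetter abstracts the existence check (assumption for this sketch).
type clusterGetter func(name string) (ClusterSpec, error)

// resolveSpec picks the spec for the restored cluster. The boolean reports
// whether an existing EtcdCluster has to be deleted and recreated.
func resolveSpec(get clusterGetter, r EtcdRestore) (ClusterSpec, bool, error) {
	existing, err := get(r.EtcdCluster.Name)
	switch {
	case err == nil:
		// Cluster exists: keep the current behavior and reuse its spec.
		return existing, true, nil
	case errors.Is(err, errNotFound):
		// No cluster: fall back to the spec embedded in the EtcdRestore.
		return r.EtcdCluster, false, nil
	default:
		return ClusterSpec{}, false, err
	}
}

func main() {
	store := map[string]ClusterSpec{
		"existing": {Name: "existing", Size: 3, Version: "3.2.13"},
	}
	get := func(name string) (ClusterSpec, error) {
		if s, ok := store[name]; ok {
			return s, nil
		}
		return ClusterSpec{}, errNotFound
	}
	for _, name := range []string{"existing", "brand-new"} {
		r := EtcdRestore{EtcdCluster: ClusterSpec{Name: name, Size: 1, Version: "3.2.13"}}
		spec, replace, err := resolveSpec(get, r)
		fmt.Println(spec.Name, spec.Size, replace, err)
	}
}
```

Returning the "replace existing" flag alongside the spec keeps the delete-and-recreate path separate from the create-only path, which is the distinction the change list draws.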
Can one of the admins verify this patch?
@etcd-bot ok to test
@etcd-bot retest this please
@furkanmustafa @hexfusion I do not feel that EtcdCluster should be created by EtcdRestore like this because of the following reasons:
However, this leads to a broader topic of discussion that I have been wondering about for a while. I think the EtcdRestore Operator is unique in terms of functionality, i.e. upon a single successful run, the EtcdRestore CR goes into a
@alaypatel07 Thanks for the comments. I think what you say makes sense.

One option is, as you said, making EtcdRestore CRs temporary, like a Job. Then I think it wouldn't be an issue.

There is another option that makes more sense to me, which is having

What do you think about the second option?
@furkanmustafa The approach you described in the second option is very similar to what I was referring to as

And I totally agree with you, it makes the usage very clear for the user. It also makes the EtcdCluster Operator handle the failure scenario gracefully i.e., if the majority of pods die, automatically restore from backup. This is more effective with the periodic backup feature added to the backup operator, recently, and removes the extra admin intervention of deploying the restore operator.

Having said that, there are significant challenges to implementing it. First, it will significantly increase the complexity of the EtcdCluster Operator reconciliation logic, since we are moving the entire restore functionality into the EtcdCluster Operator. The second concern I have is that an operator will always try to reconcile the cluster to a state specified in the spec. Hence, the EtcdCluster operator will somehow need to determine that it successfully restored the EtcdCluster from a backup during last reconciliation and it should skip the restore in this reconciliation but reconcile other parameters of spec like

Since the approach in this PR should be refined and carefully thought out, do you mind closing this PR in favour of opening an issue stating the requirement of
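To make the second concern concrete, here is one possible way an operator could restore once and still reconcile the rest of the spec on later passes. It is only a sketch under assumptions, not anything proposed in this thread: the `Spec` and `Status` types, the `RestoreFrom` and `RestoredFrom` fields, and the use of `Size` as the "other parameter" are all hypothetical.

```go
package main

import "fmt"

// Spec is a hypothetical desired state. RestoreFrom names a backup to
// restore from; empty means no restore is requested.
type Spec struct {
	Size        int
	RestoreFrom string
}

// Status is a hypothetical observed state. RestoredFrom records the backup
// that has already been restored, so the restore is not repeated.
type Status struct {
	Size         int
	RestoredFrom string
}

// reconcile returns the actions taken in one pass and updates status.
// The restore runs only when the requested backup differs from the one
// recorded in status; other fields are reconciled on every pass.
func reconcile(spec Spec, status *Status) []string {
	var actions []string
	if spec.RestoreFrom != "" && status.RestoredFrom != spec.RestoreFrom {
		actions = append(actions, "restore from "+spec.RestoreFrom)
		status.RestoredFrom = spec.RestoreFrom
	}
	if status.Size != spec.Size {
		actions = append(actions, fmt.Sprintf("resize %d -> %d", status.Size, spec.Size))
		status.Size = spec.Size
	}
	return actions
}

func main() {
	status := &Status{Size: 3}
	// First pass: the restore is performed and recorded.
	fmt.Println(reconcile(Spec{Size: 3, RestoreFrom: "backup-1"}, status))
	// Second pass: same backup, so the restore is skipped; only size changes.
	fmt.Println(reconcile(Spec{Size: 5, RestoreFrom: "backup-1"}, status))
}
```

Recording the completed restore in status is what lets the second pass tell "already restored" apart from "restore still pending". Whether that bookkeeping belongs in the EtcdCluster CR is the design question raised above.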
Sorry for the late review but I partly agree with Alay that this warrants more discussion in the form of a proposal before we decide how to support restoring without an existing EtcdCluster object. I say partly because trying to move EtcdBackup/EtcdRestore API back into the EtcdCluster CR would be reverting back to an older architecture of the etcd operator that was redesigned for good reasons. Also see the older revisions of our docs and EtcdCluster API for more context on how we used to handle backup and restore from a single EtcdCluster CR.
etcd-operator/pkg/spec/cluster.go Lines 112 to 116 in 0fa2f7d
etcd-operator/pkg/cluster/cluster.go Lines 397 to 412 in 0fa2f7d
So we should discuss this in a proposal since this entails API changes but I think most likely we'll want to do this by keeping the EtcdRestore API decoupled from the EtcdCluster API.
Example scenario;

Changes;
- `EtcdRestore.spec.etcdCluster` is now `ClusterSpec` instead of `EtcdClusterRef`
- added `Name` property to `ClusterSpec`, to make it compatible with `EtcdClusterRef`
- Now, `etcd-restore-operator` checks for `EtcdCluster` resource existence
  - If exists; current behavior works (snapshot spec, delete old one, create new one)
  - If not; uses `EtcdRestore.spec.etcdCluster` as cluster spec

Example EtcdRestore yaml;
I'm not used to golang code, please let me know if any part needs fixing, or a different approach.
Thanks