From 7ad662874890fcb85f81e8931c04e8a368d7a4fe Mon Sep 17 00:00:00 2001 From: xing-yang Date: Thu, 13 Feb 2020 04:01:48 +0000 Subject: [PATCH 01/19] Add Volume Group KEP --- keps/sig-storage/20200212-volume-group.md | 1131 +++++++++++++++++++++ 1 file changed, 1131 insertions(+) create mode 100644 keps/sig-storage/20200212-volume-group.md diff --git a/keps/sig-storage/20200212-volume-group.md b/keps/sig-storage/20200212-volume-group.md new file mode 100644 index 00000000000..2691cb39e77 --- /dev/null +++ b/keps/sig-storage/20200212-volume-group.md @@ -0,0 +1,1131 @@ +--- +title: Volume Group +authors: + - "@xing-yang" + - "@jingxu97" +owning-sig: sig-storage +participating-sigs: + - sig-storage +reviewers: + - "@msau42" + - "@saad-ali" + - "@thockin" +approvers: + - "@msau42" + - "@saad-ali" + - "@thockin" +editor: TBD +creation-date: 2020-02-12 +last-updated: 2022-03-24 +status: provisional +see-also: + - n/a +replaces: + - n/a +superseded-by: + - n/a +--- + +# Title + +Volume Group + +## Table of Contents + + +- [Summary](#summary) +- [Motivation](#motivation) + - [Goals](#goals) +- [Proposal for Consistency Groups and Group Snapshots](#proposal-for-consistency-groups-and-group-snapshots) + - [Create and Modify VolumeGroup](#create-and-modify-volumegroup) + - [Create new PVC and add to the VolumeGroup](#create-new-pvc-and-add-to-the-volumegroup) + - [Modify VolumeGroup with existing PVCs](#modify-volumegroup-with-existing-pvcs) + - [Create VolumeGroup from VolumeGroupSnapshot](#create-volumegroup-from-volumegroupsnapshot) + - [Create VolumeGroupSnapshot](#create-volumegroupsnapshot) + - [Delete VolumeGroupSnapshot](#delete-volumegroupsnapshot) + - [Restore](#restore) + - [API Definitions](#api-definitions) + - [Example Yaml Files](#example-yaml-files) + - [Volume Group Snapshot](#volume-group-snapshot) + - [CSI Changes](#csi-changes) + - [CSI Capabilities](#csi-capabilities) + - [CSI Controller RPC](#csi-controller-rpc) + - [CreateVolumeGroup](#createvolumegroup) 
+ - [CreateVolume](#createvolume) + - [DeleteVolumeGroup](#deletevolumegroup) + - [ModifyVolumeGroup](#modifyvolumegroup) + - [ControllerGetVolumeGroup](#controllergetvolumegroup) + - [ListVolumeGroups](#listvolumegroups) + - [CreateVolumeGroupSnapshot](#createvolumegroupsnapshot) + - [CreateSnapshot](#createsnapshot) + - [DeleteVolumeGroupSnapshot](#deletevolumegroupsnapshot) + - [ControllerGetVolumeGroupSnapshot](#controllergetvolumegroupsnapshot) + - [ListVolumeGroupSnapshots](#listvolumegroupsnapshots) + - [Alternatives](#alternatives) + - [Immutable VolumeGroup](#immutable-volumegroup) +- [Proposal for Volume Placement](#proposal-for-volume-placement) + - [API Changes](#api-changes) + - [ModifyVolume](#modifyvolume) + - [Create VolumeGroup with Selector](#create-volumegroup-with-selector) + - [Example Yaml Files for Volume Placement](#example-yaml-files-for-volume-placement) + + +## Summary + +This proposal is to introduce a VolumeGroup API to manage multiple volumes together and a VolumeGroupSnapshot API to take a snapshot of a VolumeGroup. It also attempts to address other use cases such as volume placement. + +## Motivation + +While there is already a KEP (https://github.com/kubernetes/enhancements/pull/1051) that introduces APIs to do application snapshot, backup, and restore, there are other use cases not covered by that KEP. + +Use case 1: +A VolumeGroup allows users to manage multiple volumes belonging to the same application together and therefore it is very useful in general. For example, it can be used to group all volumes in the same StatefulSet together. + +Use case 2: +For some storage systems, volumes are always managed in a group. For these storage systems, they will have to create a group for a single volume if they need to implement a create volume function in Kubernetes. Providing a VolumeGroup API will be very convenient for them. 
+ +Use case 3: +Instead of taking individual snapshots one after another, VolumeGroup can be used as a source for taking a snapshot of all the volumes in the same volume group. This may be a storage level consistent group snapshot if the storage system supports it. In any case, when used together with quiesce hooks, this group snapshot can be application consistent. For this use case, we will introduce another CRD, VolumeGroupSnapshot. + +Use case 4: +VolumeGroup can be used to manage group replication or consistency group replication if the storage system supports it. Note that replication is out of scope for this proposal. It is mentioned here as a potential future use case. + +Use case 5: +VolumeGroup can be used to manage volume placement to either spread the volumes across storage pools or stack the volumes on the same storage pool. Related KEPs proposing the concept of storage pool for volume placement are as follows: + https://github.com/kubernetes/enhancements/pull/1353 + https://github.com/kubernetes/enhancements/pull/1347 +We may not really need a VolumeGroup for this use case. A StoragePool is probably enough. This is to be determined. + +Use case 6: +VolumeGroup can also be used together with application snapshot. It can be a resource managed by the ApplicationSnapshot CRD. + +Use case 7: +Some applications may not want to use ApplicationSnapshot CRD because they don’t use Kubernetes workload APIs such as StatefulSet, Deployment, etc. Instead, they have developed their own operators. In this case it is more convenient to use VolumeGroup to manage persistent volumes used in those applications. + +Use case 8: +Application quiesce is time consuming. Some users may not want to do application quiesce very frequently for that reason. For example, a user may want to run weekly backups with application quiesce and nightly backups without application quiesce but with consistency group support which provides crash consistency across all volumes in the group. 
+ +### Goals + +* Provide an API to manage multiple volumes together in a group. +* Provide an API to support consistency groups for snapshots, ensuring crash consistency across all volumes in the group. +* Provide an API to take a snapshot of a group of volumes without ensuring crash consistency. +* Provide a design to facilitate volume placement using the group API (To be determined). +* The group API should be generic and extensible so that it may be used to support other features in the future. +* A VolumeGroup may potentially be used to support consistency group replication or group replication in the future, but providing a design for replication groups is not in the scope of this KEP. This can be discussed in the future. + +## Proposal for Consistency Groups and Group Snapshots + +This proposal introduces new CRDs VolumeGroup, VolumeGroupClass, and VolumeGroupSnapshot. + +A new VolumeGroup can be created in several ways: +1. Create an empty group first, then create a new PVC with the group name, which will add a volume to the already created group. +2. Create an empty group first, and then add existing PVCs to the group one by one. +3. Create a new volume group from an existing group snapshot. +4. Non-goal: Create a new empty group and at the same time create new empty PVCs and add them to the new group. + +Modify an existing VolumeGroup: +Add a new volume to, or remove an existing volume from, an existing VolumeGroup. Option 2 for creating a VolumeGroup above falls into this case. + +### Create and Modify VolumeGroup + +VolumeGroups can be created and/or modified in several ways, as described in the following sections. + +#### Create new PVC and add to the VolumeGroup + +* Create a new empty VolumeGroup. +* Create a new PVC with the name of the existing VolumeGroup. As a result, the new PVC is created and added to the VolumeGroup. The VolumeGroup is modified so that its Status lists the new PVC in PVCList. 
+* External-provisioner will be modified so that VolumeGroupName will be passed to the CSI driver when creating a volume. + +Only CSI drivers supporting VOLUMEGROUP capability can support the volume group this way. +When a new PVC is created with the existing VolumeGroup name, the VolumeGroup will be modified and the PVC will be added to PVCList in the Status. + +The same PVC can belong to different groups, i.e., different types of groups or different groups of the same type, if the storage system supports it. The storage system decides whether to support this or not. We don't prevent it in the API or controller directly. + +#### Modify VolumeGroup with existing PVCs + +We can add an existing PVC to the group or remove a PVC from the group without deleting it. A VOLUMEGROUP_ADD_REMOVE_EXISTING_VOLUME capability will be added to CSI Spec. Only CSI drivers supporting both VOLUMEGROUP and VOLUMEGROUP_ADD_REMOVE_EXISTING_VOLUME capabilities can support the volume group this way. +* Create a new empty VolumeGroup. +* Add an existing PVC to an existing VolumeGroup (VolumeGroup can be empty to start with or it can have other PVCs already) by adding VolumeGroup name to the PVC Spec. + * The VolumeGroup name is added by the user to each PVC Spec, not by the VolumeGroup controller. The VolumeGroup controller watches PVCs and reacts to the event of a PVC being updated with a VolumeGroup name, as described in the following step. +* VolumeGroup is modified so the existing PVC is added to the PVCList in the Status. + * Note: The VolumeGroup controller will be implemented to have a desired state + of the world and an actual state of the world. The desired state of the world + contains VolumeGroups with the desired PVCList while the actual state of the + world contains VolumeGroups with the actual PVCList. The controller will try + to reconcile the two by handling adding and removing multiple PVCs through a + single CSI ModifyVolumeGroup RPC call each time. 
+* External-provisioner will be modified to update the status of PVC. +* VolumeGroup controller will be triggered to update the VolumeGroup Status. +* If a volume fails to be added to the group, the failure should not affect the volume itself if it is in use by a pod, but error messages will be reported. +* Deleting a PVC from a VolumeGroup will trigger external-provisioner and the VolumeGroup controller as well. + +#### Create VolumeGroup from VolumeGroupSnapshot + +Creating a new volume group from an existing group snapshot is supported if the CSI driver supports VOLUMEGROUP capability. As a result, PVCs will be created from source snapshots and placed in a new volume group. + +### Create VolumeGroupSnapshot + +A VolumeGroupSnapshot can be created with a VolumeGroup as the source if the CSI driver supports the GROUPSNAPSHOT capability. +* Create a VolumeGroupSnapshot with a VolumeGroup as the source. +* This will trigger the VolumeGroupSnapshot controller to call the CreateVolumeGroupSnapshot CSI function and also create multiple VolumeSnapshot API objects with VolumeGroupSnapshot name parameter in each VolumeSnapshot Spec. This will trigger the creation of VolumeSnapshotContent API objects in the snapshot controller and calls to the CreateSnapshot CSI function in the CSI snapshotter sidecar. The CSI snapshotter sidecar will pass the new group_snapshot_name parameter to the CSI Driver when calling CreateSnapshot. +* When CSI driver receives CreateSnapshot request for individual snapshots with a VolumeGroupSnapshot name: + * Case 1: If it knows how to create a group snapshot on the storage system, it returns (nil, nil) and leaves it to the CreateVolumeGroupSnapshot function to handle the snapshot creation. + * Case 2: If it does not know how to create a group snapshot on the storage system, it will create an individual snapshot as usual and return the snapshot_id. 
+* CreateVolumeGroupSnapshot CSI function response + * Case 1: The CreateVolumeGroupSnapshot CSI function should return a list of snapshots (Snapshot message defined in CSI Spec) in its response. The VolumeGroupSnapshot controller can use the returned list of snapshots to update corresponding individual VolumeSnapshotContents, wait for VolumeSnapshots and VolumeSnapshotContents to be bound, and update SnapshotList in the VolumeGroupSnapshot Status. + * Case 2: The CreateVolumeGroupSnapshot CSI function returns group_snapshot_id and volume_group_id, but leaves snapshots field as empty. The VolumeGroupSnapshot controller watches VolumeSnapshot and VolumeSnapshotContent API objects. If a VolumeSnapshot's volumeGroupSnapshotName field matches the VolumeGroupSnapshot name that is being created, it is an individual snapshot that belongs to the VolumeGroupSnapshot. When VolumeSnapshot and VolumeSnapshotContent are bound, it saves the VolumeSnapshot API object to SnapshotList in its Status. +``` +apiVersion: snapshot.storage.k8s.io/v1 +kind: VolumeSnapshot +metadata: + name: snapshot1 +spec: + volumeSnapshotClassName: snapClass1 + source: + persistentVolumeClaimName: pvc1 + volumeGroupSnapshotName: groupSnapshot1 +``` +* An admission controller or finalizer should be added to prevent an individual snapshot that belongs to a GroupSnapshot from being deleted. +* Since some storage systems require individual snapshots while others can only return a single group snapshot but not individual snapshots, we propose the following solution: + * In VolumeGroupSnapshotStatus, if ReadyToUse is true and SnapshotList is empty, the VolumeGroupSnapshot Controller assumes the storage system does not return individual snapshots. + * If ReadyToUse is true and SnapshotList is not empty, the VolumeGroupSnapshot Controller knows there are individual snapshots created for this group. Those individual snapshots may be used as readonly, but they cannot be removed from the GroupSnapshot. 
+ * In the CSI Spec, this means repeated .csi.v1.Snapshot snapshots in VolumeGroupSnapshot message from CreateVolumeGroupSnapshotResponse should be optional, not required. + * How to use the VolumeGroupSnapshot if individual snapshots are not returned? How can we create a volume from a snapshot if there are no individual snapshots? `snapshots` is optional while `group_snapshot_id` is required in VolumeGroupSnapshot message in CSI so it is fine to only specify `group_snapshot_id` not `snapshots` when creating a VolumeGroup from a VolumeGroupSnapshot. However, CSI Driver MUST return a list of `volumes` that are restored in `CreateVolumeGroupResponse`. + +### Delete VolumeGroupSnapshot + +A VolumeGroupSnapshot can be deleted if the CSI driver supports the GROUPSNAPSHOT capability. +* When a VolumeGroupSnapshot is deleted, the VolumeGroupSnapshot controller will call the DeleteVolumeGroupSnapshot CSI function as well as DeleteSnapshot CSI functions. Just like create snapshot, there are 2 cases. + * Case 1: Since CSI driver handles individual snapshot creation in CreateVolumeGroupSnapshot, it should handle individual snapshot deletion in DeleteVolumeGroupSnapshot. + * Case 2: Since CSI driver handles individual snapshot creation in CreateSnapshot, it should handle individual snapshot deletion in DeleteSnapshot. +* DeleteSnapshot on a single snapshot that belongs to a group snapshot is not allowed. + +### Restore + +Restore can be done as follows: +1. A new empty volume group can be created first, and then a new volume can be created from a snapshot one by one and added to the volume group. This can be repeated for all the snapshots in the VolumeGroupSnapshot. +2. A VolumeGroup can be created from a VolumeGroupSnapshot source in one step. This is the same as what is described in the section `Create VolumeGroup from VolumeGroupSnapshot`. 
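The first restore option above (create an empty group, then restore each snapshot into it one at a time) can be sketched as follows. This is only an illustration under stated assumptions: `fakeStorage`, `createEmptyGroup`, `createVolumeFromSnapshot`, and `restoreOneByOne` are hypothetical names for an in-memory stand-in, not part of the proposed API or of any CSI driver. The sketch also shows why option 1 is unavailable when the storage system returns no individual snapshots, in which case the one-step restore from the VolumeGroupSnapshot is the only path.

```go
package main

import "fmt"

// fakeStorage is a hypothetical in-memory stand-in for a storage backend.
// It is not part of the KEP; it only illustrates the order of operations.
type fakeStorage struct {
	groups map[string][]string // volume group ID -> member volume IDs
	nextID int
}

func newFakeStorage() *fakeStorage {
	return &fakeStorage{groups: map[string][]string{}}
}

// createEmptyGroup models "create a new empty volume group".
// Calling it again with the same name leaves the existing group untouched.
func (s *fakeStorage) createEmptyGroup(name string) string {
	if _, ok := s.groups[name]; !ok {
		s.groups[name] = []string{}
	}
	return name
}

// createVolumeFromSnapshot models creating one volume from one snapshot
// and placing it in an already existing group.
func (s *fakeStorage) createVolumeFromSnapshot(snapshotID, groupID string) (string, error) {
	if _, ok := s.groups[groupID]; !ok {
		return "", fmt.Errorf("volume group %q not found", groupID)
	}
	s.nextID++
	volID := fmt.Sprintf("vol-%d-from-%s", s.nextID, snapshotID)
	s.groups[groupID] = append(s.groups[groupID], volID)
	return volID, nil
}

// restoreOneByOne follows restore option 1: an empty group first, then one
// volume per individual snapshot. It refuses to run when the group snapshot
// carries no individual snapshots.
func restoreOneByOne(s *fakeStorage, groupName string, snapshotIDs []string) ([]string, error) {
	if len(snapshotIDs) == 0 {
		return nil, fmt.Errorf("no individual snapshots available; restore from the group snapshot instead")
	}
	groupID := s.createEmptyGroup(groupName)
	restored := make([]string, 0, len(snapshotIDs))
	for _, snap := range snapshotIDs {
		vol, err := s.createVolumeFromSnapshot(snap, groupID)
		if err != nil {
			return restored, err
		}
		restored = append(restored, vol)
	}
	return restored, nil
}

func main() {
	s := newFakeStorage()
	vols, err := restoreOneByOne(s, "restored-group", []string{"snap-a", "snap-b"})
	if err != nil {
		fmt.Println("restore failed:", err)
		return
	}
	fmt.Println("restored volumes:", vols)
}
```

In a real controller each step would be a CSI call (CreateVolumeGroup, then CreateVolume with a snapshot source and a volume group id), and a partial failure would leave a group with fewer volumes than snapshots, which the caller has to handle.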
+ +### API Definitions + +API definitions are as follows: + +``` +type VolumeGroupClass struct { + metav1.TypeMeta + // +optional + metav1.ObjectMeta + + // Driver is the driver expected to handle this VolumeGroupClass. + // This value may not be empty. + Driver string + + // Parameters holds parameters for driver. + // These values are opaque to the system and are passed directly + // to the driver. + // +optional + Parameters map[string]string + + // This field specifies whether group snapshot is supported. + // The default is false. + // +optional + VolumeGroupSnapshot *bool + + // Specifies whether consistent group snapshot is supported. + // The default is false. + // +optional + ConsistentGroupSnapshot *bool +} + +// VolumeGroup is a user's request for a group of volumes +type VolumeGroup struct { + metav1.TypeMeta + // +optional + metav1.ObjectMeta + + // Spec defines the volume group requested by a user + Spec VolumeGroupSpec + + // Status represents the current information about a volume group + // +optional + Status *VolumeGroupStatus +} + +// VolumeGroupSpec describes the common attributes of group storage devices +// and allows a Source for provider-specific attributes +type VolumeGroupSpec struct { + // +optional + VolumeGroupClassName *string + + // If InitSource is nil, an empty volume group will be created. + // Otherwise, a volume group will be created with PVCs. + // If SourceVolumeGroupSnapshotName is not nil, the volume group + // will be created from the source VolumeGroupSnapshot. + // This field determines what PVCs will be in the volume group + // when it is initially created. PVCs can be added to or removed + // from the volume group later if CSI driver supports + // VOLUMEGROUP_ADD_REMOVE_EXISTING_VOLUME. + // +optional + InitSource *VolumeGroupSource +} + +// VolumeGroupSource contains 1 option SourceVolumeGroupSnapshotName. +// Make SourceVolumeGroupSnapshotName a pointer to allow new optional +// source to be added in the future. 
+type VolumeGroupSource struct { + // If specified, the VolumeGroup will be created from the source + // VolumeGroupSnapshot. + // +optional + SourceVolumeGroupSnapshotName *string +} + +type VolumeGroupStatus struct { + // VolumeGroupId is a unique id returned by the CSI driver + // to identify the VolumeGroup on the storage system. + // If a storage system does not provide such an id, the + // CSI driver can choose to return the VolumeGroup name. + // +optional + VolumeGroupId *string + + // +optional + GroupCreationTime *metav1.Time + + // A list of persistent volume claims + // +optional + PVCList []PersistentVolumeClaim + + // +optional + Ready *bool + + // Last error encountered during group creation + // +optional + Error *VolumeGroupError +} + +// Describes an error encountered on the group +type VolumeGroupError struct { + // time is the timestamp when the error was encountered. + // +optional + Time *metav1.Time + + // message details the encountered error + // +optional + Message *string +} + +// VolumeGroupSnapshot is a user's request for taking a group snapshot. +type VolumeGroupSnapshot struct { + metav1.TypeMeta `json:",inline"` + // Standard object's metadata. + // +optional + metav1.ObjectMeta `json:"metadata,omitempty" protobuf:"bytes,1,opt,name=metadata"` + + // Spec defines the desired characteristics of a group snapshot requested by a user. + Spec VolumeGroupSnapshotSpec `json:"spec" protobuf:"bytes,2,opt,name=spec"` + + // Status represents the latest observed state of the group snapshot + // +optional + Status *VolumeGroupSnapshotStatus `json:"status,omitempty" protobuf:"bytes,3,opt,name=status"` +} + +// VolumeGroupSnapshotSpec describes the common attributes of a group snapshot +type VolumeGroupSnapshotSpec struct { + // Source has the information about where the group snapshot is created from. + // Supported Kind is VolumeGroup + // Required. 
+ Source TypedLocalObjectReference `json:"source" protobuf:"bytes,1,opt,name=source"` +} + +type VolumeGroupSnapshotStatus struct { + // VolumeGroupSnapshotID is a unique id returned by the CSI driver + // to identify the VolumeGroupSnapshot on the storage system. + // If a storage system does not provide such an id, the + // CSI driver can choose to return the VolumeGroupSnapshot name. + // +optional + VolumeGroupSnapshotID *string + + // ReadyToUse becomes true when ReadyToUse on all individual snapshots becomes true + // +optional + ReadyToUse *bool + + // List of volume snapshots + // +optional + SnapshotList []VolumeSnapshot +} + +type PersistentVolumeClaimSpec struct { + ...... + // +optional + VolumeGroupNames []string + ...... +} + + +type VolumeSnapshotSpec struct { + ...... + // +optional + VolumeGroupSnapshotName *string + ...... +} +``` + +### Example Yaml Files + +#### Volume Group Snapshot + +Example yaml files that define a VolumeGroupClass, a VolumeGroup, a VolumeGroupSnapshot, and a PVC are shown below. + +A VolumeGroupClass that supports groupSnapshot: +``` +apiVersion: volumegroup.storage.k8s.io/v1alpha1 +kind: VolumeGroupClass +metadata: + name: volumeGroupClass1 +spec: + parameters: + …... 
+ groupSnapshot: true +``` + +A VolumeGroup that belongs to this VolumeGroupClass: +``` +apiVersion: volumegroup.storage.k8s.io/v1alpha1 +kind: VolumeGroup +metadata: + name: volumeGroup1 +spec: + volumeGroupClassName: volumeGroupClass1 +``` + +A VolumeGroupSnapshot taken from the VolumeGroup: +``` +apiVersion: volumegroup.storage.k8s.io/v1alpha1 +kind: VolumeGroupSnapshot +metadata: + name: my-group-snapshot +spec: + source: + name: volumeGroup1 + kind: VolumeGroup + apiGroup: volumegroup.storage.k8s.io +``` + +A PVC that belongs to the volume group which supports groupSnapshot: +``` +apiVersion: v1 +kind: PersistentVolumeClaim +metadata: + name: pvc1 + annotations: +spec: + accessModes: + - ReadWriteOnce + dataSource: null + resources: + requests: + storage: 1Gi + storageClassName: storageClass1 + volumeMode: Filesystem + volumeGroupNames: [volumeGroup1] +``` + +A new external VolumeGroup controller will handle VolumeGroupClass and VolumeGroup resources. +External provisioner will be modified to read information from volume groups (through volumeGroupNames) and pass it down to the CSI driver. + +### CSI Changes + +#### CSI Capabilities + +New controller capabilities VOLUMEGROUP, VOLUMEGROUP_ADD_REMOVE_EXISTING_VOLUME, GROUPSNAPSHOT, CONSISTENT_GROUPSNAPSHOT, MODIFY_VOLUME, and INDIVIDUAL_SNAPSHOT_RESTORE will be added. + +* VOLUMEGROUP + Indicates that the controller plugin supports creating and deleting a volume group. + +* VOLUMEGROUP_ADD_REMOVE_EXISTING_VOLUME + Indicates that the controller plugin supports adding an existing volume to a + volume group and removing a volume from a volume group without deleting it. + +* GROUPSNAPSHOT + Indicates that the controller plugin supports creating a snapshot of all volumes + in a volume group. + +* CONSISTENT_GROUPSNAPSHOT + Indicates that the controller plugin supports creating a consistent snapshot of + all volumes in a volume group. + +* MODIFY_VOLUME + Indicates that the controller plugin supports modifying a volume. 
+ +* INDIVIDUAL_SNAPSHOT_RESTORE + Indicates whether the controller plugin supports creating a volume from an + individual volume snapshot if the volume snapshot is part of a + VolumeGroupSnapshot. Use cases: selective restore, advanced recovery, etc. + +#### CSI Controller RPC + +``` +service Controller { + … + rpc CreateVolumeGroup(CreateVolumeGroupRequest) + returns (CreateVolumeGroupResponse) { + option (alpha_method) = true; + } + + rpc CreateVolumeGroupSnapshot(CreateVolumeGroupSnapshotRequest) + returns (CreateVolumeGroupSnapshotResponse) { + option (alpha_method) = true; + } + + rpc ModifyVolumeGroup(ModifyVolumeGroupRequest) + returns (ModifyVolumeGroupResponse) { + option (alpha_method) = true; + } + + rpc DeleteVolumeGroup(DeleteVolumeGroupRequest) + returns (DeleteVolumeGroupResponse) { + option (alpha_method) = true; + } + + rpc DeleteVolumeGroupSnapshot(DeleteVolumeGroupSnapshotRequest) + returns (DeleteVolumeGroupSnapshotResponse) { + option (alpha_method) = true; + } + + rpc ListVolumeGroups(ListVolumeGroupsRequest) + returns (ListVolumeGroupsResponse) { + option (alpha_method) = true; + } + + rpc ListVolumeGroupSnapshots(ListVolumeGroupSnapshotsRequest) + returns (ListVolumeGroupSnapshotsResponse) { + option (alpha_method) = true; + } + + rpc GetVolumeGroup(GetVolumeGroupRequest) + returns (GetVolumeGroupResponse) { + option (alpha_method) = true; + } + + rpc GetVolumeGroupSnapshot(GetVolumeGroupSnapshotRequest) + returns (GetVolumeGroupSnapshotResponse) { + option (alpha_method) = true; + } + … +} +``` + +#### CreateVolumeGroup + +This RPC will be called by the CO to create a new volume group on behalf of a user. +This operation MUST be idempotent. If a volume group corresponding to the specified name already exists and is compatible with the specified parameters in the CreateVolumeGroupRequest, the Plugin MUST reply 0 OK with the corresponding CreateVolumeGroupResponse. 
+CSI Plugins MAY create the following types of volume groups: + +* Create a new empty volume group. +* At restore time, create a single volume from an individual snapshot and then add it to an existing group. + * Create an empty group + * Create a volume from snapshot in the group +* Create a new volume group from a source group snapshot. +* Create a new volume group and add a list of existing volumes to the group. + +The following is a non-goal: +* Create a new group and at the same time create a list of new volumes in the group. + +In `VolumeGroupSnapshot` message, `snapshots` is an optional field while `group_snapshot_id` is a required field. It is fine to only specify `group_snapshot_id` but not `snapshots` in `VolumeGroupSnapshot` message at restore time. +However, the Plugin MUST return a list of volumes that are restored in `CreateVolumeGroupResponse`. + +``` +message CreateVolumeGroupRequest { + option (alpha_message) = true; + + // suggested name for volume group (required for idempotency) + // This field is REQUIRED. + string name = 1; + + // params passed from VolumeGroupClass + // This field is OPTIONAL. + map<string, string> parameters = 2; + + // Secrets required by plugin to complete volume group creation request. + // This field is OPTIONAL. Refer to the `Secrets Requirements` + // section on how to use this field. + map<string, string> secrets = 3 [(csi_secret) = true]; + + // If specified, a volume group will be created from the source group snapshot. + // This field is OPTIONAL. + VolumeGroupSnapshot source_volume_group_snapshot = 4; + + // If specified, a volume group will be created from a list of existing volumes. + // This field is OPTIONAL. + repeated string volume_id = 5; +} + +message CreateVolumeGroupResponse { + option (alpha_message) = true; + + // Contains all attributes of the newly created volume group. + // This field is REQUIRED. 
+ VolumeGroup volume_group = 1; +} + +message VolumeGroup { + option (alpha_message) = true; + + // The identifier for this volume group, generated by the plugin. + // This field is REQUIRED. + string volume_group_id = 1; + + // Opaque static properties of the volume group. + // This field is OPTIONAL. + map<string, string> volume_group_context = 2; + + // Underlying volumes in this group. The same definition in CSI Volume. + // This field is OPTIONAL to support the creation of an empty group. + // However, this field is REQUIRED in the following cases: + // - Create a new volume group from a source group snapshot. + // - Create a new volume group and add a list of existing volumes to the group. + repeated .csi.v1.Volume volumes = 3; +} +``` + +#### CreateVolume + +1. When a new volume is created with a volume group id parameter, the volume will be created and added to the existing volume group. +2. A new volume can also be created without a volume group id parameter. It can be added to a volume group later through the ModifyVolumeGroup RPC. + +Note that for filesystem-based storage systems, only option 1 can be supported. For block-based storage systems, both option 1 and option 2 may be supported. However, there is a possibility that option 2 will not work for ConsistencyGroups, as the volume is created without considering which group it will be placed in. + +``` +message CreateVolumeRequest { + string name = 1; + … + repeated string volume_group_id = 8 [(alpha_field) = true]; +} +``` + +#### DeleteVolumeGroup + +``` +message DeleteVolumeGroupRequest { + option (alpha_message) = true; + + // The ID of the volume group to be deprovisioned. + // This field is REQUIRED. + string volume_group_id = 1; + + // Secrets required by plugin to complete volume group deletion request. + // This field is OPTIONAL. Refer to the `Secrets Requirements` + // section on how to use this field. 
+ map<string, string> secrets = 2 [(csi_secret) = true]; +} + +message DeleteVolumeGroupResponse { + option (alpha_message) = true; + // Intentionally empty. +} +``` + +#### ModifyVolumeGroup + +This RPC will be called by the CO to modify an existing volume group on behalf of a user. The volume_ids provided in the ModifyVolumeGroupRequest will be compared to the ones in the existing VolumeGroup. New volume_ids in the modified VolumeGroup will be added to the VolumeGroup. Existing volume_ids not in the modified VolumeGroup will be removed from the VolumeGroup. If volume_ids is empty, all existing volumes will be removed from the VolumeGroup. This operation MUST be idempotent. + +To support ModifyVolumeGroup, the Kubernetes VolumeGroup controller will be implemented to have a desired state of the world and an actual state of the world. The desired state of the world contains VolumeGroups with the desired PVCList while the actual state of the world contains VolumeGroups with the actual PVCList. The controller will try to reconcile the two by handling adding and removing multiple PVCs through a single CSI RPC call each time. + +Note that filesystem-based storage systems may not be able to support this RPC. For block-based storage systems, this is a very convenient method. However, it may not satisfy the requirement for consistency, as the volume is created without knowledge of which group it will be placed in. + +``` +message ModifyVolumeGroupRequest { + option (alpha_message) = true; + + // The ID of the volume group to be modified. + // This field is REQUIRED. + string volume_group_id = 1; + + // Specify volume_ids that will be in the modified volume group. + // This list will be compared with the volume_ids in the existing group. + // New ones will be added and missing ones will be removed. + // If no volume_ids are provided, all existing volumes will + // be removed from the group. + // This field is OPTIONAL. 
+ repeated string volume_ids = 3; +} + +message ModifyVolumeGroupResponse { + option (alpha_message) = true; + + // Contains all attributes of the modified volume group. + // This field is REQUIRED. + VolumeGroup volume_group = 1; +} +``` + +#### ControllerGetVolumeGroup + +``` +message ControllerGetVolumeGroupRequest { + option (alpha_message) = true; + + // The ID of the volume group to fetch current volume group information for. + // This field is REQUIRED. + string volume_group_id = 1; +} + +message ControllerGetVolumeGroupResponse { + option (alpha_message) = true; + + // This field is REQUIRED + VolumeGroup volume_group = 1; +} +``` + +#### ListVolumeGroups + +``` +message ListVolumeGroupsRequest { + option (alpha_message) = true; + + // If specified (non-zero value), the Plugin MUST NOT return more + // entries than this number in the response. If the actual number of + // entries is more than this number, the Plugin MUST set `next_token` + // in the response which can be used to get the next page of entries + // in the subsequent `ListVolumeGroups` call. This field is OPTIONAL. If + // not specified (zero value), it means there is no restriction on the + // number of entries that can be returned. + // The value of this field MUST NOT be negative. + int32 max_entries = 1; + + // A token to specify where to start paginating. Set this field to + // `next_token` returned by a previous `ListVolumeGroups` call to get the + // next page of entries. This field is OPTIONAL. + // An empty string is equal to an unspecified field value. + string starting_token = 2; +} + +message ListVolumeGroupsResponse { + option (alpha_message) = true; + + message Entry { + // This field is REQUIRED + VolumeGroup volume_group = 1; + } + + repeated Entry entries = 1; + + // This token allows you to get the next page of entries for + // `ListVolumeGroups` request. 
If the number of entries is larger than + // `max_entries`, use the `next_token` as a value for the + // `starting_token` field in the next `ListVolumeGroups` request. This + // field is OPTIONAL. + // An empty string is equal to an unspecified field value. + string next_token = 2; +} +``` + +#### CreateVolumeGroupSnapshot + +The purpose of this call is to request the creation of a multi-volume snapshot. Group snapshots can be created from an existing volume group. Note that calls to this function must be idempotent: the function may be called multiple times for the same name, but the group snapshot must only be created once. + +``` +message CreateVolumeGroupSnapshotRequest { + option (alpha_message) = true; + + // suggested name for a group snapshot (required for idempotency) + // This field is REQUIRED. + string name = 1; + + // identifier indicates which volume group is used to take + // group snapshot + // This field is REQUIRED. + string source_volume_group_id = 2; + + // volume ids of the volumes in the source group. This field is REQUIRED. + // This is needed because some storage systems do not have a group persisted + // on the storage system until the time to take a group snapshot + repeated string volume_ids = 3; + + // secrets required for snapshot creation (pulled from VolumeSnapshotClass) + // This field is OPTIONAL. + map<string, string> secrets = 4 [(.csi.v1.csi_secret) = true]; + + // params passed from VolumeSnapshotClass + // This field is OPTIONAL. + map<string, string> parameters = 5; +} + +message CreateVolumeGroupSnapshotResponse { + option (alpha_message) = true; + + // Contains all attributes of the newly created group snapshot. + // This field is REQUIRED. + VolumeGroupSnapshot group_snapshot = 1; +} + +message VolumeGroupSnapshot { + option (alpha_message) = true; + + // The identifier for this group snapshot, generated by the plugin. + // This field is REQUIRED. + string group_snapshot_id = 1; + + // A list of snapshots created. 
Snapshot is the same + // definition as Snapshot definition used in CSI. + // This field is OPTIONAL. + repeated .csi.v1.Snapshot snapshots = 2; + + // Identity information for the source volume group. Currently, only + // the case where the source is a volume group is supported. This field is REQUIRED. + string source_volume_group_id = 3; + + // Indicates if a list of group snapshots are ready. + // This field is REQUIRED. + bool ready_to_use = 4; + + // Timestamp when the point-in-time consistency group snapshot is taken. + // This field is REQUIRED. + .google.protobuf.Timestamp creation_time = 5; + + // Complete total size of the snapshots in group in bytes. The purpose of + // this field is to give CO guidance on how much space is needed to restore + // volumes from all snapshots in group. This field is OPTIONAL. + int64 size_bytes = 6; +} +``` + +#### CreateSnapshot + +``` +message CreateSnapshotRequest { + // The ID of the source volume to be snapshotted. + // This field is REQUIRED. + string source_volume_id = 1; + … + string group_snapshot_name = 2 [(alpha_field) = true]; +} + +message CreateSnapshotResponse { + Snapshot snapshot = 1; + … + string group_snapshot_id = 2 [(alpha_field) = true]; +} +``` + +#### DeleteVolumeGroupSnapshot + +``` +message DeleteVolumeGroupSnapshotRequest { + option (alpha_message) = true; + + // The ID of the group snapshot to be deprovisioned. + // This field is REQUIRED. + string group_snapshot_id = 1; + + // Secrets required by plugin to complete group snapshot deletion request. + // This field is OPTIONAL. Refer to the `Secrets Requirements` + // section on how to use this field. + map<string, string> secrets = 2 [(csi_secret) = true]; +} + +message DeleteVolumeGroupSnapshotResponse { + // Intentionally empty. +} +``` + +#### ControllerGetVolumeGroupSnapshot + +``` +message ControllerGetVolumeGroupSnapshotRequest { + option (alpha_message) = true; + + // The ID of the group snapshot to fetch current group snapshot information for. 
+ // This field is REQUIRED. + string group_snapshot_id = 1; +} + +message ControllerGetVolumeGroupSnapshotResponse { + option (alpha_message) = true; + + // This field is REQUIRED + VolumeGroupSnapshot group_snapshot = 1; +} +``` + +#### ListVolumeGroupSnapshots + +``` +message ListVolumeGroupSnapshotsRequest { + option (alpha_message) = true; + + // If specified (non-zero value), the Plugin MUST NOT return more + // entries than this number in the response. If the actual number of + // entries is more than this number, the Plugin MUST set `next_token` + // in the response which can be used to get the next page of entries + // in the subsequent `ListVolumeGroupSnapshots` call. This field is OPTIONAL. If + // not specified (zero value), it means there is no restriction on the + // number of entries that can be returned. + // The value of this field MUST NOT be negative. + int32 max_entries = 1; + + // A token to specify where to start paginating. Set this field to + // `next_token` returned by a previous `ListVolumeGroupSnapshots` call to get the + // next page of entries. This field is OPTIONAL. + // An empty string is equal to an unspecified field value. + string starting_token = 2; +} + +message ListVolumeGroupSnapshotsResponse { + option (alpha_message) = true; + + message Entry { + // This field is REQUIRED + VolumeGroupSnapshot group_snapshot = 1; + } + + repeated Entry entries = 1; + + // This token allows you to get the next page of entries for + // `ListVolumeGroupSnapshots` request. If the number of entries is larger than + // `max_entries`, use the `next_token` as a value for the + // `starting_token` field in the next `ListVolumeGroupSnapshots` request. This + // field is OPTIONAL. + // An empty string is equal to an unspecified field value. 
+ string next_token = 2; +} +``` + +### Alternatives + +#### Immutable VolumeGroup + +During the design discussions, an immutable VolumeGroup was considered but was removed because it would add a lot of complexity to the design without much gain. + +Immutable VolumeGroup: PVCList or PVC Selector in the ImmutableSource field in the Spec (optional field); PVCList is in the Status. +* Create a new VolumeGroup with existing PVCs by PVCList or PVC Selector in the Spec. The PVCList will be in the VolumeGroup Status as well. +* VolumeGroup Status has a boolean Mutable set to false. + +``` +ImmutableSource struct { + PVCList + Selector +} +``` + +``` +// VolumeGroupSpec describes the common attributes of group storage devices +// and allows a Source for provider-specific attributes +type VolumeGroupSpec struct { + // +optional + VolumeGroupClassName *string + + // If ImmutableSource is nil, an empty volume group will be created. + // Otherwise, a volume group will be created with PVCs (if PVCList or Selector is set) + // If ImmutableSource is not nil, it indicates the VolumeGroup is immutable + // +optional + ImmutableSource *VolumeGroupSource +} + +// VolumeGroupSource contains 2 options. If VolumeGroupSource is not nil, +// one of the 2 options must be defined. +type VolumeGroupSource struct { + // A list of existing persistent volume claims + // +optional + PVCList []PersistentVolumeClaim + + // A label query over existing persistent volume claims to be added to the volume group. + // +optional + Selector *metav1.LabelSelector +} + +type VolumeGroupStatus struct { + // VolumeGroupId is a unique id returned by the CSI driver + // to identify the VolumeGroup on the storage system. + // If a storage system does not provide such an id, the + // CSI driver can choose to return the VolumeGroup name. 
+ VolumeGroupId *string + + GroupCreationTime *metav1.Time + + // A list of persistent volume claims + // +optional + PVCList []PersistentVolumeClaim + + Ready *bool + + // Mutable indicates if a VolumeGroup can be modified + // after it is created. If false, it indicates it cannot be + // modified once created. If ImmutableSource is not nil + // in VolumeGroupSpec, Mutable must be false; otherwise + // it means the driver does not support ImmutableSource. + // VOLUMEGROUP_IMMUTABLE and VOLUMEGROUP_MUTABLE capabilities + // will be added to the CSI spec. + Mutable *bool + + // If true, it indicates the CSI driver supports adding + // an existing volume to the VolumeGroup and removing a + // volume from the VolumeGroup without deleting it. + // Only mutable VolumeGroup can support AddRemoveExistingPVC. + // A corresponding VOLUMEGROUP_ADD_REMOVE_EXISTING_VOLUME + // capability will be added to the CSI spec. + AddRemoveExistingPVC *bool + + // Last error encountered during group creation + Error *VolumeGroupError +} +``` + +VOLUMEGROUP_IMMUTABLE and VOLUMEGROUP_MUTABLE capabilities will be added to the CSI spec. +If VOLUMEGROUP_IMMUTABLE is supported, a VolumeGroup with an ImmutableSource can be created. Mutable will be false, PVCList will be set, and Ready will be true in the Status. +Otherwise, a VolumeGroup with an ImmutableSource will not be created successfully. + +## Proposal for Volume Placement + +### API Changes + +In order to support Volume Placement, an `AllowedTopologies` field will be added to the VolumeGroupClass API: + +``` +type VolumeGroupClass struct { + metav1.TypeMeta + // +optional + metav1.ObjectMeta + + // Driver is the driver expected to handle this VolumeGroupClass. + // This value may not be empty. + Driver string + + // Parameters holds parameters for driver. + // These values are opaque to the system and are passed directly + // to the driver. 
+ // +optional + Parameters map[string]string + + // This field specifies whether group snapshot is supported. + // The default is false. + // +optional + VolumeGroupSnapshot *bool + + // Restrict the topologies where a group of volumes can be located. + // Each driver defines its own supported topology specifications. + // An empty TopologySelectorTerm list means there is no topology restriction. + // This field is passed on to the drivers to handle placement of a group of + // volumes on storage pools. + // +optional + AllowedTopologies []api.TopologySelectorTerm +} +``` + +#### ModifyVolume + +ModifyVolume CSI RPC was considered earlier to add/remove one volume to/from a group at a time but it was removed because ModifyVolumeGroup CSI RPC was added. + +``` + rpc ModifyVolume(ModifyVolumeRequest) + returns (ModifyVolumeResponse) { + option (alpha_method) = true; + } +``` + +This RPC is called when an existing volume is added to an existing volume group or when a volume is removed from the volume group. +A volume group id parameter will be in the ModifyVolumeRequest for an add request. +A volume group id parameter will not be in the ModifyVolumeRequest for a delete request. +If user requests to add an existing volume to a consistency group, but the CSI driver cannot fulfill the request because the existing volume is placed on a different storage pool from the consistency group, then the CSI driver MUST return failure. +This RPC MUST be idempotent. + +``` +message ModifyVolumeRequest { + string volume_id = 1; + + // This field is OPTIONAL. + repeated string volume_group_id = 2 [(alpha_field) = true]; + + // Secrets required by plugin to complete modify volume request. + // This field is OPTIONAL. Refer to the `Secrets Requirements` + // section on how to use this field. 
+ map<string, string> secrets = 3 [(csi_secret) = true]; +} +``` +External-provisioner will be modified so that modifying a PVC by adding a VolumeGroupName triggers a ModifyVolume call (a new CSI controller RPC) to the CSI driver. + +#### Create VolumeGroup with Selector + +Creating a VolumeGroup with a Selector was discussed but moved to the alternatives section. The suggestion is to add a new CRD and controller to select labeled PVCs. Whether this controller can only add new PVCs or can also modify existing PVCs will be decided later. + +Creating a new volume group and adding existing PVCs matching the label selector to the group is supported if the CSI driver supports the VOLUMEGROUP capability. + +CSI drivers that do not have a volume_group_id concept can use the VolumeGroup name stored in the Kubernetes API server as the volume_group_id. + +// VolumeGroupSpec describes the common attributes of group storage devices +// and allows a Source for provider-specific attributes +type VolumeGroupSpec struct { + // +optional + VolumeGroupClassName *string + + // If InitSource is nil, an empty volume group will be created. + // Otherwise, a volume group will be created with PVCs. + // If Selector is set in InitSource, existing PVCs with matching + // label will be added to the volume group. + // If SourceVolumeGroupSnapshotName is not nil, the volume group + // will be created from the source VolumeGroupSnapshot. + // This field determines what PVCs will be in the volume group + // when it is initially created. PVCs can be added to or removed + // from the volume group later if CSI driver supports + // VOLUMEGROUP_ADD_REMOVE_EXISTING_VOLUME. + // +optional + InitSource *VolumeGroupSource +} + +// VolumeGroupSource contains 2 options. If VolumeGroupSource is not nil, +// one and only one of the 2 options must be defined. +type VolumeGroupSource struct { + // A label query over existing persistent volume claims to be added to the volume group. 
+ // +optional + Selector *metav1.LabelSelector + + // If specified, the VolumeGroup will be created from the source + // VolumeGroupSnapshot. + // +optional + SourceVolumeGroupSnapshotName *string +} + + +### Example Yaml Files for Volume Placement + +A VolumeGroupClass that supports placement: +``` +apiVersion: volumegroup.storage.k8s.io/v1alpha1 +kind: VolumeGroupClass +metadata: + name: placementGroupClass1 +spec: + parameters: + …... + allowedTopologies: [failure-domain.example.com/placement: storagePool1] +``` + +A VolumeGroup that uses the above VolumeGroupClass: +``` +apiVersion: volumegroup.storage.k8s.io/v1alpha1 +kind: VolumeGroup +metadata: + name: placementGroup1 +spec: + volumeGroupClassName: placementGroupClass1 +``` + +A PVC that belongs to both the volume group with groupSnapshot support and the placement group: +``` +apiVersion: v1 +kind: PersistentVolumeClaim +metadata: + name: pvc1 + annotations: +spec: + accessModes: + - ReadWriteOnce + dataSource: null + resources: + requests: + storage: 1Gi + storageClassName: storageClass1 + volumeMode: Filesystem + volumeGroupNames: [volumeGroup1, placementGroup1] +``` + +If both a placement group and a volume group with groupSnapshot support are defined, it is possible for the same volume to join both groups. For example, a volume group with groupSnapshot support may include volume members from two placement groups because they belong to the same application. 
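The multi-group membership described in the placement example can be sketched in Go. This is only an illustration under stated assumptions: the `PVC` type and the `GroupMembership` helper are hypothetical stand-ins (not part of the proposed API), and the sketch simply inverts each PVC's `volumeGroupNames` list into a per-group member list, similar in spirit to the PVCList kept in a VolumeGroup Status.

```go
package main

import (
	"fmt"
	"sort"
)

// PVC is a hypothetical, simplified stand-in for a PersistentVolumeClaim
// that carries the proposed volumeGroupNames field.
type PVC struct {
	Name             string
	VolumeGroupNames []string
}

// GroupMembership inverts the PVC -> groups relation into a map from
// group name to the sorted names of the PVCs that reference that group.
func GroupMembership(pvcs []PVC) map[string][]string {
	members := make(map[string][]string)
	for _, pvc := range pvcs {
		for _, group := range pvc.VolumeGroupNames {
			members[group] = append(members[group], pvc.Name)
		}
	}
	// Sort each member list so the result does not depend on input order.
	for group := range members {
		sort.Strings(members[group])
	}
	return members
}

func main() {
	// Two PVCs of one application, placed in different placement groups
	// but sharing one volume group with groupSnapshot support.
	pvcs := []PVC{
		{Name: "pvc1", VolumeGroupNames: []string{"volumeGroup1", "placementGroup1"}},
		{Name: "pvc2", VolumeGroupNames: []string{"volumeGroup1", "placementGroup2"}},
	}
	members := GroupMembership(pvcs)
	fmt.Println(members["volumeGroup1"])    // [pvc1 pvc2]
	fmt.Println(members["placementGroup1"]) // [pvc1]
}
```

Here `volumeGroup1` spans members of two placement groups, which matches the case described above; whether a storage system actually permits such overlapping membership is left to the driver.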
From 2c8adc2c0ffb0f0e22d1c8a8dc55850db621edad Mon Sep 17 00:00:00 2001 From: xing-yang Date: Sat, 4 Jun 2022 21:38:25 -0400 Subject: [PATCH 02/19] Add support for static provisioning --- keps/sig-storage/20200212-volume-group.md | 588 +++++++++++++++++----- 1 file changed, 451 insertions(+), 137 deletions(-) diff --git a/keps/sig-storage/20200212-volume-group.md b/keps/sig-storage/20200212-volume-group.md index 2691cb39e77..45e62ef7568 100644 --- a/keps/sig-storage/20200212-volume-group.md +++ b/keps/sig-storage/20200212-volume-group.md @@ -36,17 +36,33 @@ Volume Group - [Summary](#summary) - [Motivation](#motivation) - [Goals](#goals) -- [Proposal for Consistency Groups and Group Snapshots](#proposal-for-consistency-groups-and-group-snapshots) + - [Non Goals](#non-goals) +- [Proposal for VolumeGroup and VolumeGroupSnapshot](#proposal-for-volumegroup-and-volumegroupsnapshot) + - [Create VolumeGroup](#create-volumegroup) + - [Modify VolumeGroup](#modify-volumegroup) - [Create and Modify VolumeGroup](#create-and-modify-volumegroup) - [Create new PVC and add to the VolumeGroup](#create-new-pvc-and-add-to-the-volumegroup) - [Modify VolumeGroup with existing PVCs](#modify-volumegroup-with-existing-pvcs) - - [Create VolumeGroup from VolumeGroupSnapshot](#create-volumegroup-from-volumegroupsnapshot) + - [Phase 2: Create VolumeGroup from VolumeGroupSnapshot](#phase-2-create-volumegroup-from-volumegroupsnapshot) + - [Pre-provisioned VolumeGroup](#pre-provisioned-volumegroup) - [Create VolumeGroupSnapshot](#create-volumegroupsnapshot) + - [Dynamic provisioning](#dynamic-provisioning) + - [Pre-provisioned VolumeGroupSnapshot](#pre-provisioned-volumegroupsnapshot) - [Delete VolumeGroupSnapshot](#delete-volumegroupsnapshot) - [Restore](#restore) - [API Definitions](#api-definitions) + - [VolumeGroupClass](#volumegroupclass) + - [VolumeGroup](#volumegroup) + - [VolumeGroupContent](#volumegroupcontent) + - [VolumeGroupSnapshotClass](#volumegroupsnapshotclass) + - 
[VolumeGroupSnapshot](#volumegroupsnapshot) + - [VolumeGroupSnapshotContent](#volumegroupsnapshotcontent) + - [PersistentVolumeClaim and PersistentVolume](#persistentvolumeclaim-and-persistentvolume) + - [VolumeSnapshot and VolumeSnapshotContent](#volumesnapshot-and-volumesnapshotcontent) - [Example Yaml Files](#example-yaml-files) - - [Volume Group Snapshot](#volume-group-snapshot) + - [Create Volume Group](#create-volume-group) + - [Add PVC to VolumeGroup](#add-pvc-to-volumegroup) + - [Create VolumeGroupSnapshot](#create-volumegroupsnapshot-1) - [CSI Changes](#csi-changes) - [CSI Capabilities](#csi-capabilities) - [CSI Controller RPC](#csi-controller-rpc) @@ -108,24 +124,35 @@ Application quiesce is time consuming. Some users may not want to do application ### Goals * Provide an API to manage multiple volumes together in a group. -* Provide an API to support consistency groups for snapshots, ensuring crash consistency across all volumes in the group. -* Provide an API to take a snapshot of a group of volumes, not ensuring crash consistency. -* Provide a design to facilitate volume placement using the group API (To be determined). +* Provide an API to take a snapshot of a group of volumes. * The group API should be generic and extensible so that it may be used to support other features in the future. -* A VolumeGroup may potentially be used to support consistency group replication or group replication in the future, but providing design on replication group is not in the scope of this KEP. This can be discussed in the future. -## Proposal for Consistency Groups and Group Snapshots +### Non Goals + +* A VolumeGroup may potentially be used to support group replication in the future, but providing design on replication group is not in the scope of this KEP. This can be discussed in the future. +* Provide a design to facilitate volume placement using the group API (To be determined). 
+ +## Proposal for VolumeGroup and VolumeGroupSnapshot -This proposal introduces new CRDs VolumeGroup, VolumeGroupClass, and VolumeGroupSnapshot. +This proposal introduces new CRDs VolumeGroup, VolumeGroupContent, VolumeGroupClass, VolumeGroupSnapshot, VolumeGroupSnapshotContent, and VolumeGroupSnapshotClass. + +### Create VolumeGroup Create new VolumeGroup can be done in several ways: -1. Create an empty group first, then create a new PVC with the group name which will add a volume to the already created group. -2. Create an empty group first, and then add an existing PVC to the group one by one. -3. Create a new volume group from an existing group snapshot. -4. Non-goal: Create a new empty group and in the same time create new empty PVCs and add to the new group. + +Phase 1: +1. Create an empty group first, then create a new PVC with the group name. This will create a new volume and add that volume to the already created group. When deleting this volume group, all volumes in the group will be deleted together with the group. A CSI driver supporting the VOLUME_GROUP controller capability MUST implement this feature. +2. Create an empty group first, then add existing PVCs to the group one by one. A CSI driver supporting VOLUME_GROUP_ADD_REMOVE_EXISTING_VOLUME MUST implement this feature. + +Phase 2: +1. Create a new volume group from an existing group snapshot in one step. Design details will be added in a future KEP. +2. Non-goal: Create a new empty group and at the same time create new empty PVCs and add them to the new group. + +### Modify VolumeGroup Modify an existing VolumeGroup: -Add new volume or remove existing volume from an existing VolumeGroup. Option 2 for create VolumeGroup above falls into this case. +1. Creating a new volume with an existing VolumeGroup name creates a new volume and adds it to the group. Option 1 of creating VolumeGroup above falls into this case. As mentioned earlier, a CSI driver supporting VOLUME_GROUP MUST implement this feature. +2. 
Add an existing volume to an existing VolumeGroup or remove a volume from a VolumeGroup. Option 2 of creating VolumeGroup above falls into this case. As mentioned earlier, a CSI driver supporting VOLUME_GROUP_ADD_REMOVE_EXISTING_VOLUME MUST implement this feature. ### Create and Modify VolumeGroup @@ -133,48 +160,61 @@ VolumeGroups can be created and/or modified in several ways as described in the #### Create new PVC and add to the VolumeGroup -* Create a new empty VolumeGroup. -* Create a new PVC with existing VolumeGroup name. As a result, new PVC is created and added to VolumeGroup. VolumeGroup is modified so Status has this new PVC in PVCList. +* Admin creates a VolumeGroupClass, with the SupportVolumeGroupSnapshot boolean flag set to true. +* User creates a new empty VolumeGroup, specifying the above VolumeGroupClass. As a result, a new empty VolumeGroupContent will also be created and bound to the VolumeGroup. +* User creates a new PVC with an existing VolumeGroup name created above. As a result, a new PVC is created and added to VolumeGroup. VolumeGroup is modified so Status has this new PVC in PVCList. * External-provisioner will be modified so that VolumeGroupName will be passed to the CSI driver when creating a volume. -Only CSI drivers supporting VOLUMEGROUP capability can support the volume group this way. -When a new PVC is created with the existing VolumeGroup name, the VolumeGroup will be modified and the PVC will be added to PVCList in the Status. +Only CSI drivers supporting VOLUME_GROUP capability can support the volume group this way. -The same PVC can belong to different groups, i.e., different type of groups or different groups of the same type, if the storage system supports it. Storage system will decide whether to support this or not. We don't prevent it in the API or controller directly. 
+When a new PVC is created with the existing VolumeGroup name, the VolumeGroup will be modified and the PVC will be added to PVCList in the Status, and the VolumeGroupContent will also be modified and the PV will be added to the PVList in the Status. + +The same PVC can belong to different groups, i.e., different types of groups or different groups of the same type, if the storage system supports it. Storage system will decide whether to support this or not. We don't prevent it in the API or controller directly. #### Modify VolumeGroup with existing PVCs -We can add an existing PVC to the group or remove a PVC from the group without deleting it. A VOLUMEGROUP_ADD_REMOVE_EXISTING_VOLUME capability will be added to CSI Spec. Only CSI drivers supporting both VOLUMEGROUP and VOLUMEGROUP_ADD_REMOVE_EXISTING_VOLUME capabilities can support the volume group this way. -* Create a new empty VolumeGroup. +We can add an existing PVC to the group or remove a PVC from the group without deleting it. A VOLUME_GROUP_ADD_REMOVE_EXISTING_VOLUME capability will be added to CSI Spec. Only CSI drivers supporting both VOLUME_GROUP and VOLUME_GROUP_ADD_REMOVE_EXISTING_VOLUME capabilities can support the volume group this way. + +* Admin creates a VolumeGroupClass, with the SupportVolumeGroupSnapshot boolean flag set to true. +* User creates a new empty VolumeGroup, specifying the above VolumeGroupClass. A new empty VolumeGroupContent will also be created and bound to the VolumeGroup. * Add an existing PVC to an existing VolumeGroup (VolumeGroup can be empty to start with or it can have other PVCs already) by adding VolumeGroup name to the PVC Spec. * The VolumeGroup name is added by user to each PVC Spec, not by the VolumeGroup controller. The VolumeGroup controller watches PVCs and reacts to the PVC updated with a VolumeGroup name event as described in the following step. -* VolumeGroup is modified so the existing PVC is added to the PVCList in the Status. 
+* VolumeGroup is modified so the existing PVC is added to the PVCList in the Status. VolumeGroupContent is also modified so the PV is added to the PVList in the Status. * Note: The VolumeGroup controller will be implemented to have a desired state of the world and an actual state of the world. The desired state of the world contains VolumeGroups with the desired PVCList while the actual state of the world contains VolumeGroups with the actual PVCList. The controller will try to reconcile the two by handling adding and removing multiple PVCs through a single CSI ModifyVolumeGroup RPC call each time. -* External-provisioner will be modified to update the status of PVC. -* VolumeGroup controller will be triggered to update the VolumeGroup Status. +* External-provisioner will be modified to update the status of PVC and PV. +* VolumeGroup controller will be triggered to update the VolumeGroup Status and VolumeGroupContent Status. * If one volume fails to be added, it should not affect it if it is used by a pod, but there will be error messages. -* Deleting a PVC from a VolumeGroup will trigger external-provisioner and the VolumeGroup controller as well. +* Removing a PVC from a VolumeGroup will trigger the external-provisioner and the VolumeGroup controller as well. + +#### Phase 2: Create VolumeGroup from VolumeGroupSnapshot + +This is in Phase 2 so won't be discussed in detail here. Creating a new volume group from an existing group snapshot will be supported in Phase 2 if the CSI driver supports VOLUME_GROUP_FROM_GROUP_SNAPSHOT capability. As a result, PVCs will be created from source snapshots and placed in a new volume group. -#### Create VolumeGroup from VolumeGroupSnapshot +#### Pre-provisioned VolumeGroup -Creating a new volume group from an existing group snapshot is supported if the CSI driver supports VOLUMEGROUP capability. As a result, PVCs will be created from source snapshots and placed in a new volume group. 
+Admin can create a VolumeGroupContent, specifying an existing VolumeGroupHandle in the storage system and specifying a VolumeGroup name and namespace. Then create a VolumeGroup that points to the VolumeGroupContent name. + +The VolumeGroup controller will retrieve all volumeHandles in the VolumeGroup from the CSI driver, create PVs pointing to the volumeHandles, and create PVCs pointing to the PVs. ### Create VolumeGroupSnapshot -A VolumeGroupSnapshot can be created with a VolumeGroup as the source if the CSI driver supports the GROUPSNAPSHOT capability. -* Create a VolumeGroupSnapshot with a VolumeGroup as the source. -* This will trigger the VolumeGroupSnapshot controller to call the CreateVolumeGroupSnapshot CSI function and also create multiple VolumeSnapshot API objects with VolumeGroupSnapshot name parameter in each VolumeSnapshot Spec. This will trigger the creation of VolumeSnapshotContent API objects in the snapshot controller and calls to the CreateSnapshot CSI function in the CSI snapshotter sidecar. The CSI snapshotter sidecar will pass the new group_snapshot_name parameter to the CSI Driver when calling CreatSnapshot. +A VolumeGroupSnapshot can be created with a VolumeGroup as the source if the CSI driver supports the GROUP_SNAPSHOT capability. + +#### Dynamic provisioning + +* Admin creates a VolumeGroupSnapshotClass. +* User creates a VolumeGroupSnapshot with a VolumeGroup as the source. +* This will trigger the VolumeGroupSnapshot controller to create a VolumeGroupSnapshotContent API object, and also call the CreateVolumeGroupSnapshot CSI function and also create multiple VolumeSnapshot API objects with VolumeGroupSnapshot name parameter in each VolumeSnapshot Status. This will trigger the creation of VolumeSnapshotContent API objects in the snapshot controller and calls to the CreateSnapshot CSI function in the CSI snapshotter sidecar. 
The CSI snapshotter sidecar will pass the new group_snapshot_name parameter to the CSI Driver when calling CreateSnapshot. * When CSI driver receives CreateSnapshot request for individual snapshots with a VolumeGroupSnapshot name: - * Case 1: If it knows how to create a group snapshot on the storage system, it returns (nil, nil), and leave it to the CreateVolumeGroupSnapshot function to handle the snapshot creation. - * Case 2: If it does not know how to create a group snapshot on the storage system, it will create an individual snapshot as usual and return the snapshot_id back. + * If it knows how to create a group snapshot on the storage system, it returns (nil, nil), and leaves it to the CreateVolumeGroupSnapshot function to handle the snapshot creation. * CreateVolumeGroupSnapshot CSI function response - * Case 1: The CreateVolumeGroupSnapshot CSI function should return a list of snapshots (Snapshot message defined in CSI Spec) in its response. The VolumeGroupSnapshot controller can use the returned list of snapshots to update corresponding individual VolumeSnapshotContents, wait for VolumeSnapshots and VolumeSnapshotContents to be bound, and update SnapshotList in the VolumeGroupSnapshot Status. - * Case 2: The CreateVolumeGroupSnapshot CSI function returns group_snapshot_id and volume_group_id, but leaves snapshots field as empty. The VolumeGroupSnapshot controller watches VolumeSnapshot and VolumeSnapshotContent API objects. If a VolumeSnapshot's volumeGroupSnapshotName field matches the VolumeGroupSnapshot name that is being created, it is an individual snapshot that belongs to the VolumeGroupSnapshot. When VolumeSnapshot and VolumeSnapshotContent are bound, it saves the VolumeSnapshot API object to SnapshotList in its Status. + * The CreateVolumeGroupSnapshot CSI function should return a list of snapshots (Snapshot message defined in CSI Spec) in its response. 
The VolumeGroupSnapshot controller can use the returned list of snapshots to update corresponding individual VolumeSnapshotContents, wait for VolumeSnapshots and VolumeSnapshotContents to be bound, and update SnapshotList in the VolumeGroupSnapshot Status and SnapshotContentList in the VolumeGroupSnapshotContent Status. + apiVersion: snapshot.storage.k8s.io/v1 ``` kind: VolumeSnapshot @@ -184,33 +224,51 @@ spec: volumeSnapshotClassName: snapClass1 source: persistentVolumeClaimName: pvc1 +status: volumeGroupSnapshotName: groupSnapshot1 ``` -* An admissions controller or finalizer should be added to prevent an individual snapshot from being deleted that belongs to a GroupSnapshot. -* Since some storage systems require individual snapshots while others can only return a single group snapshot but not individual snapshots, we propose the following solution: - * In VolumeGroupSnapshotStatus, if ReadyToUse is true and SnapshotList is empty, the VolumeGroupSnapshot Controller assumes the storage system does not return individual snapshots. - * If ReadyToUse is true and SnapshotList in not empty, the VolumeGroupSnapshot Controller knows there are individual snapshots created for this group. Those individual snapshots may be used as readonly, but they cannot be removed from the GroupSnapshot. - * In the CSI Spec, this means repeated .csi.v1.Snapshot snapshots in VolumeGroupSnapshot message from CreateVolumeGroupSnapshotResponse should be optional, not required. - * How to use the VolumeGroupSnapshot if individual snapshots are not returned? How can we create a volume from a snapshot if there are no individual snapshots? `snapshots` is optional while `group_snapshot_id` is required in VolumeGroupSnapshot message in CSI so it is fine to only specify `group_snapshot_id` not `snapshots` when creating a VolumeGroup from a VolumeGroupSnapshot. However, CSI Driver MUST return a list of `volumes` that are restored in `CreateVolumeGroupResponse`. 
+ +* An admission controller or finalizer should be added to prevent an individual snapshot that belongs to a VolumeGroupSnapshot from being deleted. +* Since some storage systems require individual snapshots while others can only return a single group snapshot but not individual snapshots, we propose a two-phase solution. + * In Phase 1, since we do not support creating a VolumeGroup directly from a VolumeGroupSnapshot, individual snapshots must be returned along with the group snapshot. + * In Phase 2, we plan to support creating a VolumeGroup directly from a VolumeGroupSnapshot. We propose the following solution for Phase 2: + * In VolumeGroupSnapshotStatus, if ReadyToUse is true and SnapshotList is empty, the VolumeGroupSnapshot Controller assumes the storage system does not return individual snapshots. + * If ReadyToUse is true and SnapshotList is not empty, the VolumeGroupSnapshot Controller knows there are individual snapshots created for this group. Those individual snapshots may be used as read-only, but they cannot be removed from the VolumeGroupSnapshot. + * In the CSI Spec, this means repeated .csi.v1.Snapshot snapshots in VolumeGroupSnapshot message from CreateVolumeGroupSnapshotResponse should be optional, not required. + * How to use the VolumeGroupSnapshot if individual snapshots are not returned? How can we create a volume from a snapshot if there are no individual snapshots? `snapshots` is optional while `group_snapshot_id` is required in the VolumeGroupSnapshot message in CSI, so it is fine to specify only `group_snapshot_id` and not `snapshots` when creating a VolumeGroup from a VolumeGroupSnapshot. However, the CSI Driver MUST return a list of `volumes` that are restored in `CreateVolumeGroupResponse`. + +#### Pre-provisioned VolumeGroupSnapshot + +Admin can create a VolumeGroupSnapshotContent, specifying an existing VolumeGroupSnapshotHandle in the storage system and specifying a VolumeGroupSnapshot name and namespace. 
Then create a VolumeGroupSnapshot that points to the VolumeGroupSnapshotContent name. + +The VolumeGroupSnapshot controller will retrieve all volumeSnapshotHandles in the Volume Group Snapshot from the CSI driver, create VolumeSnapshotContents pointing to the volumeSnapshotHandles, and create VolumeSnapshots pointing to the VolumeSnapshotContents. ### Delete VolumeGroupSnapshot -A VolumeGroupSnapshot can be deleted if the CSI driver supports the GROUPSNAPSHOT capability. -* When a VolumeGroupSnapshot is deleted, the VolumeGroupSnapshot controller will call the DeleteVolumeGroupSnapshot CSI function as well as DeleteSnapshot CSI functions. Just like create snapshot, there are 2 cases. - * Case 1: Since CSI driver handles individual snapshot creation in CreateVolumeGroupSnapshot, it should handle individual snapshot deletion in DeleteVolumeGroupSnapshot. - * Case 2: Since CSI driver handles individual snapshot creation in CreateSnapshot, it should handle individual snapshot deletion in DeleteSnapshot. +A VolumeGroupSnapshot can be deleted if the CSI driver supports the GROUP_SNAPSHOT capability. +* When a VolumeGroupSnapshot is deleted, the VolumeGroupSnapshot controller will call the DeleteVolumeGroupSnapshot CSI function as well as DeleteSnapshot CSI functions. + * Since CSI driver handles individual snapshot creation in CreateVolumeGroupSnapshot, it should handle individual snapshot deletion in DeleteVolumeGroupSnapshot. * DeleteSnapshot on a single snapshot that belongs to a group snapshot is not allowed. ### Restore Restore can be done as follows: -1. A new empty volume group can be created first, and then a new volume can be created from a snapshot one by one and added to the volume group. This can be repeated for all the snapshots in the VolumeGroupSnapshot. -2. A VolumeGroup can be created from a VolumeGroupSnapshot source in one step. This is the same as what is described in the section `Create VolumeGroup from VolumeGroupSnapshot`. 
+ +Phase 1: + +* A new empty volume group can be created first, and then a new volume can be created from a snapshot one by one and added to the volume group. This can be repeated for all the snapshots in the VolumeGroupSnapshot. + +Phase 2: + +* A VolumeGroup can be created from a VolumeGroupSnapshot source in one step. This is the same as what is described in the section `Create VolumeGroup from VolumeGroupSnapshot`. + ### API Definitions API definitions are as follows: +#### VolumeGroupClass + ``` type VolumeGroupClass struct { metav1.TypeMeta @@ -221,7 +279,7 @@ type VolumeGroupClass struct { // This value may not be empty. Driver string - // Parameters holds parameters for driver. + // Parameters hold parameters for the driver. // These values are opaque to the system and are passed directly // to the driver. // +optional @@ -230,14 +288,13 @@ type VolumeGroupClass struct { // This field specifies whether group snapshot is supported. // The default is false. // +optional - VolumeGroupSnapshot *bool - - // Specifies whether consistent group snapshot is supported. - // The default is false. - // +optional - ConsistentGroupSnapshot *bool + SupportVolumeGroupSnapshot *bool } +``` +#### VolumeGroup + +``` // VolumeGroup is a user's request for a group of volumes type VolumeGroup struct { metav1.TypeMeta @@ -258,44 +315,43 @@ Type VolumeGroupSpec struct { // +optional VolumeGroupClassName *string - // If InitSource is nil, an empty volume group will be created. - // Otherwise, a volume group will be created with PVCs. - // If SourceVolumeGroupSnapshotName is not nil, the volume group - // will be created from the source VolumeGroupSnapshot. - // This field determines what PVCs will be in the volume group - // when it is initially created. PVCs can be added to or removed - // from the volume group later if CSI driver supports - // VOLUMEGROUP_ADD_REMOVE_EXISTING_VOLUME. 
+ // VolumeGroupContentName is the binding reference to the VolumeGroupContent + // backing this VolumeGroup // +optional - InitSource *VolumeGroupSource + VolumeGroupContentName *string + + // Phase 2 + // +optional + VolumeGroupSource *VolumeGroupSource } -// VolumeGroupSource contains 1 option SourceVolumeGroupSnapshotName. -// Make SourceVolumeGroupSnapshotName a pointer to allow new optional -// source to be added in the future. +// Phase 2: VolumeGroupSource will be in Phase 2 +// VolumeGroupSource contains 3 options. If VolumeGroupSource is not nil, +// one of the 3 options must be defined. Type VolumeGroupSource struct { - // If specified, the VolumeGroup will be created from the source - // VolumeGroupSnapshot. + // A list of existing persistent volume claims // +optional - SourceVolumeGroupSnapshotName *string -} + PVCList []PersistentVolumeClaim -type VolumeGroupStatus struct { - // VolumeGroupId is a unique id returned by the CSI driver - // to identify the VolumeGroup on the storage system. - // If a storage system does not provide such an id, the - // CSI driver can choose to return the VolumeGroup name. - // +optional - VolumeGroupId *string + // A label query over existing persistent volume claims to be added to the volume group. + // +optional + Selector *metav1.LabelSelector - // +optional + // This field specifies the source of a volume group. 
(this is for restore) + // Supported Kind is VolumeGroupSnapshot + // +optional + GroupDataSource *TypedLocalObjectReference + } + +type VolumeGroupStatus struct { + // +optional GroupCreationTime *metav1.Time // A list of persistent volume claims // +optional PVCList []PersistentVolumeClaim - // +optional + // +optional Ready *bool // Last error encountered during group creation @@ -313,7 +369,97 @@ type VolumeGroupError struct { // +optional Message *string } +``` + +#### VolumeGroupContent + +``` +// VolumeGroupContent represents a group of volumes on the storage backend +type VolumeGroupContent struct { + metav1.TypeMeta + // +optional + metav1.ObjectMeta + + // Spec defines the volume group requested by a user + Spec VolumeGroupContentSpec + + // Status represents the current information about a volume group + // +optional + Status *VolumeGroupContentStatus +} + +// VolumeGroupContentSpec +Type VolumeGroupContentSpec struct { + // +optional + VolumeGroupClassName *string + + // +optional + // VolumeGroupRef is part of a bi-directional binding between VolumeGroup and VolumeGroupContent. + VolumeGroupRef *core_v1.ObjectReference + + // +optional + Source *VolumeGroupContentSource + + // +optional + VolumeGroupDeletionPolicy *VolumeGroupDeletionPolicy +} + +// VolumeGroupContentSource +Type VolumeGroupContentSource struct { + // Required + Driver string + + // VolumeGroupHandle is the unique volume group name returned by the + // CSI volume plugin’s CreateVolumeGroup to refer to the volume group on + // all subsequent calls. + // Required. + VolumeGroupHandle string + + // +optional + // Attributes of the volume group to publish. 
+ VolumeGroupAttributes map[string]string +} + +type VolumeGroupContentStatus struct { + // +optional + GroupCreationTime *metav1.Time + + // A list of persistent volumes + // +optional + PVList []PersistentVolume + + // +optional + Ready *bool + + // Last error encountered during group creation + // +optional + Error *VolumeGroupError +} +``` + +#### VolumeGroupSnapshotClass + +``` +type VolumeGroupSnapshotClass struct { + metav1.TypeMeta + // +optional + metav1.ObjectMeta + // Driver is the driver expected to handle this VolumeGroupSnapshotClass. + // This value may not be empty. + Driver string + + // Parameters hold parameters for the driver. + // These values are opaque to the system and are passed directly + // to the driver. + // +optional + Parameters map[string]string +} +``` + +#### VolumeGroupSnapshot + +``` // VolumeGroupSnapshot is a user's request for taking a group snapshot. type VolumeGroupSnapshot struct { metav1.TypeMeta `json:",inline"` @@ -331,52 +477,183 @@ type VolumeGroupSnapshot struct { // VolumeGroupSnapshotSpec describes the common attributes of a group snapshot type VolumeGroupSnapshotSpec struct { + // +optional + VolumeSnapshotClassName *string + // Source has the information about where the group snapshot is created from. - // Supported Kind is VolumeGroup - // Required. - Source TypedLocalObjectReference `json:"source" protobuf:"bytes,1,opt,name=source"` + // Required. 
+ Source VolumeGroupSnapshotSource +} + +// OneOf VolumeGroupName or VolumeGroupSnapshotContentName +Type VolumeGroupSnapshotSource struct { + // +optional + // Dynamically provisioned VolumeGroupSnapshot + VolumeGroupName *string + + // +optional + // Pre-provisioned VolumeGroupSnapshot + VolumeGroupSnapshotContentName *string } Type VolumeGroupSnapshotStatus struct { - // VolumeGroupSnapshotId is a unique id returned by the CSI driver + // +optional + BoundVolumeGroupSnapshotContentName *string + + // ReadyToUse becomes true when ReadyToUse on all individual snapshots become true + // +optional + ReadyToUse *bool + + // +optional + CreationTime *metav1.Time + + // +optional + Error *VolumeGroupSnapshotError + + // List of volume snapshots + // +optional + SnapshotList []VolumeSnapshot +} + +// Describes an error encountered on the group snapshot +type VolumeGroupSnapshotError struct { + // time is the timestamp when the error was encountered. + // +optional + Time *metav1.Time + + // message details the encountered error + // +optional + Message *string +} +``` + +#### VolumeGroupSnapshotContent + +``` +// VolumeGroupSnapshotContent +type VolumeGroupSnapshotContent struct { + metav1.TypeMeta `json:",inline"` + // Standard object's metadata. 
+ // +optional + metav1.ObjectMeta `json:"metadata,omitempty" protobuf:"bytes,1,opt,name=metadata"` + + // Spec defines the desired characteristics of a group snapshot content + Spec VolumeGroupSnapshotContentSpec `json:"spec" protobuf:"bytes,2,opt,name=spec"` + + // Status represents the latest observed state of the group snapshot content + // +optional + Status *VolumeGroupSnapshotContentStatus `json:"status,omitempty" protobuf:"bytes,3,opt,name=status"` +} + +// VolumeGroupSnapshotContentSpec describes the common attributes of a group snapshot content +type VolumeGroupSnapshotContentSpec struct { + // Required + // VolumeGroupSnapshotRef specifies the VolumeGroupSnapshot object + // to which this VolumeGroupSnapshotContent object is bound. + VolumeGroupSnapshotRef core_v1.ObjectReference + + // Required + DeletionPolicy DeletionPolicy + + // Required + Driver string + + // +optional + VolumeGroupSnapshotClassName *string + + // Required + Source VolumeGroupSnapshotContentSource +} + +// OneOf +type VolumeGroupSnapshotContentSource struct { + // Dynamical provisioning of VolumeGroupSnapshot + // +optional + VolumeGroupHandle *string + + // Pre-provisioned VolumeGroupSnapshot + // +optional + VolumeGroupSnapshotHandle *string +} + +Type VolumeGroupSnapshotContentStatus struct { + // VolumeGroupSnapshotHandle is a unique id returned by the CSI driver // to identify the VolumeGroupSnapshot on the storage system. // If a storage system does not provide such an id, the // CSI driver can choose to return the VolumeGroupSnapshot name. 
- // +optional - VolumeGroupSnapshotID *string + // +optional + VolumeGroupSnapshotHandle *string // ReadyToUse becomes true when ReadyToUse on all individual snapshots become true // +optional ReadyToUse *bool - // List of volume snapshots - // +optional - SnapshotList []VolumeSnapshot + // +optional + CreationTime *int64 + + // +optional + Error *VolumeGroupSnapshotError + + // List of volume snapshot contents + // +optional + VolumeSnapshotContentList []VolumeSnapshotContent } +``` +#### PersistentVolumeClaim and PersistentVolume + +For PersistentVolumeClaim, the user can request that it be added to a VolumeGroup, so VolumeGroupNames will be in both Spec and Status. + +``` type PersistentVolumeClaimSpec struct { ...... - // +optional + // +optional VolumeGroupNames []string ...... } +type PersistentVolumeClaimStatus struct { + ...... + // +optional + VolumeGroupNames []string + ...... +} -type VolumeSnapshotSpec struct{ +type PersistentVolumeStatus struct { ...... - // +optional + // +optional VolumeGroupContentNames []string + ...... +} +``` + +#### VolumeSnapshot and VolumeSnapshotContent + +For VolumeSnapshot, the user cannot request that a VolumeSnapshot be added to a VolumeGroupSnapshot; therefore VolumeGroupSnapshotName is only in the Status, not in the Spec. + +``` +type VolumeSnapshotStatus struct{ + ...... + // +optional VolumeGroupSnapshotName *string ...... } + +type VolumeSnapshotContentStatus struct{ + ...... + // +optional + VolumeGroupSnapshotContentName *string + ...... +} ``` ### Example Yaml Files -#### Volume Group Snapshot +#### Create Volume Group -Example yaml files to define a VolumeGroupClass and VolumeGroup are in the following. +Example YAML files that create a VolumeGroupClass and a VolumeGroup follow. 
-A VolumeGroupClass that supports groupSnapshot: +Create a VolumeGroupClass that supports volumeGroupSnapshot: ``` apiVersion: volumegroup.storage.k8s.io/v1alpha1 kind: VolumeGroupClass @@ -385,10 +662,10 @@ metadata: spec: parameters: …... - groupSnapshot: true + supportVolumeGroupSnapshot: true ``` -A VolumeGroup belongs to this VolumeGroupClass: +Create a VolumeGroup that belongs to this VolumeGroupClass: ``` apiVersion: volumegroup.storage.k8s.io/v1alpha1 kind: VolumeGroup @@ -398,26 +675,14 @@ spec: volumeGroupClassName: volumeGroupClass1 ``` -A VolumeGroupSnapshot taken from the VolumeGroup: -``` -apiVersion: volumegroup.storage.k8s.io/v1alpha1 -kind: VolumeGroupSnapshot -metadata: - name: my-group-snapshot -spec: - source: - name: volumeGroup1 - kind: VolumeGroup - apiGroup: volumegroup.storage.k8s.io -``` +#### Add PVC to VolumeGroup -A PVC that belongs to the volume group which supports groupSnapshot: +Create a PVC that belongs to the volume group which supports volumeGroupSnapshot: ``` apiVersion: v1 kind: PersistentVolumeClaim metadata: name: pvc1 - annotations: spec: accessModes: - ReadWriteOnce @@ -430,37 +695,75 @@ spec: volumeGroupNames: [volumeGroup1] ``` -A new external VolumeGroup controller will handle VolumeGroupClass and VolumeGroup resources. +#### Create VolumeGroupSnapshot + +Create a VolumeGroupSnapshotClass: +``` +apiVersion: volumegroup.storage.k8s.io/v1alpha1 +kind: VolumeGroupSnapshotClass +metadata: + name: volumeGroupSnapshotClass1 +spec: + parameters: + …... +``` + +Create a VolumeGroupSnapshot from the VolumeGroup dynamically: +``` +apiVersion: volumegroup.storage.k8s.io/v1alpha1 +kind: VolumeGroupSnapshot +metadata: + name: my-group-snapshot +spec: + source: + volumeGroupName: volumeGroup1 + volumeGroupSnapshotClassName: volumeGroupSnapshotClass1 +``` + +A new external VolumeGroup controller will handle VolumeGroupClass, VolumeGroup, and VolumeGroupContent resources. 
We may need to split this into two controllers, one common controller that handles common functions such as binding, and one sidecar controller that calls the CSI driver. + External provisioner will be modified to read information from volume groups (through volumeGroupNames) and pass them down to the CSI driver. +A new external VolumeGroupSnapshot controller will handle VolumeGroupSnapshotClass, VolumeGroupSnapshot, and VolumeGroupSnapshotContent resources. We may need to split this into two controllers, one common controller that handles common functions such as binding, and one sidecar controller that calls the CSI driver. + +Snapshot controller will be modified to update VolumeSnapshot status. External snapshotter sidecar will be modified to update VolumeSnapshotContent status. + ### CSI Changes #### CSI Capabilities -New controller capabilities VOLUMEGROUP, VOLUMEGROUP_ADD_REMOVE_EXISTING_VOLUME, GROUPSNAPSHOT, MODIFY_VOLUME, and INDIVIDUAL_SNAPSHOT_RESTORE will be added. +New controller capabilities VOLUME_GROUP, VOLUME_GROUP_ADD_REMOVE_EXISTING_VOLUME, GROUP_SNAPSHOT, INDIVIDUAL_SNAPSHOT_RESTORE, GET_VOLUME_GROUP, GET_VOLUME_GROUP_SNAPSHOT, LIST_VOLUME_GROUPS, LIST_VOLUME_GROUP_SNAPSHOTS will be added. -* VOLUMEGROUP +* VOLUME_GROUP: Indicates that the controller plugin supports creating and deleting a volume group. -* VOLUMEGROUP_ADD_REMOVE_EXISTING_VOLUME - Indicates that the controller plugin supports adding an existing volume to a - volume group and removing a volume from a volume group without deleting it. +* VOLUME_GROUP_ADD_REMOVE_EXISTING_VOLUME: + Indicates that the controller plugin supports adding an existing volume to a volume + group and removing a volume from a volume group without deleting it. -* GROUPSNAPSHOT +* GROUP_SNAPSHOT: Indicates that the controller plugin supports creating a snapshot of all volumes in a volume group. 
-* CONSISTENT_GROUPSNAPSHOT - Indicates that the controller plugin supports creating a consistent snapshot of - all volumes in a volume group. - -* MODIFY_VOLUME - Indicates that the controller plugin supports modifying a volume. - -* INDIVIDUAL_SNAPSHOT_RESTORE +* INDIVIDUAL_SNAPSHOT_RESTORE: Indicates whether the controller plugin supports creating a volume from an individual volume snapshot if the volume snapshot is part of a VolumeGroupSnapshot. Use cases: selective restore, advanced recovery, etc. + Note: In Phase 1, this is the only way to restore after taking a group snapshot. + The user can create a volume from each of the individual snapshots + created along with the group snapshot. + +* GET_VOLUME_GROUP: + Indicates that the controller plugin supports getting details of a volume group. + +* GET_VOLUME_GROUP_SNAPSHOT: + Indicates that the controller plugin supports getting details of a volume group snapshot. + +* LIST_VOLUME_GROUPS: + Indicates that the controller plugin supports getting details of a list of volume groups. + +* LIST_VOLUME_GROUP_SNAPSHOTS: + Indicates that the controller plugin supports getting details of a list of volume group snapshots. #### CSI Controller RPC @@ -518,15 +821,16 @@ service Controller { #### CreateVolumeGroup This RPC will be called by the CO to create a new volume group on behalf of a user. -This operation MUST be idempotent. If a volume corresponding to the specified volume name already exists, is compatible with the specified parameters in the CreateVolumeGroupRequest, the Plugin MUST reply 0 OK with the corresponding CreateVolumeGroupResponse. +This operation MUST be idempotent. If a volume group corresponding to the specified volume group name already exists and is compatible with the specified parameters in the CreateVolumeGroupRequest, the Plugin MUST reply 0 OK with the corresponding CreateVolumeGroupResponse. CSI Plugins MAY create the following types of volume groups: * Create a new empty volume group. 
+ * After the empty group is created, create a new volume, specifying the group name in the volume. * At restore time, create a single volume from individual snapshot and then join an existing group. - * Create an empty group - * Create a volume from snapshot in the group -* Create a new volume group from a source group snapshot. -* Create a new volume group and add a list of existing volumes to the group. + * Create an empty group. + * Create a volume from snapshot, specifying the group name in the volume. +* Phase 2: Create a new volume group from a source group snapshot. +* Phase 2: Create a new volume group and add a list of existing volumes to the group. The following is non-goal: * Non goal: Create a new group and at the same time create a list of new volumes in the group. @@ -551,13 +855,15 @@ message CreateVolumeGroupRequest { // section on how to use this field. map secrets = 3 [(csi_secret) = true]; + // Phase 2 // If specified, a volume group will be created from the source group snapshot. // This field is OPTIONAL. - VolumeGroupSnapshot source_volume_group_snapshot = 4; + // VolumeGroupSnapshot source_volume_group_snapshot = 4; + // Phase 2 // If specified, a volume group will be created from a list of existing volumes. // This field is OPTIONAL. - repeated string volume_id = 5; + // repeated string volume_id = 5; } message CreateVolumeGroupResponse { @@ -580,10 +886,13 @@ message VolumeGroup { map volume_group_context = 2; // Underlying volumes in this group. The same definition in CSI Volume. - // This field is OPTIONAL to support the creation of an empty group. - // However, this field is REQUIRED in the following cases: - // - Create a new volume group from a source group snapshot. - // - Create a new volume group and add a list of existing volumes to the group. + // This field is REQUIRED. + // To support the creation of an empty group, this list can be empty. 
+ // However, this field is not empty in the following cases: + // - Response from ListVolumeGroups or GetVolumeGroup if the VolumeGroup is not empty. + // - Response from ModifyVolumeGroup if the VolumeGroup is not empty after modification. + // - Phase 2: Create a new volume group from a source group snapshot. + // - Phase 2: Create a new volume group and add a list of existing volumes to the group. repeated .csi.v1.Volume volumes = 3; } ``` @@ -593,7 +902,7 @@ message VolumeGroup { 1. When a new volume is created with a volume group id parameter, the volume will be created and added to the existing volume group. 2. A new volume can also be created without a volume group id parameter. It can be added to a volume group later through the ModifyVolumeGroup RPC. -Note that for filesystems based storage systems, only option 1 can be supported. For block based storage systems. Both option 1 and 2 may be supported. However there is a possibility that option 2 will not work for ConsistencyGroups as the volume is created without the consideration of which group the volume will be placed in. +Note that for filesystem-based storage systems, only option 1 can be supported. For block-based storage systems, both options 1 and 2 may be supported. However, option 2 may not work for consistency groups, because the volume is created without considering which group it will be placed in. The CSI Spec does not determine whether a group is consistent. It is up to the storage provider to decide whether a consistent group can be supported and to clarify that in vendor-specific documentation. ``` message CreateVolumeRequest { @@ -631,7 +940,9 @@ This RPC will be called by the CO to modify an existing volumegroup on behalf of To support ModifyVolumeGroup, the Kubernetes VolumeGroup controller will be implemented to have a desired state of the world and an actual state of the world. 
The desired state of the world contains VolumeGroups with the desired PVCList while the actual state of the world contains VolumeGroups with the actual PVCList. The controller will try to reconcile the two by handling adding and removing multiple PVCs through a single CSI RPC call each time. -Note that filesystems based storage systems may not be able to support this RPC. For block based storage systems, this is a very convenient method. However, it may not satisfy the requirement for consistency as the volume is created without the knowledge of which group it is placed in. +Note that filesystems based storage systems may not be able to support this RPC. For block based storage systems, this is a very convenient method. However, it may not satisfy the requirement for consistency as the volume is created without the knowledge of which group it is placed in. It is out of the scope of the CSI spec to determine whether a group is consistent or not. It is up to the storage provider to clarify that in the vendor specific documentation. + +CSI drivers supporting VOLUME_GROUP_ADD_REMOVE_EXISTING_VOLUME MUST implement ModifyVolumeGroup RPC. ``` message ModifyVolumeGroupRequest { @@ -1017,6 +1328,9 @@ type VolumeGroupClass struct { ModifyVolume CSI RPC was considered earlier to add/remove one volume to/from a group at a time but it was removed because ModifyVolumeGroup CSI RPC was added. +A new MODIFY_VOLUME capability will be added to support this. +It indicates that the controller plugin supports modifying a volume. 
+ ``` rpc ModifyVolume(ModifyVolumeRequest) returns (ModifyVolumeResponse) { From c2cbadd7488803c2b773a7339840c37632b2bceb Mon Sep 17 00:00:00 2001 From: xing-yang Date: Wed, 7 Sep 2022 17:54:01 -0400 Subject: [PATCH 03/19] Update to the new KEP format --- keps/prod-readiness/sig-storage/3476.yaml | 3 + .../README.md} | 323 +++++++++++++++--- keps/sig-storage/3476-volume-group/kep.yaml | 35 ++ 3 files changed, 317 insertions(+), 44 deletions(-) create mode 100644 keps/prod-readiness/sig-storage/3476.yaml rename keps/sig-storage/{20200212-volume-group.md => 3476-volume-group/README.md} (80%) create mode 100644 keps/sig-storage/3476-volume-group/kep.yaml diff --git a/keps/prod-readiness/sig-storage/3476.yaml b/keps/prod-readiness/sig-storage/3476.yaml new file mode 100644 index 00000000000..a4249a84c8f --- /dev/null +++ b/keps/prod-readiness/sig-storage/3476.yaml @@ -0,0 +1,3 @@ +kep-number: 3476 +alpha: + approver: "@wojtek-t" diff --git a/keps/sig-storage/20200212-volume-group.md b/keps/sig-storage/3476-volume-group/README.md similarity index 80% rename from keps/sig-storage/20200212-volume-group.md rename to keps/sig-storage/3476-volume-group/README.md index 45e62ef7568..b5bdfcfa56d 100644 --- a/keps/sig-storage/20200212-volume-group.md +++ b/keps/sig-storage/3476-volume-group/README.md @@ -1,38 +1,9 @@ ---- -title: Volume Group -authors: - - "@xing-yang" - - "@jingxu97" -owning-sig: sig-storage -participating-sigs: - - sig-storage -reviewers: - - "@msau42" - - "@saad-ali" - - "@thockin" -approvers: - - "@msau42" - - "@saad-ali" - - "@thockin" -editor: TBD -creation-date: 2020-02-12 -last-updated: 2022-03-24 -status: provisional -see-also: - - n/a -replaces: - - n/a -superseded-by: - - n/a ---- - -# Title - -Volume Group +# KEP-3476: Volume Group and Group Snapshot ## Table of Contents +- [Release Signoff Checklist](#release-signoff-checklist) - [Summary](#summary) - [Motivation](#motivation) - [Goals](#goals) @@ -84,8 +55,61 @@ Volume Group - 
[ModifyVolume](#modifyvolume) - [Create VolumeGroup with Selector](#create-volumegroup-with-selector) - [Example Yaml Files for Volume Placement](#example-yaml-files-for-volume-placement) +- [Graduation Criteria](#graduation-criteria) + - [Alpha](#alpha) + - [Alpha -> Beta](#alpha---beta) + - [Beta -> GA](#beta---ga) +- [Test Plan](#test-plan) + - [Unit tests](#unit-tests) + - [E2E tests](#e2e-tests) +- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire) + - [Feature enablement and rollback](#feature-enablement-and-rollback) + - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning) + - [Monitoring Requirements](#monitoring-requirements) + - [Dependencies](#dependencies) + - [Scalability](#scalability) + - [Troubleshooting](#troubleshooting) +- [Implementation History](#implementation-history) +## Release Signoff Checklist + + + +Items marked with (R) are required *prior to targeting to a milestone / release*. + +- [x] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR) +- [ ] (R) KEP approvers have approved the KEP status as `implementable` +- [x] (R) Design details are appropriately documented +- [x] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input +- [x] (R) Graduation criteria is in place +- [x] (R) Production readiness review completed +- [ ] Production readiness review approved +- [x] "Implementation History" section is up-to-date for milestone +- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io] +- [ ] Supporting documentation e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes + + + +[kubernetes.io]: https://kubernetes.io/ +[kubernetes/enhancements]: https://git.k8s.io/enhancements +[kubernetes/kubernetes]: https://git.k8s.io/kubernetes +[kubernetes/website]: 
https://git.k8s.io/website + ## Summary This proposal is to introduce a VolumeGroup API to manage multiple volumes together and a VolumeGroupSnapshot API to take a snapshot of a VolumeGroup. It also attempts to address other use cases such as volume placement. @@ -140,18 +164,18 @@ This proposal introduces new CRDs VolumeGroup, VolumeGroupContent, VolumeGroupCl Create new VolumeGroup can be done in several ways: -Phase 1: -1. Create an empty group first, then create a new PVC with the group name. This will create a new volume and add that volume to the already created group. When deleting this volume group, all volumes in the group will be deleted together with the group. A CSI driver supporting VOLUME_GROUP controller capability MUST implement this feature. +Phase 1 (Note: only Phase 1 will be covered in this KEP which is targeting Alpha in K8s v1.26): +1. Create an empty group first, then create a new PVC with the group name. This will create a new volume and add that volume to the already created group. When deleting this volume group, all volumes in the group will be deleted together with the group. A CSI driver supporting CREATE_DELETE_VOLUME_GROUP controller capability MUST implement this feature. 2. Create an empty group first, then add an existing PVC to the group one by one. A CSI driver supporting VOLUME_GROUP_ADD_REMOVE_EXISTING_VOLUME MUST implement this feature. -Phase 2: +Phase 2 (After v1.26): 1. Create a new volume group from an existing group snapshot in one step. Design details will be added in a future KEP. 2. Non-goal: Create a new empty group and in the same time create new empty PVCs and add to the new group. ### Modify VolumeGroup Modify an existing VolumeGroup: -1. Create a new volume with an existing VolumeGroup name will create a new volume and add it to the group. Option 1 of creating VolumeGroup above falls into this case. As mentioned earlier, a CSI driver supporting VOLUME_GROUP MUST implement this feature. +1. 
Create a new volume with an existing VolumeGroup name will create a new volume and add it to the group. Option 1 of creating VolumeGroup above falls into this case. As mentioned earlier, a CSI driver supporting CREATE_DELETE_VOLUME_GROUP MUST implement this feature. 2. Add an existing volume to an existing VolumeGroup or remove a volume from a VolumeGroup. Option 2 of creating VolumeGroup above falls into this case. As mentioned earlier, a CSI driver supporting VOLUME_GROUP_ADD_REMOVE_EXISTING_VOLUME MUST implement this feature. ### Create and Modify VolumeGroup @@ -165,7 +189,7 @@ VolumeGroups can be created and/or modified in several ways as described in the * User creates a new PVC with an existing VolumeGroup name created above. As a result, a new PVC is created and added to VolumeGroup. VolumeGroup is modified so Status has this new PVC in PVCList. * External-provisioner will be modified so that VolumeGroupName will be passed to the CSI driver when creating a volume. -Only CSI drivers supporting VOLUME_GROUP capability can support the volume group this way. +Only CSI drivers supporting CREATE_DELETE_VOLUME_GROUP capability can support the volume group this way. When a new PVC is created with the existing VolumeGroup name, the VolumeGroup will be modified and the PVC will be added to PVCList in the Status, and the VolumeGroupContent will also be modified and the PV will be added to the PVList in the Status. @@ -173,7 +197,7 @@ The same PVC can belong to different groups, i.e., different types of groups or #### Modify VolumeGroup with existing PVCs -We can add an existing PVC to the group or remove a PVC from the group without deleting it. A VOLUME_GROUP_ADD_REMOVE_EXISTING_VOLUME capability will be added to CSI Spec. Only CSI drivers supporting both VOLUME_GROUP and VOLUME_GROUP_ADD_REMOVE_EXISTING_VOLUME capabilities can support the volume group this way. +We can add an existing PVC to the group or remove a PVC from the group without deleting it. 
A VOLUME_GROUP_ADD_REMOVE_EXISTING_VOLUME capability will be added to CSI Spec. Only CSI drivers supporting both CREATE_DELETE_VOLUME_GROUP and VOLUME_GROUP_ADD_REMOVE_EXISTING_VOLUME capabilities can support the volume group this way. * Admin creates a VolumeGroupClass, with the SupportVolumeGroupSnapshot boolean flag set to true. * User creates a new empty VolumeGroup, specifying the above VolumeGroupClass. A new empty VolumeGroupContent will also be created and bound to the VolumeGroup. @@ -203,7 +227,7 @@ The VolumeGroup controller will retrieve all volumeHandles in the VolumeGroup fr ### Create VolumeGroupSnapshot -A VolumeGroupSnapshot can be created with a VolumeGroup as the source if the CSI driver supports the GROUP_SNAPSHOT capability. +A VolumeGroupSnapshot can be created with a VolumeGroup as the source if the CSI driver supports the CREATE_DELETE_GROUP_SNAPSHOT capability. #### Dynamic provisioning @@ -245,7 +269,7 @@ The VolumeGroupSnapshot controller will retrieve all volumeSnapshotHandles in th ### Delete VolumeGroupSnapshot -A VolumeGroupSnapshot can be deleted if the CSI driver supports the GROUP_SNAPSHOT capability. +A VolumeGroupSnapshot can be deleted if the CSI driver supports the CREATE_DELETE_GROUP_SNAPSHOT capability. * When a VolumeGroupSnapshot is deleted, the VolumeGroupSnapshot controller will call the DeleteVolumeGroupSnapshot CSI function as well as DeleteSnapshot CSI functions. * Since CSI driver handles individual snapshot creation in CreateVolumeGroupSnapshot, it should handle individual snapshot deletion in DeleteVolumeGroupSnapshot. * DeleteSnapshot on a single snapshot that belongs to a group snapshot is not allowed. @@ -322,7 +346,7 @@ Type VolumeGroupSpec struct { // Phase 2 // +optional - VolumeGroupSource *VolumeGroupSource + // VolumeGroupSource *VolumeGroupSource } // Phase 2: VolumeGroupSource will be in Phase 2 @@ -732,16 +756,16 @@ Snapshot controller will be modified to update VolumeSnapshot status. 
External s #### CSI Capabilities -New controller capabilities VOLUME_GROUP, VOLUME_GROUP_ADD_REMOVE_EXISTING_VOLUME, GROUP_SNAPSHOT, INDIVIDUAL_SNAPSHOT_RESTORE, GET_VOLUME_GROUP, GET_VOLUME_GROUP_SNAPSHOT, LIST_VOLUME_GROUPS, LIST_VOLUME_GROUP_SNAPSHOTS will be added. +New controller capabilities CREATE_DELETE_VOLUME_GROUP, VOLUME_GROUP_ADD_REMOVE_EXISTING_VOLUME, CREATE_DELETE_GROUP_SNAPSHOT, INDIVIDUAL_SNAPSHOT_RESTORE, GET_VOLUME_GROUP, GET_VOLUME_GROUP_SNAPSHOT, LIST_VOLUME_GROUPS, LIST_VOLUME_GROUP_SNAPSHOTS will be added. -* VOLUME_GROUP: +* CREATE_DELETE_VOLUME_GROUP: Indicates that the controller plugin supports creating and deleting a volume group. * VOLUME_GROUP_ADD_REMOVE_EXISTING_VOLUME: Indicates that the controller plugin supports adding an existing volume to a volume group and removing a volume from a volume group without deleting it. -* GROUP_SNAPSHOT: +* CREATE_DELETE_GROUP_SNAPSHOT: Indicates that the controller plugin supports creating a snapshot of all volumes in a volume group. @@ -958,7 +982,7 @@ message ModifyVolumeGroupRequest { // If no volume_ids are provided, all existing volumes will // be removed from the group. // This field is OPTIONAL. - repeated string volume_ids = 3; + repeated string volume_ids = 2; } message ModifyVolumeGroupResponse { @@ -1443,3 +1467,214 @@ spec: ``` If both placement group and volume group with groupSnapshot support are defined, it is possible for the same volume to join both groups. For example, a volume group with groupSnapshot support may include volume members from two placement groups as they belong to the same application. + +## Graduation Criteria +### Alpha +* Initial feature implementation, including: + * Volume group. + * Volume group snapshot. +* Sample implementation in the csi-driver-host-path. +* Add basic unit tests. + +### Alpha -> Beta +* Unit tests and e2e tests outlined in design proposal implemented. 
+ +### Beta -> GA +* Volume group and group snapshot support is added to multiple CSI drivers. +* Volume group and group snapshot feature is deployed in production and has gone through at least one K8s upgrade. + +## Test Plan +### Unit tests +* Unit tests for external volume group and group snapshot controller. +* Unit tests for modified code path of external-provisioner and external-snapshotter. + +### E2E tests +* e2e tests for external volume group and group snapshot controller. +* e2e tests for modified code path of external-provisioner and external-snapshotter. +* Add stress and scale tests before moving from beta to GA. + +## Production Readiness Review Questionnaire + +### Feature enablement and rollback + +_This section must be completed when targeting alpha to a release._ + +* **How can this feature be enabled / disabled in a live cluster?** + - [x] Other + - Describe the mechanism: + The external volume group and group snapshot controllers do not have a + feature gate because they are out of tree. + It is enabled when these external controller sidecars are deployed with the CSI driver. + There are proposed changes in PersistentVolumeClaim and PersistentVolume core API objects. These changes need to be controlled by a feature gate. + - Will enabling / disabling the feature require downtime of the control + plane? + From the controller side, it only affects the external controller sidecars. + For the changes in PVC and PV, enabling / disabling the feature does require downtime of the control plane. + - Will enabling / disabling the feature require downtime or reprovisioning + of a node? (Do not assume `Dynamic Kubelet Config` feature is enabled). + No. + +* **Does enabling the feature change any default behavior?** + Yes. Enabling the feature can allow a new PVC to be created and added to a VolumeGroup. Enabling the feature can also allow a VolumeSnapshot to be created as part of the VolumeGroupSnapshot.
+ +* **Can the feature be disabled once it has been enabled (i.e. can we rollback + the enablement)?** + Yes. All VolumeGroup and VolumeGroupSnapshot API objects need to be deleted before this feature can be truly disabled. + +* **What happens if we reenable the feature if it was previously rolled back?** + We will be able to create new VolumeGroup and VolumeGroupSnapshot API objects again. + +* **Are there any tests for feature enablement/disablement?** + Unit tests will be added for the in-tree feature enablement/disablement. + Since there is no feature gate for this feature on the external controller side and the only way to + enable or disable this feature is to install or uninstall the sidecar, we cannot write + tests for feature enablement/disablement. + +### Rollout, Upgrade and Rollback Planning + +_This section must be completed when targeting beta graduation to a release._ + +* **How can a rollout fail? Can it impact already running workloads?** + Try to be as paranoid as possible - e.g., what if some components will restart + mid-rollout? + +* **What specific metrics should inform a rollback?** + +* **Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?** + +* **Is the rollout accompanied by any deprecations and/or removals of features, APIs, +fields of API types, flags, etc.?** + Even if applying deprecation policies, they may still surprise some users. + +### Monitoring Requirements + +_This section must be completed when targeting beta graduation to a release._ + +* **How can an operator determine if the feature is in use by workloads?** + Ideally, this should be a metric. Operations against the Kubernetes API (e.g., + checking if there are objects with field X set) may be a last resort. Avoid + logs or events for this purpose.
+ +* **What are the SLIs (Service Level Indicators) an operator can use to determine +the health of the service?** + - [ ] Metrics + - Metric name: + - [Optional] Aggregation method: + - Components exposing the metric: + - [ ] Other (treat as last resort) + - Details: + +* **What are the reasonable SLOs (Service Level Objectives) for the above SLIs?** + + +* **Are there any missing metrics that would be useful to have to improve observability +of this feature?** + + +### Dependencies + +_This section must be completed when targeting beta graduation to a release._ + +* **Does this feature depend on any specific services running in the cluster?** + Think about both cluster-level services (e.g. metrics-server) as well + as node-level agents (e.g. specific version of CRI). Focus on external or + optional services that are needed. For example, if this feature depends on + a cloud provider API, or upon an external software-defined storage or network + control plane. + + For each of these, fill in the following—thinking about running existing user workloads + and creating new ones, as well as about cluster-level services (e.g. DNS): + - [Dependency name]: + - Usage description: + - Impact of its outage on the feature: + - Impact of its degraded performance or high-error rates on the feature: + +### Scalability + +_For alpha, this section is encouraged: reviewers should consider these questions +and attempt to answer them._ + +_For beta, this section is required: reviewers must answer these questions._ + +_For GA, this section is required: approvers should be able to confirm the +previous answers based on experience in the field._ + +* **Will enabling / using this feature result in any new API calls?** + Describe them, providing: + - API call type (e.g. PATCH pods): new APIs VolumeGroup, VolumeGroupContent, VolumeGroupClass, VolumeGroupSnapshot, VolumeGroupSnapshotContent, VolumeGroupSnapshotClass + - estimated throughput + - originating component(s) (e.g. 
Kubelet, Feature-X-controller) + focusing mostly on: + - components listing and/or watching resources they didn't before + - API calls that may be triggered by changes of some Kubernetes resources + (e.g. update of object X triggers new updates of object Y) + +* **Will enabling / using this feature result in introducing new API types?** + Describe them, providing: + - API type: + - Supported number of objects per cluster: + - Supported number of objects per namespace (for namespace-scoped objects): + +* **Will enabling / using this feature result in any new calls to the cloud +provider?** + +* **Will enabling / using this feature result in increasing size or count of +the existing API objects?** + Describe them, providing: + - API type(s): + - Estimated increase in size: (e.g., new annotation of size 32B): + - Estimated amount of new objects: (e.g., new Object X for every existing Pod) + +* **Will enabling / using this feature result in increasing time taken by any +operations covered by [existing SLIs/SLOs]?** + Think about adding additional work or introducing new steps in between + (e.g. need to do X to start a container), etc. Please describe the details. + +* **Will enabling / using this feature result in non-negligible increase of +resource usage (CPU, RAM, disk, IO, ...) in any components?** + Things to keep in mind include: additional in-memory state, additional + non-trivial computations, excessive access to disks (including increased log + volume), significant amount of data sent and/or received over network, etc. + Think through this both in small and large cases, again with respect to the + [supported limits]. + +### Troubleshooting + +The Troubleshooting section currently serves the `Playbook` role. We may consider +splitting it into a dedicated `Playbook` document (potentially with some monitoring +details). For now, we leave it here.
+ +_This section must be completed when targeting beta graduation to a release._ + +* **How does this feature react if the API server and/or etcd is unavailable?** + +* **What are other known failure modes?** + For each of them, fill in the following information by copying the below template: + - [Failure mode brief description] + - Detection: How can it be detected via metrics? Stated another way: + how can an operator troubleshoot without logging into a master or worker node? + - Mitigations: What can be done to stop the bleeding, especially for already + running user workloads? + - Diagnostics: What are the useful log messages and their required logging + levels that could help debug the issue? + Not required until feature graduated to beta. + + - Testing: Are there any tests for failure mode? If not, describe why. + +* **What steps should be taken if SLOs are not being met to determine the problem?** + +[supported limits]: https://git.k8s.io/community//sig-scalability/configs-and-limits/thresholds.md +[existing SLIs/SLOs]: https://git.k8s.io/community/sig-scalability/slos/slos.md#kubernetes-slisslos + +## Implementation History diff --git a/keps/sig-storage/3476-volume-group/kep.yaml b/keps/sig-storage/3476-volume-group/kep.yaml new file mode 100644 index 00000000000..122f4eb50d7 --- /dev/null +++ b/keps/sig-storage/3476-volume-group/kep.yaml @@ -0,0 +1,35 @@ +title: Volume Group and Group Snapshot +kep-number: 3476 +authors: + - "@xing-yang" + - "@jingxu97" +owning-sig: sig-storage +participating-sigs: +status: implementable +creation-date: 2020-02-12 +reviewers: + - "@msau42" + - "@saad-ali" + - "@thockin" +approvers: + - "@msau42" + - "@saad-ali" + - "@thockin" +see-also: +replaces: + +latest-milestone: "v1.26" +stage: "alpha" +milestone: + alpha: "v1.26" + beta: "v1.27" + stable: "v1.28" + +feature-gates: + - name: VolumeGroup + components: + - kube-controller-manager + - kube-apiserver +disable-supported: true + +metrics: From 
4df8fd6b1b88dfea69711bb56c95b5a9f34067dc Mon Sep 17 00:00:00 2001 From: xing-yang Date: Tue, 20 Sep 2022 21:50:16 -0400 Subject: [PATCH 04/19] Remove VolumeGroup name from PVC spec --- keps/sig-storage/3476-volume-group/README.md | 294 +++++++------------ keps/sig-storage/3476-volume-group/kep.yaml | 4 - 2 files changed, 101 insertions(+), 197 deletions(-) diff --git a/keps/sig-storage/3476-volume-group/README.md b/keps/sig-storage/3476-volume-group/README.md index b5bdfcfa56d..316e45f53c7 100644 --- a/keps/sig-storage/3476-volume-group/README.md +++ b/keps/sig-storage/3476-volume-group/README.md @@ -6,6 +6,8 @@ - [Release Signoff Checklist](#release-signoff-checklist) - [Summary](#summary) - [Motivation](#motivation) + - [Use cases for this KEP](#use-cases-for-this-kep) + - [Future use cases](#future-use-cases) - [Goals](#goals) - [Non Goals](#non-goals) - [Proposal for VolumeGroup and VolumeGroupSnapshot](#proposal-for-volumegroup-and-volumegroupsnapshot) @@ -14,7 +16,7 @@ - [Create and Modify VolumeGroup](#create-and-modify-volumegroup) - [Create new PVC and add to the VolumeGroup](#create-new-pvc-and-add-to-the-volumegroup) - [Modify VolumeGroup with existing PVCs](#modify-volumegroup-with-existing-pvcs) - - [Phase 2: Create VolumeGroup from VolumeGroupSnapshot](#phase-2-create-volumegroup-from-volumegroupsnapshot) + - [Phase 2: Create VolumeGroup from VolumeGroupSnapshot or another VolumeGroup](#phase-2-create-volumegroup-from-volumegroupsnapshot-or-another-volumegroup) - [Pre-provisioned VolumeGroup](#pre-provisioned-volumegroup) - [Create VolumeGroupSnapshot](#create-volumegroupsnapshot) - [Dynamic provisioning](#dynamic-provisioning) @@ -50,11 +52,7 @@ - [ListVolumeGroupSnapshots](#listvolumegroupsnapshots) - [Alternatives](#alternatives) - [Immutable VolumeGroup](#immutable-volumegroup) -- [Proposal for Volume Placement](#proposal-for-volume-placement) - - [API Changes](#api-changes) - [ModifyVolume](#modifyvolume) - - [Create VolumeGroup with 
Selector](#create-volumegroup-with-selector) - - [Example Yaml Files for Volume Placement](#example-yaml-files-for-volume-placement) - [Graduation Criteria](#graduation-criteria) - [Alpha](#alpha) - [Alpha -> Beta](#alpha---beta) @@ -118,32 +116,28 @@ This proposal is to introduce a VolumeGroup API to manage multiple volumes toget While there is already a KEP (https://github.com/kubernetes/enhancements/pull/1051) that introduces APIs to do application snapshot, backup, and restore, there are other use cases not covered by that KEP. -Use case 1: -A VolumeGroup allows users to manage multiple volumes belonging to the same application together and therefore it is very useful in general. For example, it can be used to group all volumes in the same StatefulSet together. +### Use cases for this KEP -Use case 2: -For some storage systems, volumes are always managed in a group. For these storage systems, they will have to create a group for a single volume if they need to implement a create volume function in Kubernetes. Providing a VolumeGroup API will be very convenient for them. +* A VolumeGroup allows users to manage multiple volumes belonging to the same application together and therefore it is very useful in general. For example, it can be used to group all volumes in the same StatefulSet together and we can take a group snapshot of all the volumes in this StatefulSet. -Use case 3: -Instead of taking individual snapshots one after another, VolumeGroup can be used as a source for taking a snapshot of all the volumes in the same volume group. This may be a storage level consistent group snapshot if the storage system supports it. In any case, when used together with quiesce hooks, this group snapshot can be application consistent. For this use case, we will introduce another CRD VolumeGroupSnapshot. +* For some storage systems, volumes are always managed in a group. 
For these storage systems, they will have to create a group for a single volume if they need to implement a create volume function in Kubernetes. Volume snapshotting, cloning, expansion, and deletion, etc. are all performed at a group level. Providing a VolumeGroup API will be very convenient for them. -Use case 4: -VolumeGroup can be used to manage group replication or consistency group replication if the storage system supports it. Note replication is out of scope for this proposal. It is mentioned here as a potential future use case. +* Instead of taking individual snapshots one after another, VolumeGroup can be used as a source for taking a snapshot of all the volumes in the same volume group. This may be a storage level consistent group snapshot if the storage system supports it. In any case, when used together with quiesce hooks, this group snapshot can be application consistent. For this use case, we will introduce another CRD VolumeGroupSnapshot. -Use case 5: -VolumeGroup can be used to manage volume placement to either spread the volumes across storage pools or stack the volumes on the same storage pool. Related KEPs proposing the concept of storage pool for volume placement is as follows: - https://github.com/kubernetes/enhancements/pull/1353 - https://github.com/kubernetes/enhancements/pull/1347 -We may not really need a VolumeGroup for this use case. A StoragePool is probably enough. This is to be determined. +* VolumeGroup can also be used together with application snapshot. It can be a resource managed by the ApplicationSnapshot CRD. -Use case 6: -VolumeGroup can also be used together with application snapshot. It can be a resource managed by the ApplicationSnapshot CRD. +* Some applications may not want to use ApplicationSnapshot CRD because they don’t use Kubernetes workload APIs such as StatefulSet, Deployment, etc. Instead, they have developed their own operators. 
In this case it is more convenient to use VolumeGroup to manage persistent volumes used in those applications. -Use case 7: -Some applications may not want to use ApplicationSnapshot CRD because they don’t use Kubernetes workload APIs such as StatefulSet, Deployment, etc. Instead, they have developed their own operators. In this case it is more convenient to use VolumeGroup to manage persistent volumes used in those applications. +* Application quiesce is time consuming. Some users may not want to do application quiesce very frequently for that reason. For example, a user may want to run weekly backups with application quiesce and nightly backups without application quiesce but with consistency group support which provides crash consistency across all volumes in the group. -Use case 8: -Application quiesce is time consuming. Some users may not want to do application quiesce very frequently for that reason. For example, a user may want to run weekly backups with application quiesce and nightly backups without application quiesce but with consistency group support which provides crash consistency across all volumes in the group. +### Future use cases + +* VolumeGroup can be used to manage group replication or consistency group replication if the storage system supports it. Note replication is out of scope for this proposal. It is mentioned here as a potential future use case. + +* VolumeGroup can be used to manage volume placement to either spread the volumes across storage pools or stack the volumes on the same storage pool. Related KEPs proposing the concept of storage pool for volume placement is as follows: + https://github.com/kubernetes/enhancements/pull/1353 + https://github.com/kubernetes/enhancements/pull/1347 +We may not really need a VolumeGroup for this use case. A StoragePool is probably enough. This is to be determined. ### Goals @@ -169,7 +163,8 @@ Phase 1 (Note: only Phase 1 will be covered in this KEP which is targeting Alpha 2. 
Create an empty group first, then add an existing PVC to the group one by one. A CSI driver supporting VOLUME_GROUP_ADD_REMOVE_EXISTING_VOLUME MUST implement this feature. Phase 2 (After v1.26): -1. Create a new volume group from an existing group snapshot in one step. Design details will be added in a future KEP. +1. Create a new volume group by querying a label on existing persistent volume claims and adding them to the volume group. +2. Create a new volume group from an existing group snapshot or another volume group in one step. Design details will be added in a future KEP. 2. Non-goal: Create a new empty group and in the same time create new empty PVCs and add to the new group. ### Modify VolumeGroup @@ -193,7 +188,7 @@ Only CSI drivers supporting CREATE_DELETE_VOLUME_GROUP capability can support th When a new PVC is created with the existing VolumeGroup name, the VolumeGroup will be modified and the PVC will be added to PVCList in the Status, and the VolumeGroupContent will also be modified and the PV will be added to the PVList in the Status. -The same PVC can belong to different groups, i.e., different types of groups or different groups of the same type, if the storage system supports it. Storage system will decide whether to support this or not. We don't prevent it in the API or controller directly. +The same PVC can belong to different groups, i.e., different types of groups or different groups of the same type, if the storage system supports it. Storage system will decide whether to support this or not. If it does not support it, an INVALID_ARGUMENT error code should be returned with a message explaining why. We don't prevent it in the API or controller directly. #### Modify VolumeGroup with existing PVCs @@ -201,8 +196,8 @@ We can add an existing PVC to the group or remove a PVC from the group without d * Admin creates a VolumeGroupClass, with the SupportVolumeGroupSnapshot boolean flag set to true. 
* User creates a new empty VolumeGroup, specifying the above VolumeGroupClass. A new empty VolumeGroupContent will also be created and bound to the VolumeGroup. -* Add an existing PVC to an existing VolumeGroup (VolumeGroup can be empty to start with or it can have other PVCs already) by adding VolumeGroup name to the PVC Spec. - * The VolumeGroup name is added by user to each PVC Spec, not by the VolumeGroup controller. The VolumeGroup controller watches PVCs and reacts to the PVC updated with a VolumeGroup name event as described in the following step. +* Add an existing PVC to an existing VolumeGroup (VolumeGroup can be empty to start with or it can have other PVCs already) by adding VolumeGroup name as a label to the PVC. + * The VolumeGroup name is added by user to each PVC, not by the VolumeGroup controller. The VolumeGroup controller watches PVCs and reacts to the PVC updated with a VolumeGroup name event as described in the following step. * VolumeGroup is modified so the existing PVC is added to the PVCList in the Status. VolumeGroupContent is also modified so the PV is added to the PVList in the Status. * Note: The VolumeGroup controller will be implemented to have a desired state of the world and an actual state of the world. The desired state of the world @@ -215,15 +210,15 @@ We can add an existing PVC to the group or remove a PVC from the group without d * If one volume fails to be added, it should not affect it if it is used by a pod, but there will be error messages. * Removing a PVC from a VolumeGroup will trigger the external-provisioner and the VolumeGroup controller as well. -#### Phase 2: Create VolumeGroup from VolumeGroupSnapshot +#### Phase 2: Create VolumeGroup from VolumeGroupSnapshot or another VolumeGroup This is in Phase 2 so won't be discussed in detail here. Creating a new volume group from an existing group snapshot will be supported in Phase 2 if the CSI driver supports VOLUME_GROUP_FROM_GROUP_SNAPSHOT capability. 
As a result, PVCs will be created from source snapshots and placed in a new volume group. #### Pre-provisioned VolumeGroup -Admin can create a VolumeGroupContent, specifying an existing VolumeGroupHandle in the storage system and specifying a VolumeGroup name and namespace. Then create a VolumeGroup that points to the VolumeGroupContent name. +Admin can create a VolumeGroupContent, specifying an existing VolumeGroupHandle in the storage system and specifying a VolumeGroup name and namespace. Then the user creates a VolumeGroup that points to the VolumeGroupContent name. -The VolumeGroup controller will retrieve all volumeHandles in the VolumeGroup from the CSI driver, create PVs pointing to the volumeHandles, and create PVCs pointing to the PVs. +Admin will retrieve all volumeHandles in the VolumeGroup from the storage system, create PVs pointing to the volumeHandles. Then the user creates PVCs pointing to the PVs. ### Create VolumeGroupSnapshot @@ -263,9 +258,9 @@ status: #### Pre-provisioned VolumeGroupSnapshot -Admin can create a VolumeGroupSnapshotContent, specifying an existing VolumeGroupSnapshotHandle in the storage system and specifying a VolumeGroupSnapshot name and namespace. Then create a VolumeGroupSnapshot that points to the VolumeGroupSnapshotContent name. +Admin can create a VolumeGroupSnapshotContent, specifying an existing VolumeGroupSnapshotHandle in the storage system and specifying a VolumeGroupSnapshot name and namespace. Then the user creates a VolumeGroupSnapshot that points to the VolumeGroupSnapshotContent name. -The VolumeGroupSnapshot controller will retrieve all volumeSnapshotHandles in the Volume Group Snapshot from the CSI driver, create VolumeSnapshotContents pointing to the volumeSnapshotHandles, and create VolumeSnapshots pointing to the VolumeSnapshotContents. +Admin will retrieve all volumeSnapshotHandles in the Volume Group Snapshot from the storage system, create VolumeSnapshotContents pointing to the volumeSnapshotHandles. 
Then the user can create VolumeSnapshots pointing to the VolumeSnapshotContents. ### Delete VolumeGroupSnapshot @@ -284,7 +279,7 @@ Phase 1: Phase 2: -* A VolumeGroup can be created from a VolumeGroupSnapshot source in one step. This is the same as what is described in the section `Create VolumeGroup from VolumeGroupSnapshot`. +* A VolumeGroup can be created from a VolumeGroupSnapshot or VolumeGroup source in one step. This is the same as what is described in the section `Create VolumeGroup from VolumeGroupSnapshot or another VolumeGroup`. ### API Definitions @@ -309,11 +304,28 @@ type VolumeGroupClass struct { // +optional Parameters map[string]string + // +optional + VolumeGroupDeletionPolicy *VolumeGroupDeletionPolicy + // This field specifies whether group snapshot is supported. // The default is false. // +optional SupportVolumeGroupSnapshot *bool } + +// VolumeGroupDeletionPolicy describes a policy for end-of-life maintenance of +// volume group contents +type VolumeGroupDeletionPolicy string + +const ( + // VolumeGroupContentDelete means the group will be deleted from the + // underlying storage system on release from its volume group. + VolumeGroupContentDelete VolumeGroupDeletionPolicy = "Delete" + + // VolumeGroupContentRetain means the group will be left in its current + // state on release from its volume group. + VolumeGroupContentRetain VolumeGroupDeletionPolicy = "Retain" +) ``` #### VolumeGroup @@ -350,19 +362,15 @@ Type VolumeGroupSpec struct { } // Phase 2: VolumeGroupSource will be in Phase 2 -// VolumeGroupSource contains 3 options. If VolumeGroupSource is not nil, -// one of the 3 options must be defined. +// VolumeGroupSource contains 2 options. If VolumeGroupSource is not nil, +// one of the 2 options must be defined. Type VolumeGroupSource struct { - // A list of existing persistent volume claims - // +optional - PVCList []PersistentVolumeClaim - // A label query over existing persistent volume claims to be added to the volume group. 
// +optional Selector *metav1.LabelSelector // This field specifies the source of a volume group. (this is for restore) - // Supported Kind is VolumeGroupSnapshot + // Supported Kind is VolumeGroupSnapshot or VolumeGroup // +optional GroupDataSource *TypedLocalObjectReference } @@ -426,6 +434,19 @@ Type VolumeGroupContentSpec struct { // +optional VolumeGroupDeletionPolicy *VolumeGroupDeletionPolicy + + // This field specifies whether group snapshot is supported. + // The default is false. + // +optional + SupportVolumeGroupSnapshot *bool + + // VolumeGroupSecretRef is a reference to the secret object containing + // sensitive information to pass to the CSI driver to complete the CSI + // calls for VolumeGroups. + // This field is optional, and may be empty if no secret is required. If the + // secret object contains more than one secret, all secrets are passed. + // +optional + VolumeGroupSecretRef *SecretReference } // VolumeGroupContentSource @@ -478,7 +499,25 @@ type VolumeGroupSnapshotClass struct { // to the driver. // +optional Parameters map[string]string + + // +optional + VolumeGroupSnapshotDeletionPolicy *VolumeGroupSnapshotDeletionPolicy } + +// VolumeGroupSnapshotDeletionPolicy describes a policy for end-of-life maintenance of +// volume group snapshot contents +type VolumeGroupSnapshotDeletionPolicy string + +const ( + // VolumeGroupSnapshotContentDelete means the group snapshot will be deleted from the + // underlying storage system on release from its volume group snapshot. + VolumeGroupSnapshotContentDelete VolumeGroupSnapshotDeletionPolicy = "Delete" + + // VolumeGroupSnapshotContentRetain means the group snapshot will be left in its current + // state on release from its volume group snapshot. + VolumeGroupSnapshotContentRetain VolumeGroupSnapshotDeletionPolicy = "Retain" +) + ``` #### VolumeGroupSnapshot @@ -507,6 +546,14 @@ type VolumeGroupSnapshotSpec struct { // Source has the information about where the group snapshot is created from. 
// Required. Source VolumeGroupSnapshotSource + + // VolumeGroupSnapshotSecretRef is a reference to the secret object containing + // sensitive information to pass to the CSI driver to complete the CSI + // calls for VolumeGroupSnapshots. + // This field is optional, and may be empty if no secret is required. If the + // secret object contains more than one secret, all secrets are passed. + // +optional + VolumeGroupSnapshotSecretRef *SecretReference } // OneOf VolumeGroupName or VolumeGroupSnapshotContentName @@ -577,7 +624,7 @@ type VolumeGroupSnapshotContentSpec struct { VolumeGroupSnapshotRef core_v1.ObjectReference // Required - DeletionPolicy DeletionPolicy + VolumeGroupSnapshotDeletionPolicy VolumeGroupSnapshotDeletionPolicy // Required Driver string @@ -626,30 +673,7 @@ Type VolumeGroupSnapshotContentStatus struct { #### PersistentVolumeClaim and PersistentVolume -For PersistentVolumeClaim, the user can request it to be added to be VolumeGroup. So VolumeGorupNames will be in both Spec and Status. - -``` -type PersistentVolumeClaimSpec struct { - ...... - // +optional - VolumeGroupNames []string - ...... -} - -type PersistentVolumeClaimStatus struct { - ...... - // +optional - VolumeGroupNames []string - ...... -} - -type PersistentVolumeStatus struct { - ...... - // +optional - VolumeGroupContentNames []string - ...... -} -``` +For PersistentVolumeClaim, the user can request it to be added to a VolumeGroup by adding a label with the VolumeGroup name, i.e., volumegroup.storage.k8s.io/volumegroup:volumeGroup1. In the initial phase, no changes will be proposed to PersistentVolumeClaim and PersistentVolume API objects. Before moving to Beta, we will re-evaluate this. 
#### VolumeSnapshot and VolumeSnapshotContent @@ -707,6 +731,8 @@ apiVersion: v1 kind: PersistentVolumeClaim metadata: name: pvc1 + labels: + volumegroup.storage.k8s.io/volumegroup: volumeGroup1 spec: accessModes: - ReadWriteOnce @@ -853,8 +879,8 @@ CSI Plugins MAY create the following types of volume groups: * At restore time, create a single volume from individual snapshot and then join an existing group. * Create an empty group. * Create a volume from snapshot, specifying the group name in the volume. -* Phase 2: Create a new volume group from a source group snapshot. -* Phase 2: Create a new volume group and add a list of existing volumes to the group. +* Phase 2: Create a new volume group from a source group snapshot or another group. +* Phase 2: Create a new volume group and add a list of existing volumes to the group by querying a label on PVCs. The following is non-goal: * Non goal: Create a new group and at the same time create a list of new volumes in the group. @@ -983,6 +1009,11 @@ message ModifyVolumeGroupRequest { // be removed from the group. // This field is OPTIONAL. repeated string volume_ids = 2; + + // Secrets required by plugin to complete volume group modification request. + // This field is OPTIONAL. Refer to the `Secrets Requirements` + // section on how to use this field. + map<string, string> secrets = 3 [(csi_secret) = true]; } message ModifyVolumeGroupResponse { @@ -1231,7 +1262,7 @@ message ListVolumeGroupSnapshotsResponse { #### Immutable VolumeGroup -During the design discussions, an immutable VolumeGroup was considered but was removed because this would add lots of complexity to the design without much gain. +During the design discussions, an immutable VolumeGroup was considered but was removed because this would add lots of complexity to the design without much gain. It would also make it impossible to support the current way PVCs are provisioned in a StatefulSet.
Immutable VolumeGroup - PVCList or PVC Selector in the ImmutableSource field in the Spec (optional field); PVCList is in the Status. * Create a new VolumeGroup with existing PVCs by PVCList or PVC Selector in the Spec. The PVCList will be in the VolumeGroup Status as well. @@ -1311,43 +1342,6 @@ VOLUMEGROUP_IMMUTABLE and VOLUMEGROUP_MUTABLE capability will be added to the CS If VOLUMEGROUP_IMMUTABLE is supported, a VolumeGroup with an ImmutableSource can be created. Mutable will be false, PVCList will be set, and Ready will be true in the Status. Otherwise, a VolumeGroup with an ImmutableSource will not be created successfully. -## Proposal for Volume Placement - -### API Changes - -In order to support Volume Placement, An `AllowedTopologies` field will be added to the VolumeGroupClass API: - -``` -type VolumeGroupClass struct { - metav1.TypeMeta - // +optional - metav1.ObjectMeta - - // Driver is the driver expected to handle this VolumeGroupClass. - // This value may not be empty. - Driver string - - // Parameters holds parameters for driver. - // These values are opaque to the system and are passed directly - // to the driver. - // +optional - Parameters map[string]string - - // This field specifies whether group snapshot is supported. - // The default is false. - // +optional - VolumeGroupSnapshot *bool - - // Restrict the topologies where a group of volumes can be located. - // Each driver defines its own supported topology specifications. - // An empty TopologySelectorTerm list means there is no topology restriction. - // This field is passed on to the drivers to handle placement of a group of - // volumes on storage pools. - // +optional - AllowedTopologies []api.TopologySelectorTerm -} -``` - #### ModifyVolume ModifyVolume CSI RPC was considered earlier to add/remove one volume to/from a group at a time but it was removed because ModifyVolumeGroup CSI RPC was added. 
@@ -1383,97 +1377,13 @@ message ModifyVolumeRequest { ``` External-provisioner will be modified so that modifying PVC by adding VolumeGroupName will trigger a ModifyVolume call (a new CSI controller RPC) to CSI driver. -#### Create VolumeGroup with Selector - -Create VolumeGroup with Selector is an option discussed but moved to alternatives section. The suggestion is to add a new CRD and controller to select labeled PVCs. Whether this controller can only add new PVC or can also modify existing PVC will be decided later. - -Creating a new volume group and adding existing PVCs matching the label selector to the group is supported if the CSI driver supports VOLUMEGROUP capability. - -CSI drivers that do not have a volume_group_id concept can use the VolumeGroup name stored in Kubernetes API server as the volume_group_id. - -// VolumeGroupSpec describes the common attributes of group storage devices -// and allows a Source for provider-specific attributes -Type VolumeGroupSpec struct { - // +optional - VolumeGroupClassName *string - - // If InitSource is nil, an empty volume group will be created. - // Otherwise, a volume group will be created with PVCs. - // If Selector is set in InitSource, existing PVCs with matching - // label will be added to the volume group. - // If SourceVolumeGroupSnapshotName is not nil, the volume group - // will be created from the source VolumeGroupSnapshot. - // This field determines what PVCs will be in the volume group - // when it is initially created. PVCs can be added to or removed - // from the volume group later if CSI driver supports - // VOLUMEGROUP_ADD_REMOVE_EXISTING_VOLUME. - // +optional - InitSource *VolumeGroupSource -} - -// VolumeGroupSource contains 2 options. If VolumeGroupSource is not nil, -// one and only one of the 2 options must be defined. -Type VolumeGroupSource struct { - // A label query over existing persistent volume claims to be added to the volume group. 
- // +optional - Selector *metav1.LabelSelector - - // If specified, the VolumeGroup will be created from the source - // VolumeGroupSnapshot. - // +optional - SourceVolumeGroupSnapshotName *string -} - - -### Example Yaml Files for Volume Placement - -A VolumeGroupClass that supports placement: -``` -apiVersion: volumegroup.storage.k8s.io/v1alpha1 -kind: VolumeGroupClass -metadata: - name: placementGroupClass1 -spec: - parameters: - …... - allowedTopologies: [failure-domain.example.com/placement: storagePool1] -``` -``` -apiVersion: volumegroup.storage.k8s.io/v1alpha1 -kind: VolumeGroup -metadata: - Name: placemenGroup1 -spec: - volumeGroupClassName: placementGroupClass1 -``` - -A PVC that belongs to both the volume group with groupSnapshot support and placement. -``` -apiVersion: v1 -kind: PersistentVolumeClaim -metadata: - name: pvc1 - annotations: -spec: - accessModes: - - ReadWriteOnce - dataSource: null - resources: - requests: - storage: 1Gi - storageClassName: storageClass1 - volumeMode: Filesystem - volumeGroupNames: [volumeGroup1, placementGroup1] -``` - -If both placement group and volume group with groupSnapshot support are defined, it is possible for the same volume to join both groups. For example, a volume group with groupSnapshot support may include volume members from two placement groups as they belong to the same application. - ## Graduation Criteria ### Alpha * Initial feature implementation, including: * Volume group. * Volume group snapshot. * Sample implementation in the csi-driver-host-path. +* Reviews from vendors whose storage systems can support this feature. * Add basic unit tests. ### Alpha -> Beta @@ -1505,11 +1415,9 @@ _This section must be completed when targeting alpha to a release._ The external volume group and group snapshot controllers do not have a feature gate because they are out of tree. It is enabled when these external controller sidecars are deployed with the CSI driver. 
- There are proposed changes in PersistentVolumeClaim and PersistentVolume core API objects. These changes need to be controlled by a feature gate. - Will enabling / disabling the feature require downtime of the control plane? From the controller side, it only affects the external controller sidecars. - For the changes in PVC and PV, enabling / disabling the feature does require downtime of the control plane. - Will enabling / disabling the feature require downtime or reprovisioning of a node? (Do not assume `Dynamic Kubelet Config` feature is enabled). No. diff --git a/keps/sig-storage/3476-volume-group/kep.yaml b/keps/sig-storage/3476-volume-group/kep.yaml index 122f4eb50d7..2a248cf540c 100644 --- a/keps/sig-storage/3476-volume-group/kep.yaml +++ b/keps/sig-storage/3476-volume-group/kep.yaml @@ -26,10 +26,6 @@ milestone: stable: "v1.28" feature-gates: - - name: VolumeGroup - components: - - kube-controller-manager - - kube-apiserver disable-supported: true metrics: From 37805c27202aa9e833481591d811d36698ef9cdc Mon Sep 17 00:00:00 2001 From: xing-yang Date: Tue, 4 Oct 2022 19:17:06 -0400 Subject: [PATCH 05/19] Update PRR approver --- keps/prod-readiness/sig-storage/3476.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/keps/prod-readiness/sig-storage/3476.yaml b/keps/prod-readiness/sig-storage/3476.yaml index a4249a84c8f..0c405e01921 100644 --- a/keps/prod-readiness/sig-storage/3476.yaml +++ b/keps/prod-readiness/sig-storage/3476.yaml @@ -1,3 +1,3 @@ kep-number: 3476 alpha: - approver: "@wojtek-t" + approver: "@johnbelamaric" From 129a5b2f6e7cba52414093c4ef369395d069d015 Mon Sep 17 00:00:00 2001 From: xing-yang Date: Wed, 5 Oct 2022 10:12:50 -0400 Subject: [PATCH 06/19] Move the test plan under design details section --- keps/sig-storage/3476-volume-group/README.md | 125 ++++++++++++++----- 1 file changed, 92 insertions(+), 33 deletions(-) diff --git a/keps/sig-storage/3476-volume-group/README.md b/keps/sig-storage/3476-volume-group/README.md 
index 316e45f53c7..b1985fd3e1c 100644 --- a/keps/sig-storage/3476-volume-group/README.md +++ b/keps/sig-storage/3476-volume-group/README.md @@ -23,6 +23,20 @@ - [Pre-provisioned VolumeGroupSnapshot](#pre-provisioned-volumegroupsnapshot) - [Delete VolumeGroupSnapshot](#delete-volumegroupsnapshot) - [Restore](#restore) + - [Risks and Mitigations](#risks-and-mitigations) +- [Design Details](#design-details) + - [Test Plan](#test-plan) + - [Prerequisite testing updates](#prerequisite-testing-updates) + - [Unit tests](#unit-tests) + - [Integration tests](#integration-tests) + - [e2e tests](#e2e-tests) + - [Graduation Criteria](#graduation-criteria) + - [Alpha](#alpha) + - [Alpha -> Beta](#alpha---beta) + - [Beta -> GA](#beta---ga) + - [Deprecation](#deprecation) + - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy) + - [Version Skew Strategy](#version-skew-strategy) - [API Definitions](#api-definitions) - [VolumeGroupClass](#volumegroupclass) - [VolumeGroup](#volumegroup) @@ -53,13 +67,6 @@ - [Alternatives](#alternatives) - [Immutable VolumeGroup](#immutable-volumegroup) - [ModifyVolume](#modifyvolume) -- [Graduation Criteria](#graduation-criteria) - - [Alpha](#alpha) - - [Alpha -> Beta](#alpha---beta) - - [Beta -> GA](#beta---ga) -- [Test Plan](#test-plan) - - [Unit tests](#unit-tests) - - [E2E tests](#e2e-tests) - [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire) - [Feature enablement and rollback](#feature-enablement-and-rollback) - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning) @@ -281,6 +288,84 @@ Phase 2: * A VolumeGroup can be created from a VolumeGroupSnapshot or VolumeGroup source in one step. This is the same as what is described in the section `Create VolumeGroup from VolumeGroupSnapshot or another VolumeGroup`. 
+### Risks and Mitigations + +This feature requires coordination between several controllers including the newly proposed volume group and group snapshot controller and existing external-provisioner and external-snapshotter components. We will introduce this feature as alpha and add tests to make sure it works properly. + +## Design Details + +### Test Plan + +##### Prerequisite testing updates +N/A + +##### Unit tests +* Unit tests for external volume group and group snapshot controller. +* Unit tests for modified code path of external-provisioner and external-snapshotter. + +##### Integration tests +Integration tests are not needed. + +##### e2e tests +* e2e tests for external volume group and group snapshot controller. +* e2e tests for modified code path of external-provisioner and external-snapshotter. +* Add stress and scale tests before moving from beta to GA. + +### Graduation Criteria +#### Alpha +* Initial feature implementation, including: + * Volume group. + * Volume group snapshot. +* Sample implementation in the csi-driver-host-path. +* Reviews from vendors whose storage systems can support this feature. +* Add basic unit tests. + +#### Alpha -> Beta +* Unit tests and e2e tests outlined in design proposal implemented. + +#### Beta -> GA +* Volume group and group snapshot support is added to multiple CSI drivers. +* Volume group and group snapshot feature deployed in production and have gone through at least one K8s upgrade. + +#### Deprecation + +No deprecation plan. + +### Upgrade / Downgrade Strategy + + +External controllers handling volume group and group snapshot are additional sidecars deployed with the CSI driver. External-snapshotter and external-provisioner components will be updated to use the newer version that supports this feature. Upgrade should be fine as long as all the components are updated accordingly. Before downgrade, newly created volume groups and group snapshots which depend on the new CRDs should be deleted. 
+ +### Version Skew Strategy + + +The enhancement only affects the control plane but there are multiple components involved. If the controllers are updated to support this feature but the CSI driver itself does not support it, the `Ready` status of a new VolumeGroup API object will stay `false`. ### API Definitions @@ -1377,32 +1462,6 @@ message ModifyVolumeRequest { ``` External-provisioner will be modified so that modifying PVC by adding VolumeGroupName will trigger a ModifyVolume call (a new CSI controller RPC) to CSI driver. -## Graduation Criteria -### Alpha -* Initial feature implementation, including: - * Volume group. - * Volume group snapshot. -* Sample implementation in the csi-driver-host-path. -* Reviews from vendors whose storage systems can support this feature. -* Add basic unit tests. - -### Alpha -> Beta -* Unit tests and e2e tests outlined in design proposal implemented. - -### Beta -> GA -* Volume group and group snapshot support is added to multiple CSI drivers. -* Volume group and group snapshot feature deployed in production and have gone through at least one K8s upgrade. - -## Test Plan -### Unit tests -* Unit tests for external volume group and group snapshot controller. -* Unit tests for modified code path of external-provisioner and external-snapshotter. - -### E2E tests -* e2e tests for external volume group and group snapshot controller. -* e2e tests for modified code path of external-provisioner and external-snapshotter. -* Add stress and scale tests before moving from beta to GA. 
- ## Production Readiness Review Questionnaire ### Feature enablement and rollback From 4c82f94321c0013b625fc2248cd306b3a2663399 Mon Sep 17 00:00:00 2001 From: xing-yang Date: Wed, 5 Oct 2022 20:07:32 -0400 Subject: [PATCH 07/19] Add KEP owner's acknowledgment --- keps/sig-storage/3476-volume-group/README.md | 254 ++++++++++--------- 1 file changed, 131 insertions(+), 123 deletions(-) diff --git a/keps/sig-storage/3476-volume-group/README.md b/keps/sig-storage/3476-volume-group/README.md index b1985fd3e1c..bdbd804a560 100644 --- a/keps/sig-storage/3476-volume-group/README.md +++ b/keps/sig-storage/3476-volume-group/README.md @@ -64,9 +64,6 @@ - [DeleteVolumeGroupSnapshot](#deletevolumegroupsnapshot) - [ControllerGetVolumeGroupSnapshot](#controllergetvolumegroupsnapshot) - [ListVolumeGroupSnapshots](#listvolumegroupsnapshots) - - [Alternatives](#alternatives) - - [Immutable VolumeGroup](#immutable-volumegroup) - - [ModifyVolume](#modifyvolume) - [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire) - [Feature enablement and rollback](#feature-enablement-and-rollback) - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning) @@ -75,6 +72,10 @@ - [Scalability](#scalability) - [Troubleshooting](#troubleshooting) - [Implementation History](#implementation-history) +- [Drawbacks](#drawbacks) +- [Alternatives](#alternatives) + - [Immutable VolumeGroup](#immutable-volumegroup) + - [ModifyVolume](#modifyvolume) ## Release Signoff Checklist @@ -296,6 +297,8 @@ This feature requires coordination between several controllers including the new ### Test Plan +[X] I/we understand the owners of the involved components may require updates to existing tests to make this code solid enough prior to committing the changes necessary to implement this enhancement. 
+ ##### Prerequisite testing updates N/A @@ -1343,125 +1346,6 @@ message ListVolumeGroupSnapshotsResponse { } ``` -### Alternatives - -#### Immutable VolumeGroup - -During the design discussions, an immutable VolumeGroup was considered but was removed because this would add lots of complexity to the design without much gain. It would also make it impossible to support the current way PVCs are provisioned in a Statefulset. - -Immutable VolumeGroup - PVCList or PVC Selector in the ImmutableSource field in the Spec (optional field); PVCList is in the Status. -* Create a new VolumeGroup with existing PVCs by PVCList or PVC Selector in the Spec. The PVCList will be in the VolumeGroup Status as well. -* VolumeGroup Status has a boolean Mutable set to false. - -``` -ImmutableSource struct { - PVCList - Selector -} -``` - -``` -// VolumeGroupSpec describes the common attributes of group storage devices -// and allows a Source for provider-specific attributes -Type VolumeGroupSpec struct { - // +optional - VolumeGroupClassName *string - - // If ImmutableSource is nil, an empty volume group will be created. - // Otherwise, a volume group will be created with PVCs (if PVCList or Select is set) - // If ImmutableSource is not nil, it indicates the VolumeGroup is immutable - // +optional - ImmutableSource *VolumeGroupSource -} - -// VolumeGroupSource contains 3 options. If VolumeGroupSource is not nil, -// one of the 3 options must be defined. -Type VolumeGroupSource struct { - // A list of existing persistent volume claims - // +optional - PVCList []PersistentVolumeClaim - - // A label query over existing persistent volume claims to be added to the volume group. - // +optional - Selector *metav1.LabelSelector - } - -type VolumeGroupStatus struct { - // VolumeGroupId is a unique id returned by the CSI driver - // to identify the VolumeGroup on the storage system. - // If a storage system does not provide such an id, the - // CSI driver can choose to return the VolumeGroup name. 
- VolumeGroupId *string - - GroupCreationTime *metav1.Time - - // A list of persistent volume claims - // +optional - PVCList []PersistentVolumeClaim - - Ready *bool - - // Mutable indicates if a VolumeGroup can be modified - // after it is created. If false, it indicates it cannot be - // modified once created. If ImmutableSource is not nil - // in VolumeGroupSpec, Mutable must be false; otherwise - // it means the driver does not support ImmutableSource. - // VOLUMEGROUP_IMMUTABLE and VOLUMEGROUP_MUTABLE capability - // will be added to the CSI spec. - Mutable *bool - - // If true, it indicates the CSI driver supports adding - // an existing volume to the VolumeGroup and removing a - // volume from the VolumeGroup without deleting it. - // Only mutable VolumeGroup can support AddRemoveExistingPVC. - // A corresponding VOLUMEGROUP_ADD_REMOVE_EXISTING_VOLUME - // capability will be added to the CSI spec. - AddRemoveExistingPVC *bool - - // Last error encountered during group creation - Error *VolumeGroupError -} -``` - -VOLUMEGROUP_IMMUTABLE and VOLUMEGROUP_MUTABLE capability will be added to the CSI spec. -If VOLUMEGROUP_IMMUTABLE is supported, a VolumeGroup with an ImmutableSource can be created. Mutable will be false, PVCList will be set, and Ready will be true in the Status. -Otherwise, a VolumeGroup with an ImmutableSource will not be created successfully. - -#### ModifyVolume - -ModifyVolume CSI RPC was considered earlier to add/remove one volume to/from a group at a time but it was removed because ModifyVolumeGroup CSI RPC was added. - -A new MODIFY_VOLUME capability will be added to support this. -It indicates that the controller plugin supports modifying a volume. - -``` - rpc ModifyVolume(ModifyVolumeRequest) - returns (ModifyVolumeResponse) { - option (alpha_method) = true; - } -``` - -This RPC is called when an existing volume is added to an existing volume group or when a volume is removed from the volume group. 
-A volume group id parameter will be in the ModifyVolumeRequest for an add request. -A volume group id parameter will not be in the ModifyVolumeRequest for a delete request. -If user requests to add an existing volume to a consistency group, but the CSI driver cannot fulfill the request because the existing volume is placed on a different storage pool from the consistency group, then the CSI driver MUST return failure. -This RPC MUST be idempotent. - -``` -message ModifyVolumeRequest { - string volume_id = 1; - - // This field is OPTIONAL. - repeated string volume_group_id = 2 [(alpha_field) = true]; - - // Secrets required by plugin to complete modify volume request. - // This field is OPTIONAL. Refer to the `Secrets Requirements` - // section on how to use this field. - map<string, string> secrets = 3 [(csi_secret) = true]; -} -``` -External-provisioner will be modified so that modifying PVC by adding VolumeGroupName will trigger a ModifyVolume call (a new CSI controller RPC) to CSI driver. - ## Production Readiness Review Questionnaire ### Feature enablement and rollback @@ -1492,7 +1376,6 @@ _This section must be completed when targeting alpha to a release._ We will be able to create new VolumeGroup and VolumeGroupSnapshot API objects again. * **Are there any tests for feature enablement/disablement?** - Unit tests will be added for the in-tree feature enable/disablement. Since there is no feature gate for this feature on the external controller side and the only way to enable or disable this feature is to install or unistall the sidecar, we cannot write tests for feature enablement/disablement. @@ -1645,3 +1528,128 @@ _This section must be completed when targeting beta graduation to a release._ [existing SLIs/SLOs]: https://git.k8s.io/community/sig-scalability/slos/slos.md#kubernetes-slisslos ## Implementation History + +N/A + +## Drawbacks + +Adding more new APIs and more complexities. 
+ +## Alternatives + +### Immutable VolumeGroup + +During the design discussions, an immutable VolumeGroup was considered but was removed because this would add lots of complexity to the design without much gain. It would also make it impossible to support the current way PVCs are provisioned in a Statefulset. + +Immutable VolumeGroup - PVCList or PVC Selector in the ImmutableSource field in the Spec (optional field); PVCList is in the Status. +* Create a new VolumeGroup with existing PVCs by PVCList or PVC Selector in the Spec. The PVCList will be in the VolumeGroup Status as well. +* VolumeGroup Status has a boolean Mutable set to false. + +``` +ImmutableSource struct { + PVCList + Selector +} +``` + +``` +// VolumeGroupSpec describes the common attributes of group storage devices +// and allows a Source for provider-specific attributes +Type VolumeGroupSpec struct { + // +optional + VolumeGroupClassName *string + + // If ImmutableSource is nil, an empty volume group will be created. + // Otherwise, a volume group will be created with PVCs (if PVCList or Select is set) + // If ImmutableSource is not nil, it indicates the VolumeGroup is immutable + // +optional + ImmutableSource *VolumeGroupSource +} + +// VolumeGroupSource contains 3 options. If VolumeGroupSource is not nil, +// one of the 3 options must be defined. +Type VolumeGroupSource struct { + // A list of existing persistent volume claims + // +optional + PVCList []PersistentVolumeClaim + + // A label query over existing persistent volume claims to be added to the volume group. + // +optional + Selector *metav1.LabelSelector + } + +type VolumeGroupStatus struct { + // VolumeGroupId is a unique id returned by the CSI driver + // to identify the VolumeGroup on the storage system. + // If a storage system does not provide such an id, the + // CSI driver can choose to return the VolumeGroup name. 
+ VolumeGroupId *string + + GroupCreationTime *metav1.Time + + // A list of persistent volume claims + // +optional + PVCList []PersistentVolumeClaim + + Ready *bool + + // Mutable indicates if a VolumeGroup can be modified + // after it is created. If false, it indicates it cannot be + // modified once created. If ImmutableSource is not nil + // in VolumeGroupSpec, Mutable must be false; otherwise + // it means the driver does not support ImmutableSource. + // VOLUMEGROUP_IMMUTABLE and VOLUMEGROUP_MUTABLE capability + // will be added to the CSI spec. + Mutable *bool + + // If true, it indicates the CSI driver supports adding + // an existing volume to the VolumeGroup and removing a + // volume from the VolumeGroup without deleting it. + // Only mutable VolumeGroup can support AddRemoveExistingPVC. + // A corresponding VOLUMEGROUP_ADD_REMOVE_EXISTING_VOLUME + // capability will be added to the CSI spec. + AddRemoveExistingPVC *bool + + // Last error encountered during group creation + Error *VolumeGroupError +} +``` + +VOLUMEGROUP_IMMUTABLE and VOLUMEGROUP_MUTABLE capability will be added to the CSI spec. +If VOLUMEGROUP_IMMUTABLE is supported, a VolumeGroup with an ImmutableSource can be created. Mutable will be false, PVCList will be set, and Ready will be true in the Status. +Otherwise, a VolumeGroup with an ImmutableSource will not be created successfully. + +### ModifyVolume + +ModifyVolume CSI RPC was considered earlier to add/remove one volume to/from a group at a time but it was removed because ModifyVolumeGroup CSI RPC was added. + +A new MODIFY_VOLUME capability will be added to support this. +It indicates that the controller plugin supports modifying a volume. + +``` + rpc ModifyVolume(ModifyVolumeRequest) + returns (ModifyVolumeResponse) { + option (alpha_method) = true; + } +``` + +This RPC is called when an existing volume is added to an existing volume group or when a volume is removed from the volume group. 
+A volume group id parameter will be in the ModifyVolumeRequest for an add request. +A volume group id parameter will not be in the ModifyVolumeRequest for a delete request. +If user requests to add an existing volume to a consistency group, but the CSI driver cannot fulfill the request because the existing volume is placed on a different storage pool from the consistency group, then the CSI driver MUST return failure. +This RPC MUST be idempotent. + +``` +message ModifyVolumeRequest { + string volume_id = 1; + + // This field is OPTIONAL. + repeated string volume_group_id = 2 [(alpha_field) = true]; + + // Secrets required by plugin to complete modify volume request. + // This field is OPTIONAL. Refer to the `Secrets Requirements` + // section on how to use this field. + map<string, string> secrets = 3 [(csi_secret) = true]; +} +``` +External-provisioner will be modified so that modifying PVC by adding VolumeGroupName will trigger a ModifyVolume call (a new CSI controller RPC) to CSI driver. From 356e188ab8603db34f48af879e0c346ca99dfa75 Mon Sep 17 00:00:00 2001 From: xing-yang Date: Wed, 5 Oct 2022 20:18:18 -0400 Subject: [PATCH 08/19] Clarify how to disable the feature once it has been enabled --- keps/sig-storage/3476-volume-group/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/keps/sig-storage/3476-volume-group/README.md b/keps/sig-storage/3476-volume-group/README.md index bdbd804a560..12e5a7b94d5 100644 --- a/keps/sig-storage/3476-volume-group/README.md +++ b/keps/sig-storage/3476-volume-group/README.md @@ -1370,7 +1370,7 @@ _This section must be completed when targeting alpha to a release._ * **Can the feature be disabled once it has been enabled (i.e. can we rollback the enablement)?** - Yes. All VolumeGroup and VolumeGroupSnapshot API objects need to be deleted before this feature can be truly disabled. + Yes. 
In order to disable this feature once it has been enabled, we first need to make sure that all VolumeGroup and VolumeGroupSnapshot API objects are deleted. Then the new controllers for VolumeGroup and VolumeGroupSnapshot can be stopped/removed, and external-provisioner sidecar and external-snapshotter controller/sidecar can be downgraded to a version without this feature. * **What happens if we reenable the feature if it was previously rolled back?** We will be able to create new VolumeGroup and VolumeGroupSnapshot API objects again. From ccb45489c5c65adfa331134e54f35b1329000f21 Mon Sep 17 00:00:00 2001 From: xing-yang Date: Wed, 5 Oct 2022 22:13:12 -0400 Subject: [PATCH 09/19] Address review comments --- keps/sig-storage/3476-volume-group/README.md | 55 +++++++++++++------- 1 file changed, 35 insertions(+), 20 deletions(-) diff --git a/keps/sig-storage/3476-volume-group/README.md b/keps/sig-storage/3476-volume-group/README.md index 12e5a7b94d5..f8fcf0383a4 100644 --- a/keps/sig-storage/3476-volume-group/README.md +++ b/keps/sig-storage/3476-volume-group/README.md @@ -12,6 +12,7 @@ - [Non Goals](#non-goals) - [Proposal for VolumeGroup and VolumeGroupSnapshot](#proposal-for-volumegroup-and-volumegroupsnapshot) - [Create VolumeGroup](#create-volumegroup) + - [Delete VolumeGroup and PVC](#delete-volumegroup-and-pvc) - [Modify VolumeGroup](#modify-volumegroup) - [Create and Modify VolumeGroup](#create-and-modify-volumegroup) - [Create new PVC and add to the VolumeGroup](#create-new-pvc-and-add-to-the-volumegroup) @@ -130,7 +131,7 @@ While there is already a KEP (https://github.com/kubernetes/enhancements/pull/10 * For some storage systems, volumes are always managed in a group. For these storage systems, they will have to create a group for a single volume if they need to implement a create volume function in Kubernetes. Volume snapshotting, cloning, expansion, and deletion, etc. are all performed at a group level. 
Providing a VolumeGroup API will be very convenient for them. -* Instead of taking individual snapshots one after another, VolumeGroup can be used as a source for taking a snapshot of all the volumes in the same volume group. This may be a storage level consistent group snapshot if the storage system supports it. In any case, when used together with quiesce hooks, this group snapshot can be application consistent. For this use case, we will introduce another CRD VolumeGroupSnapshot. +* Instead of taking individual snapshots one after another, VolumeGroup can be used as a source for taking a snapshot of all the volumes in the same volume group. This may be a storage level consistent group snapshot if the storage system supports it. For this use case, we will introduce another CRD VolumeGroupSnapshot. * VolumeGroup can also be used together with application snapshot. It can be a resource managed by the ApplicationSnapshot CRD. @@ -173,7 +174,14 @@ Phase 1 (Note: only Phase 1 will be covered in this KEP which is targeting Alpha Phase 2 (After v1.26): 1. Create a new volume group by querying a label on existing persistent volume claims and adding them to the volume group. 2. Create a new volume group from an existing group snapshot or another volume group in one step. Design details will be added in a future KEP. -2. Non-goal: Create a new empty group and in the same time create new empty PVCs and add to the new group. + +Non-goal: Create a new empty group and in the same time create new empty PVCs and add to the new group. + +### Delete VolumeGroup and PVC + +Deleting a volume group will delete the volume group along with all the PVCs in the group. + +An individual PVC needs to be removed from the group first before it can be deleted. A finalizer or webhook will be added that prevents an individual PVC in a group from being deleted. 
### Modify VolumeGroup @@ -204,7 +212,7 @@ We can add an existing PVC to the group or remove a PVC from the group without d * Admin creates a VolumeGroupClass, with the SupportVolumeGroupSnapshot boolean flag set to true. * User creates a new empty VolumeGroup, specifying the above VolumeGroupClass. A new empty VolumeGroupContent will also be created and bound to the VolumeGroup. -* Add an existing PVC to an existing VolumeGroup (VolumeGroup can be empty to start with or it can have other PVCs already) by adding VolumeGroup name as a label to the PVC. +* Add an existing PVC to an existing VolumeGroup (VolumeGroup can be empty to start with or it can have other PVCs already) by adding a label specified by the labelSelector in the VolumeGroup to the PVC. * The VolumeGroup name is added by user to each PVC, not by the VolumeGroup controller. The VolumeGroup controller watches PVCs and reacts to the PVC updated with a VolumeGroup name event as described in the following step. * VolumeGroup is modified so the existing PVC is added to the PVCList in the Status. VolumeGroupContent is also modified so the PV is added to the PVList in the Status. * Note: The VolumeGroup controller will be implemented to have a desired state @@ -439,31 +447,38 @@ Type VolumeGroupSpec struct { // +optional VolumeGroupClassName *string - // VolumeGroupContentName is the binding reference to the VolumeGroupContent - // backing this VolumeGroup - // +optional - VolumeGroupContentName *string - - // Phase 2 - // +optional - // VolumeGroupSource *VolumeGroupSource + // Source has the information about where the group is created from. + // Required. + Source VolumeGroupSource } -// Phase 2: VolumeGroupSource will be in Phase 2 -// VolumeGroupSource contains 2 options. If VolumeGroupSource is not nil, -// one of the 2 options must be defined. +// VolumeGroupSource contains several options. +// OneOf the options must be defined. 
Type VolumeGroupSource struct { - // A label query over existing persistent volume claims to be added to the volume group. // +optional + // Pre-provisioned VolumeGroup + VolumeGroupContentName *string + + // +optional + // Dynamically provisioned VolumeGroup + // A label query over persistent volume claims to be added to the volume group. + // This labelSelector will be used to match the label added to a PVC. + // In Phase 1, when the label is added to PVC, the PVC will be added to the matching group. + // In Phase 2, this labelSelector will be used to find all PVCs with matching label and add them to the group when the group is being created. Selector *metav1.LabelSelector + // Phase 2 + // +optional + // Dynamically provisioned VolumeGroup // This field specifies the source of a volume group. (this is for restore) // Supported Kind is VolumeGroupSnapshot or VolumeGroup - // +optional - GroupDataSource *TypedLocalObjectReference + // GroupDataSource *TypedLocalObjectReference } type VolumeGroupStatus struct { + // +optional + BoundVolumeGroupContentName *string + // +optional GroupCreationTime *metav1.Time @@ -761,7 +776,7 @@ Type VolumeGroupSnapshotContentStatus struct { #### PersistentVolumeClaim and PersistentVolume -For PersistentVolumeClaim, the user can request it to be added to a VolumeGroup by adding a label with the VolumeGroup name, i.e., volumegroup.storage.k8s.io/volumegroup:volumeGroup1. In the initial phase, no changes will be proposed to PersistentVolumeClaim and PersistentVolume API objects. Before moving to Beta, we will re-evaluate this. +For PersistentVolumeClaim, the user can request it to be added to a VolumeGroup by adding the same label specified by the labelSelector in the VolumeGroup. In the initial phase, no changes will be proposed to PersistentVolumeClaim and PersistentVolume API objects. Before moving to Beta, we will re-evaluate this. 
#### VolumeSnapshot and VolumeSnapshotContent @@ -820,7 +835,7 @@ kind: PersistentVolumeClaim metadata: name: pvc1 labels: - volumegroup.storage.k8s.io/volumegroup:volumeGroup1 + volumegroup:myApp spec: accessModes: - ReadWriteOnce @@ -967,8 +982,8 @@ CSI Plugins MAY create the following types of volume groups: * At restore time, create a single volume from individual snapshot and then join an existing group. * Create an empty group. * Create a volume from snapshot, specifying the group name in the volume. +* Phase 2: Create a new volume group and add a list of existing volumes to the group by querying a label on PVCs. The label is specified by the labelSelector in the volume group. * Phase 2: Create a new volume group from a source group snapshot or another group. -* Phase 2: Create a new volume group and add a list of existing volumes to the group by querying a label on PVCs. The following is non-goal: * Non goal: Create a new group and at the same time create a list of new volumes in the group. From fe71e537453887e4f0457c3f4fb5f77746e1927b Mon Sep 17 00:00:00 2001 From: xing-yang Date: Thu, 6 Oct 2022 13:35:25 -0400 Subject: [PATCH 10/19] Addressed comments from PRR review --- keps/sig-storage/3476-volume-group/README.md | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/keps/sig-storage/3476-volume-group/README.md b/keps/sig-storage/3476-volume-group/README.md index f8fcf0383a4..6d37ab1f589 100644 --- a/keps/sig-storage/3476-volume-group/README.md +++ b/keps/sig-storage/3476-volume-group/README.md @@ -1387,6 +1387,12 @@ _This section must be completed when targeting alpha to a release._ the enablement)?** Yes. In order to disable this feature once it has been enabled, we first need to make sure that all VolumeGroup and VolumeGroupSnapshot API objects are deleted. 
Then the new controllers for VolumeGroup and VolumeGroupSnapshot can be stopped/removed, and external-provisioner sidecar and external-snapshotter controller/sidecar can be downgraded to a version without this feature. +If we don't delete the VolumeGroup and VolumeGroupSnapshot API objects and CRDs but just uninstall the VolumeGroup and VolumeGroupSnapshot controllers and downgrade the other sidecars, the API objects continue to exist in the API server. User may delete an individual PVC that is part of a VolumeGroup or delete an individual VolumeSnapshot that is associated with a VolumeGroupSnapshot. After that if the user starts the controllers/sidecars again and try to use the pre-existing VolumeGroup and VolumeGroupSnapshot, they are no longer in sync with the storage system. + +If the API objects and VolumeGroup and GroupSnapshot controllers are running, but the provisioner/snapshotter sidecars are downgraded to a lower version that does not support this feature, creating a PVC and adding it to the group will not work as one step. Basically the provisioner sidecar will create a new PV but ignoring the part that adds it to the group. If the CSI driver also supports VOLUME_GROUP_ADD_REMOVE_EXISTING_VOLUME capability, the VolumeGroup controller will detect an existing PVC with a matching label and will try to add the PVC to the group. If the CSI driver does not support VOLUME_GROUP_ADD_REMOVE_EXISTING_VOLUME capability, the PVC will not be added to the group. + +If the external-provisioner and external-snapshotter sidecars which support this feature are running but VolumeGroup/GroupSnapshot controllers are not (CRDs are still installed), creating VolumeGroup or creating VolumeGroupSnapshot will not be successfully. Ready status in VolumeGroup or VolumeGroupSnapshot API objects will be false until those controllers are running again. 
+ * **What happens if we reenable the feature if it was previously rolled back?** We will be able to create new VolumeGroup and VolumeGroupSnapshot API objects again. From 9235b73b15b738c61d326be72d7c1c51d3f70b68 Mon Sep 17 00:00:00 2001 From: xing-yang Date: Thu, 6 Oct 2022 15:00:54 -0400 Subject: [PATCH 11/19] Address PRR comments --- keps/sig-storage/3476-volume-group/README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/keps/sig-storage/3476-volume-group/README.md b/keps/sig-storage/3476-volume-group/README.md index 6d37ab1f589..976ac55a9f4 100644 --- a/keps/sig-storage/3476-volume-group/README.md +++ b/keps/sig-storage/3476-volume-group/README.md @@ -1387,9 +1387,9 @@ _This section must be completed when targeting alpha to a release._ the enablement)?** Yes. In order to disable this feature once it has been enabled, we first need to make sure that all VolumeGroup and VolumeGroupSnapshot API objects are deleted. Then the new controllers for VolumeGroup and VolumeGroupSnapshot can be stopped/removed, and external-provisioner sidecar and external-snapshotter controller/sidecar can be downgraded to a version without this feature. -If we don't delete the VolumeGroup and VolumeGroupSnapshot API objects and CRDs but just uninstall the VolumeGroup and VolumeGroupSnapshot controllers and downgrade the other sidecars, the API objects continue to exist in the API server. User may delete an individual PVC that is part of a VolumeGroup or delete an individual VolumeSnapshot that is associated with a VolumeGroupSnapshot. After that if the user starts the controllers/sidecars again and try to use the pre-existing VolumeGroup and VolumeGroupSnapshot, they are no longer in sync with the storage system. +If we don't delete the VolumeGroup and VolumeGroupSnapshot API objects and CRDs but just uninstall the VolumeGroup and VolumeGroupSnapshot controllers and downgrade the other sidecars, the API objects continue to exist in the API server. 
User may delete an individual PVC that is part of a VolumeGroup or delete an individual VolumeSnapshot that is associated with a VolumeGroupSnapshot. After that if the user starts the controllers/sidecars again and try to use the pre-existing VolumeGroup and VolumeGroupSnapshot, they are no longer in sync with the storage system. Assume the VolumeGroup has 3 PVCs initially, but 1 got removed by user but the VolumeGroup status is not updated so it still has a record of 3. If the user now takes a group snapshot from the VolumeGroup, the storage system will return an error due to the mismatch. We could add logic to reconcile what is in the K8s API object VolumeGroup and what is on the storage system, but before the reconcile completes, the call to create a VolumeGroup will fail. -If the API objects and VolumeGroup and GroupSnapshot controllers are running, but the provisioner/snapshotter sidecars are downgraded to a lower version that does not support this feature, creating a PVC and adding it to the group will not work as one step. Basically the provisioner sidecar will create a new PV but ignoring the part that adds it to the group. If the CSI driver also supports VOLUME_GROUP_ADD_REMOVE_EXISTING_VOLUME capability, the VolumeGroup controller will detect an existing PVC with a matching label and will try to add the PVC to the group. If the CSI driver does not support VOLUME_GROUP_ADD_REMOVE_EXISTING_VOLUME capability, the PVC will not be added to the group. +If the API objects and VolumeGroup and GroupSnapshot controllers are running, but the provisioner/snapshotter sidecars are downgraded to a lower version that does not support this feature, creating a PVC and adding it to the group will not work as one step. Basically the provisioner sidecar will create a new PV but ignoring the part that adds it to the group. 
If the CSI driver also supports VOLUME_GROUP_ADD_REMOVE_EXISTING_VOLUME capability, the VolumeGroup controller will detect an existing PVC with a matching label and will try to add the PVC to the group. If the CSI driver does not support VOLUME_GROUP_ADD_REMOVE_EXISTING_VOLUME capability, the PVC will not be added to the group. In this case, PV will still work but it is not in the group so user's request isn't fully satisfied. There won't be errors in this case. The provisioner doesn't support the new feature so it ignores the label on PVC and won't add it to the group. VolumeGroup controller won't pick it up because the CSI driver does not have the capability to support adding an existing volume to a group. If the external-provisioner and external-snapshotter sidecars which support this feature are running but VolumeGroup/GroupSnapshot controllers are not (CRDs are still installed), creating VolumeGroup or creating VolumeGroupSnapshot will not be successfully. Ready status in VolumeGroup or VolumeGroupSnapshot API objects will be false until those controllers are running again. 
From ffbc96687854e257e6d11b56749ce0c0a608811b Mon Sep 17 00:00:00 2001 From: xing-yang Date: Thu, 12 Jan 2023 11:25:18 -0500 Subject: [PATCH 12/19] Update milestone to 1.27 --- keps/sig-storage/3476-volume-group/kep.yaml | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/keps/sig-storage/3476-volume-group/kep.yaml b/keps/sig-storage/3476-volume-group/kep.yaml index 2a248cf540c..3c4b2be2fac 100644 --- a/keps/sig-storage/3476-volume-group/kep.yaml +++ b/keps/sig-storage/3476-volume-group/kep.yaml @@ -18,12 +18,12 @@ approvers: see-also: replaces: -latest-milestone: "v1.26" +latest-milestone: "v1.27" stage: "alpha" milestone: - alpha: "v1.26" - beta: "v1.27" - stable: "v1.28" + alpha: "v1.27" + beta: "v1.29" + stable: "v1.31" feature-gates: disable-supported: true From bc3c94006f79a74534024dc384c13dc97f21f891 Mon Sep 17 00:00:00 2001 From: xing-yang Date: Mon, 23 Jan 2023 23:29:58 -0500 Subject: [PATCH 13/19] Removed VolumeGroup API --- .../README.md | 1762 +++++++++-------- .../kep.yaml | 2 +- 2 files changed, 888 insertions(+), 876 deletions(-) rename keps/sig-storage/{3476-volume-group => 3476-volume-group-snapshot}/README.md (69%) rename keps/sig-storage/{3476-volume-group => 3476-volume-group-snapshot}/kep.yaml (91%) diff --git a/keps/sig-storage/3476-volume-group/README.md b/keps/sig-storage/3476-volume-group-snapshot/README.md similarity index 69% rename from keps/sig-storage/3476-volume-group/README.md rename to keps/sig-storage/3476-volume-group-snapshot/README.md index 976ac55a9f4..ba9eef2a86f 100644 --- a/keps/sig-storage/3476-volume-group/README.md +++ b/keps/sig-storage/3476-volume-group-snapshot/README.md @@ -1,4 +1,4 @@ -# KEP-3476: Volume Group and Group Snapshot +# KEP-3476: Volume Group Snapshot ## Table of Contents @@ -6,19 +6,9 @@ - [Release Signoff Checklist](#release-signoff-checklist) - [Summary](#summary) - [Motivation](#motivation) - - [Use cases for this KEP](#use-cases-for-this-kep) - - [Future use 
cases](#future-use-cases) - [Goals](#goals) - [Non Goals](#non-goals) -- [Proposal for VolumeGroup and VolumeGroupSnapshot](#proposal-for-volumegroup-and-volumegroupsnapshot) - - [Create VolumeGroup](#create-volumegroup) - - [Delete VolumeGroup and PVC](#delete-volumegroup-and-pvc) - - [Modify VolumeGroup](#modify-volumegroup) - - [Create and Modify VolumeGroup](#create-and-modify-volumegroup) - - [Create new PVC and add to the VolumeGroup](#create-new-pvc-and-add-to-the-volumegroup) - - [Modify VolumeGroup with existing PVCs](#modify-volumegroup-with-existing-pvcs) - - [Phase 2: Create VolumeGroup from VolumeGroupSnapshot or another VolumeGroup](#phase-2-create-volumegroup-from-volumegroupsnapshot-or-another-volumegroup) - - [Pre-provisioned VolumeGroup](#pre-provisioned-volumegroup) +- [Proposal for VolumeGroupSnapshot](#proposal-for-volumegroupsnapshot) - [Create VolumeGroupSnapshot](#create-volumegroupsnapshot) - [Dynamic provisioning](#dynamic-provisioning) - [Pre-provisioned VolumeGroupSnapshot](#pre-provisioned-volumegroupsnapshot) @@ -39,32 +29,18 @@ - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy) - [Version Skew Strategy](#version-skew-strategy) - [API Definitions](#api-definitions) - - [VolumeGroupClass](#volumegroupclass) - - [VolumeGroup](#volumegroup) - - [VolumeGroupContent](#volumegroupcontent) - [VolumeGroupSnapshotClass](#volumegroupsnapshotclass) - [VolumeGroupSnapshot](#volumegroupsnapshot) - [VolumeGroupSnapshotContent](#volumegroupsnapshotcontent) - - [PersistentVolumeClaim and PersistentVolume](#persistentvolumeclaim-and-persistentvolume) - [VolumeSnapshot and VolumeSnapshotContent](#volumesnapshot-and-volumesnapshotcontent) - [Example Yaml Files](#example-yaml-files) - - [Create Volume Group](#create-volume-group) - - [Add PVC to VolumeGroup](#add-pvc-to-volumegroup) - [Create VolumeGroupSnapshot](#create-volumegroupsnapshot-1) - [CSI Changes](#csi-changes) - [CSI Capabilities](#csi-capabilities) - - [CSI Controller 
RPC](#csi-controller-rpc) - - [CreateVolumeGroup](#createvolumegroup) - - [CreateVolume](#createvolume) - - [DeleteVolumeGroup](#deletevolumegroup) - - [ModifyVolumeGroup](#modifyvolumegroup) - - [ControllerGetVolumeGroup](#controllergetvolumegroup) - - [ListVolumeGroups](#listvolumegroups) + - [CSI Group Controller RPC](#csi-group-controller-rpc) - [CreateVolumeGroupSnapshot](#createvolumegroupsnapshot) - - [CreateSnapshot](#createsnapshot) - [DeleteVolumeGroupSnapshot](#deletevolumegroupsnapshot) - [ControllerGetVolumeGroupSnapshot](#controllergetvolumegroupsnapshot) - - [ListVolumeGroupSnapshots](#listvolumegroupsnapshots) - [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire) - [Feature enablement and rollback](#feature-enablement-and-rollback) - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning) @@ -77,6 +53,29 @@ - [Alternatives](#alternatives) - [Immutable VolumeGroup](#immutable-volumegroup) - [ModifyVolume](#modifyvolume) + - [VolumeGroup API Definitions](#volumegroup-api-definitions) + - [Use cases for the VolumeGroup](#use-cases-for-the-volumegroup) + - [Future use cases for the VolumeGroup](#future-use-cases-for-the-volumegroup) + - [Proposal for VolumeGroup and VolumeGroupSnapshot](#proposal-for-volumegroup-and-volumegroupsnapshot) + - [Create VolumeGroup](#create-volumegroup) + - [Delete VolumeGroup and PVC](#delete-volumegroup-and-pvc) + - [Modify VolumeGroup](#modify-volumegroup) + - [Create and Modify VolumeGroup](#create-and-modify-volumegroup) + - [Create VolumeGroupSnapshot](#create-volumegroupsnapshot-2) + - [Delete VolumeGroupSnapshot](#delete-volumegroupsnapshot-1) + - [Restore](#restore-1) + - [VolumeGroupClass](#volumegroupclass) + - [VolumeGroup](#volumegroup) + - [VolumeGroupContent](#volumegroupcontent) + - [VolumeGroupSnapshotClass](#volumegroupsnapshotclass-1) + - [VolumeGroupSnapshot](#volumegroupsnapshot-1) + - [VolumeGroupSnapshotContent](#volumegroupsnapshotcontent-1) 
+ - [PersistentVolumeClaim and PersistentVolume](#persistentvolumeclaim-and-persistentvolume) + - [VolumeSnapshot and VolumeSnapshotContent](#volumesnapshot-and-volumesnapshotcontent-1) + - [Example Yaml Files](#example-yaml-files-1) + - [Create Volume Group](#create-volume-group) + - [Add PVC to VolumeGroup](#add-pvc-to-volumegroup) + - [Create VolumeGroupSnapshot](#create-volumegroupsnapshot-3) ## Release Signoff Checklist @@ -103,7 +102,7 @@ Items marked with (R) are required *prior to targeting to a milestone / release* - [x] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input - [x] (R) Graduation criteria is in place - [x] (R) Production readiness review completed -- [ ] Production readiness review approved +- [x] Production readiness review approved - [x] "Implementation History" section is up-to-date for milestone - [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io] - [ ] Supporting documentation e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes @@ -119,136 +118,44 @@ Items marked with (R) are required *prior to targeting to a milestone / release* ## Summary -This proposal is to introduce a VolumeGroup API to manage multiple volumes together and a VolumeGroupSnapshot API to take a snapshot of a VolumeGroup. It also attempts to address other use cases such as volume placement. +This proposal introduces a Kubernetes API that allows users to take a crash consistent snapshot of multiple volumes together. It uses a label selector to group multiple persistent volume claims together for snapshotting. This design is proposed to add the volume group snapshot support for CSI Volume Drivers. The CSI volume group snapshot spec is proposed [here](https://github.com/container-storage-interface/spec/pull/519). 
## Motivation -While there is already a KEP (https://github.com/kubernetes/enhancements/pull/1051) that introduces APIs to do application snapshot, backup, and restore, there are other use cases not covered by that KEP. - -### Use cases for this KEP - -* A VolumeGroup allows users to manage multiple volumes belonging to the same application together and therefore it is very useful in general. For example, it can be used to group all volumes in the same StatefulSet together and we can take a group snapshot of all the volumes in this StatefulSet. - -* For some storage systems, volumes are always managed in a group. For these storage systems, they will have to create a group for a single volume if they need to implement a create volume function in Kubernetes. Volume snapshotting, cloning, expansion, and deletion, etc. are all performed at a group level. Providing a VolumeGroup API will be very convenient for them. - -* Instead of taking individual snapshots one after another, VolumeGroup can be used as a source for taking a snapshot of all the volumes in the same volume group. This may be a storage level consistent group snapshot if the storage system supports it. For this use case, we will introduce another CRD VolumeGroupSnapshot. - -* VolumeGroup can also be used together with application snapshot. It can be a resource managed by the ApplicationSnapshot CRD. - -* Some applications may not want to use ApplicationSnapshot CRD because they don’t use Kubernetes workload APIs such as StatefulSet, Deployment, etc. Instead, they have developed their own operators. In this case it is more convenient to use VolumeGroup to manage persistent volumes used in those applications. - -* Application quiesce is time consuming. Some users may not want to do application quiesce very frequently for that reason. 
For example, a user may want to run weekly backups with application quiesce and nightly backups without application quiesce but with consistency group support which provides crash consistency across all volumes in the group. +There is already a [VolumeSnapshot API](https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/177-volume-snapshot) that provides the ability to take a snapshot of a persistent volume to protect against data loss or data corruption. However, there are other snapshotting functionalities not covered by the VolumeSnapshot API. -### Future use cases +Some storage systems support consistent group snapshot that allows a snapshot to be taken from multiple volumes at the same point-in-time to achieve write order consistency. This can be useful for applications that contain multiple volumes. For example, an application may have data stored in one volume and logs stored in another volume. If snapshots for the data volume and the logs volume are taken at different times, the application will not be consistent and will not function properly if it is restored from those snapshots when a disaster strikes. -* VolumeGroup can be used to manage group replication or consistency group replication if the storage system supports it. Note replication is out of scope for this proposal. It is mentioned here as a potential future use case. +It is true that we can quiesce the application first, take an individual snapshot from each volume that is part of the application one after the other, and then unquiesce the application after all the individual snapshots are taken. This way we will get application consistent snapshots. However, application quiesce is time consuming. Sometimes it may not be possible to quiesce an application. Taking individual snapshots one after another may also take longer time compared to taking a consistent group snapshot. Some users may not want to do application quiesce very frequently for these reasons. 
For example, a user may want to run weekly backups with application quiesce and nightly backups without application quiesce but with consistency group support which provides crash consistency across all volumes in the group. -* VolumeGroup can be used to manage volume placement to either spread the volumes across storage pools or stack the volumes on the same storage pool. Related KEPs proposing the concept of storage pool for volume placement is as follows: - https://github.com/kubernetes/enhancements/pull/1353 - https://github.com/kubernetes/enhancements/pull/1347 -We may not really need a VolumeGroup for this use case. A StoragePool is probably enough. This is to be determined. +There is also another KEP (https://github.com/kubernetes/enhancements/pull/1051) that introduces APIs to do application snapshot, backup, and restore, but that KEP has a broader scope. In other words, volume group snapshot proposed in this KEP can be used by the application snapshot proposed in the other KEP. ### Goals -* Provide an API to manage multiple volumes together in a group. -* Provide an API to take a snapshot of a group of volumes. -* The group API should be generic and extensible so that it may be used to support other features in the future. +* Provide an API to take a snapshot of multiple volumes together. ### Non Goals -* A VolumeGroup may potentially be used to support group replication in the future, but providing design on replication group is not in the scope of this KEP. This can be discussed in the future. -* Provide a design to facilitate volume placement using the group API (To be determined). - -## Proposal for VolumeGroup and VolumeGroupSnapshot - -This proposal introduces new CRDs VolumeGroup, VolumeGroupContent, VolumeGroupClass, VolumeGroupSnapshot, VolumeGroupSnapshotContent, and VolumeGroupSnapshotClass. 
- -### Create VolumeGroup - -Create new VolumeGroup can be done in several ways: - -Phase 1 (Note: only Phase 1 will be covered in this KEP which is targeting Alpha in K8s v1.26): -1. Create an empty group first, then create a new PVC with the group name. This will create a new volume and add that volume to the already created group. When deleting this volume group, all volumes in the group will be deleted together with the group. A CSI driver supporting CREATE_DELETE_VOLUME_GROUP controller capability MUST implement this feature. -2. Create an empty group first, then add an existing PVC to the group one by one. A CSI driver supporting VOLUME_GROUP_ADD_REMOVE_EXISTING_VOLUME MUST implement this feature. - -Phase 2 (After v1.26): -1. Create a new volume group by querying a label on existing persistent volume claims and adding them to the volume group. -2. Create a new volume group from an existing group snapshot or another volume group in one step. Design details will be added in a future KEP. - -Non-goal: Create a new empty group and in the same time create new empty PVCs and add to the new group. - -### Delete VolumeGroup and PVC - -Deleting a volume group will delete the volume group along with all the PVCs in the group. - -An individual PVC needs to be removed from the group first before it can be deleted. A finalizer or webhook will be added that prevents an individual PVC in a group from being deleted. - -### Modify VolumeGroup - -Modify an existing VolumeGroup: -1. Create a new volume with an existing VolumeGroup name will create a new volume and add it to the group. Option 1 of creating VolumeGroup above falls into this case. As mentioned earlier, a CSI driver supporting CREATE_DELETE_VOLUME_GROUP MUST implement this feature. -2. Add an existing volume to an existing VolumeGroup or remove a volume from a VolumeGroup. Option 2 of creating VolumeGroup above falls into this case. 
As mentioned earlier, a CSI driver supporting VOLUME_GROUP_ADD_REMOVE_EXISTING_VOLUME MUST implement this feature. - -### Create and Modify VolumeGroup - -VolumeGroups can be created and/or modified in several ways as described in the following. - -#### Create new PVC and add to the VolumeGroup - -* Admin creates a VolumeGroupClass, with the SupportVolumeGroupSnapshot boolean flag set to true. -* User creates a new empty VolumeGroup, specifying the above VolumeGroupClass. As a result, a new empty VolumeGroupContent will also be created and bound to the VolumeGroup. -* User creates a new PVC with an existing VolumeGroup name created above. As a result, a new PVC is created and added to VolumeGroup. VolumeGroup is modified so Status has this new PVC in PVCList. -* External-provisioner will be modified so that VolumeGroupName will be passed to the CSI driver when creating a volume. - -Only CSI drivers supporting CREATE_DELETE_VOLUME_GROUP capability can support the volume group this way. - -When a new PVC is created with the existing VolumeGroup name, the VolumeGroup will be modified and the PVC will be added to PVCList in the Status, and the VolumeGroupContent will also be modified and the PV will be added to the PVList in the Status. - -The same PVC can belong to different groups, i.e., different types of groups or different groups of the same type, if the storage system supports it. Storage system will decide whether to support this or not. If it does not support it, an INVALID_ARGUMENT error code should be returned with a message explaining why. We don't prevent it in the API or controller directly. - -#### Modify VolumeGroup with existing PVCs - -We can add an existing PVC to the group or remove a PVC from the group without deleting it. A VOLUME_GROUP_ADD_REMOVE_EXISTING_VOLUME capability will be added to CSI Spec. 
Only CSI drivers supporting both CREATE_DELETE_VOLUME_GROUP and VOLUME_GROUP_ADD_REMOVE_EXISTING_VOLUME capabilities can support the volume group this way. - -* Admin creates a VolumeGroupClass, with the SupportVolumeGroupSnapshot boolean flag set to true. -* User creates a new empty VolumeGroup, specifying the above VolumeGroupClass. A new empty VolumeGroupContent will also be created and bound to the VolumeGroup. -* Add an existing PVC to an existing VolumeGroup (VolumeGroup can be empty to start with or it can have other PVCs already) by adding a label specified by the labelSelector in the VolumeGroup to the PVC. - * The VolumeGroup name is added by user to each PVC, not by the VolumeGroup controller. The VolumeGroup controller watches PVCs and reacts to the PVC updated with a VolumeGroup name event as described in the following step. -* VolumeGroup is modified so the existing PVC is added to the PVCList in the Status. VolumeGroupContent is also modified so the PV is added to the PVList in the Status. - * Note: The VolumeGroup controller will be implemented to have a desired state - of the world and an actual state of the world. The desired state of the world - contains VolumeGroups with the desired PVCList while the actual state of the - world contains VolumeGroups with the actual PVCList. The controller will try - to reconcile the two by handling adding and removing multiple PVCs through a - single CSI ModifyVolumeGroup RPC call each time. -* External-provisioner will be modified to update the status of PVC and PV. -* VolumeGroup controller will be triggered to update the VolumeGroup Status and VolumeGroupContent Status. -* If one volume fails to be added, it should not affect it if it is used by a pod, but there will be error messages. -* Removing a PVC from a VolumeGroup will trigger the external-provisioner and the VolumeGroup controller as well. 
- -#### Phase 2: Create VolumeGroup from VolumeGroupSnapshot or another VolumeGroup - -This is in Phase 2 so won't be discussed in detail here. Creating a new volume group from an existing group snapshot will be supported in Phase 2 if the CSI driver supports VOLUME_GROUP_FROM_GROUP_SNAPSHOT capability. As a result, PVCs will be created from source snapshots and placed in a new volume group. - -#### Pre-provisioned VolumeGroup +* Provide an API to manage multiple volumes together in a group. +* Provide a generic and extensible group API that may be used to support other features in the future. +* Provide a VolumeGroup API that supports group replication. +* Provide a design to facilitate volume placement using the group API. -Admin can create a VolumeGroupContent, specifying an existing VolumeGroupHandle in the storage system and specifying a VolumeGroup name and namespace. Then the user creates a VolumeGroup that points to the VolumeGroupContent name. +## Proposal for VolumeGroupSnapshot -Admin will retrieve all volumeHandles in the VolumeGroup from the storage system, create PVs pointing to the volumeHandles. Then the user creates PVCs pointing to the PVs. +This proposal introduces new CRDs VolumeGroupSnapshot, VolumeGroupSnapshotContent, and VolumeGroupSnapshotClass. ### Create VolumeGroupSnapshot -A VolumeGroupSnapshot can be created with a VolumeGroup as the source if the CSI driver supports the CREATE_DELETE_GROUP_SNAPSHOT capability. +A VolumeGroupSnapshot can be created from multiple PVCs with a label on the PVCs specified by the labelSelector in the VolumeGroupSnapshot if the CSI driver supports the CREATE_DELETE_GET_VOLUME_GROUP_SNAPSHOT capability. #### Dynamic provisioning * Admin creates a VolumeGroupSnapshotClass. -* User creates a VolumeGroupSnapshot with a VolumeGroup as the source. 
-* This will trigger the VolumeGroupSnapshot controller to create a VolumeGroupSnapshotContent API object, and also call the CreateVolumeGroupSnapshot CSI function and also create multiple VolumeSnapshot API objects with VolumeGroupSnapshot name parameter in each VolumeSnapshot Status. This will trigger the creation of VolumeSnapshotContent API objects in the snapshot controller and calls to the CreateSnapshot CSI function in the CSI snapshotter sidecar. The CSI snapshotter sidecar will pass the new group_snapshot_name parameter to the CSI Driver when calling CreatSnapshot. -* When CSI driver receives CreateSnapshot request for individual snapshots with a VolumeGroupSnapshot name: - * If it knows how to create a group snapshot on the storage system, it returns (nil, nil), and leaves it to the CreateVolumeGroupSnapshot function to handle the snapshot creation. +* User creates a VolumeGroupSnapshot with label selector that matches the label applied to all PVCs to be snapshotted together. +* This will trigger the VolumeGroupSnapshot controller to create a VolumeGroupSnapshotContent API object, and also call the CreateVolumeGroupSnapshot CSI function. It will also create multiple VolumeSnapshot API objects with volumeGroupSnapshotName in the status and the corresponding VolumeSnapshotContents with the snapshot handle. The VolumeSnapshot and VolumeSnapshotContent will point to each other before these objects are created in the API server to avoid triggering the VolumeSnapshot controller to create new individual objects. The CSI snapshotter sidecar will not call CSI driver in this case. If needed, GetVolumeGroupSnapshot CSI function will be called to retrieve individual snapshot statuses until all snapshots are ready to use. * CreateVolumeGroupSnapshot CSI function response - * The CreateVolumeGroupSnapshot CSI function should return a list of snapshots (Snapshot message defined in CSI Spec) in its response. 
The VolumeGroupSnapshot controller can use the returned list of snapshots to update corresponding individual VolumeSnapshotContents, wait for VolumeSnapshots and VolumeSnapshotContents to be bound, and update SnapshotList in the VolumeGroupSnapshot Status and SnapshotContentList in the VolumeGroupSnapshotContent Status. + * The CreateVolumeGroupSnapshot CSI function should return a list of snapshots (Snapshot message defined in CSI Spec) in its response. The VolumeGroupSnapshot controller can use the returned list of snapshots to construct corresponding individual VolumeSnapshotContents and VolumeSnapshots, wait for VolumeSnapshots and VolumeSnapshotContents to be bound, and update SnapshotList in the VolumeGroupSnapshot Status and SnapshotContentList in the VolumeGroupSnapshotContent Status. apiVersion: snapshot.storage.k8s.io/v1 ``` @@ -256,21 +163,14 @@ kind: VolumeSnapshot metadata: name: snapshot1 spec: - volumeSnapshotClassName: snapClass1 source: - persistentVolumeClaimName: pvc1 + persistentVolumeClaimName: vsc1 status: volumeGroupSnapshotName: groupSnapshot1 ``` * An admissions controller or finalizer should be added to prevent an individual snapshot from being deleted that belongs to a VolumeGroupSnapshot. -* Since some storage systems require individual snapshots while others can only return a single group snapshot but not individual snapshots, we propose a two phase solution. - * In Phase 1, since we do not support creating a VolumeGroup directly from a VolumeGroupSnapshot, it is required for individual snapshots to be returned along with the group snapshot. - * In Phase 2, we plan to support creating a VolumeGroup directly from a VolumeGroupSnapshot. We propose the following solution for Phase 2: - * In VolumeGroupSnapshotStatus, if ReadyToUse is true and SnapshotList is empty, the VolumeGroupSnapshot Controller assumes the storage system does not return individual snapshots. 
- * If ReadyToUse is true and SnapshotList is not empty, the VolumeGroupSnapshot Controller knows there are individual snapshots created for this group. Those individual snapshots may be used as readonly, but they cannot be removed from the VolumeGroupSnapshot. - * In the CSI Spec, this means repeated .csi.v1.Snapshot snapshots in VolumeGroupSnapshot message from CreateVolumeGroupSnapshotResponse should be optional, not required. - * How to use the VolumeGroupSnapshot if individual snapshots are not returned? How can we create a volume from a snapshot if there are no individual snapshots? `snapshots` is optional while `group_snapshot_id` is required in VolumeGroupSnapshot message in CSI so it is fine to only specify `group_snapshot_id` not `snapshots` when creating a VolumeGroup from a VolumeGroupSnapshot. However, CSI Driver MUST return a list of `volumes` that are restored in `CreateVolumeGroupResponse`. +* In the CSI spec, it is specified that it is required for individual snapshots to be returned along with the group snapshot. #### Pre-provisioned VolumeGroupSnapshot @@ -280,26 +180,21 @@ Admin will retrieve all volumeSnapshotHandles in the Volume Group Snapshot from ### Delete VolumeGroupSnapshot -A VolumeGroupSnapshot can be deleted if the CSI driver supports the CREATE_DELETE_GROUP_SNAPSHOT capability. -* When a VolumeGroupSnapshot is deleted, the VolumeGroupSnapshot controller will call the DeleteVolumeGroupSnapshot CSI function as well as DeleteSnapshot CSI functions. - * Since CSI driver handles individual snapshot creation in CreateVolumeGroupSnapshot, it should handle individual snapshot deletion in DeleteVolumeGroupSnapshot. +A VolumeGroupSnapshot can be deleted if the CSI driver supports the CREATE_DELETE_GET_VOLUME_GROUP_SNAPSHOT capability. +* When a VolumeGroupSnapshot is deleted, the VolumeGroupSnapshot controller will call the DeleteVolumeGroupSnapshot CSI function which will delete individual snapshots as well. 
+ * Since CSI driver handles individual snapshot creation in CreateVolumeGroupSnapshot, it should handle individual snapshot deletion in DeleteVolumeGroupSnapshot as well. DeleteSnapshot CSI function will not be called. * DeleteSnapshot on a single snapshot that belongs to a group snapshot is not allowed. ### Restore Restore can be done as follows: -Phase 1: - -* A new empty volume group can be created first, and then a new volume can be created from a snapshot one by one and added to the volume group. This can be repeated for all the snapshots in the VolumeGroupSnapshot. - -Phase 2: +A new volume can be created from a snapshot. This can be repeated for all the snapshots in the VolumeGroupSnapshot. -* A VolumeGroup can be created from a VolumeGroupSnapshot or VolumeGroup source in one step. This is the same as what is described in the section `Create VolumeGroup from VolumeGroupSnapshot or another VolumeGroup`. ### Risks and Mitigations -This feature requires coordination between several controllers including the newly proposed volume group and group snapshot controller and existing external-provisioner and external-snapshotter components. We will introduce this feature as alpha and add tests to make sure it works properly. +This feature requires coordination between several controllers including the newly proposed volume group snapshot controller and existing external-snapshotter components. We will introduce this feature as alpha and add tests to make sure it works properly. ## Design Details @@ -311,21 +206,20 @@ This feature requires coordination between several controllers including the new N/A ##### Unit tests -* Unit tests for external volume group and group snapshot controller. -* Unit tests for modified code path of external-provisioner and external-snapshotter. +* Unit tests for external volume group snapshot controller. +* Unit tests for modified code path of external-snapshotter. ##### Integration tests Integration tests are not needed. 
##### e2e tests -* e2e tests for external volume group and group snapshot controller. -* e2e tests for modified code path of external-provisioner and external-snapshotter. +* e2e tests for external volume group snapshot controller. +* e2e tests for modified code path of external-snapshotter. * Add stress and scale tests before moving from beta to GA. ### Graduation Criteria #### Alpha * Initial feature implementation, including: - * Volume group. * Volume group snapshot. * Sample implementation in the csi-driver-host-path. * Reviews from vendors whose storage systems can support this feature. @@ -335,8 +229,8 @@ Integration tests are not needed. * Unit tests and e2e tests outlined in design proposal implemented. #### Beta -> GA -* Volume group and group snapshot support is added to multiple CSI drivers. -* Volume group and group snapshot feature deployed in production and have gone through at least one K8s upgrade. +* Volume group snapshot support is added to multiple CSI drivers. +* Volume group snapshot feature is deployed in production and has gone through at least one K8s upgrade. #### Deprecation -External controllers handling volume group and group snapshot are additional sidecars deployed with the CSI driver. External-snapshotter and external-provisioner components will be updated to use the newer version that supports this feature. Upgrade should be fine as long as all the components are updated accordingly. Before downgrade, newly created volume groups and group snapshots which depend on the new CRDs should be deleted. +The external controller handling volume group snapshots is an additional sidecar deployed with the CSI driver. External-snapshotter components will be updated to a newer version that supports this feature. Upgrade should be fine as long as all the components are updated accordingly. Before downgrade, newly created volume group snapshots which depend on the new CRDs should be deleted.
### Version Skew Strategy @@ -376,24 +270,24 @@ enhancement: - Will any other components on the node change? For example, changes to CSI, CRI or CNI may require updating that component before the kubelet. --> -The enhancement only affects the control plane but there are multiple components involved. If the controllers are updated to support this feature but the CSI driver itself does not support it, the `Ready` status of a new VolumeGroup API object will stay `false`. +The enhancement only affects the control plane but there are multiple components involved. If the controllers are updated to support this feature but the CSI driver itself does not support it, the `Ready` status of a new VolumeGroupSnapshot API object will stay `false`. ### API Definitions API definitions are as follows: -#### VolumeGroupClass +#### VolumeGroupSnapshotClass ``` -type VolumeGroupClass struct { +type VolumeGroupSnapshotClass struct { metav1.TypeMeta // +optional metav1.ObjectMeta - - // Driver is the driver expected to handle this VolumeGroupClass. + + // Driver is the driver expected to handle this VolumeGroupSnapshotClass. // This value may not be empty. Driver string - + // Parameters hold parameters for the driver. // These values are opaque to the system and are passed directly // to the driver. @@ -401,307 +295,112 @@ type VolumeGroupClass struct { Parameters map[string]string // +optional - VolumeGroupDeletionPolicy *VolumeGroupDeletionPolicy - - // This field specifies whether group snapshot is supported. - // The default is false. 
- // +optional - SupportVolumeGroupSnapshot *bool + VolumeGroupSnapshotDeletionPolicy *VolumeGroupSnapshotDeletionPolicy } -// VolumeGroupDeletionPolicy describes a policy for end-of-life maintenance of -// volume group contents -type VolumeGroupDeletionPolicy string +// VolumeGroupSnapshotDeletionPolicy describes a policy for end-of-life maintenance of +// volume group snapshot contents +type VolumeGroupSnapshotDeletionPolicy string const ( - // VolumeGroupContentDelete means the group will be deleted from the - // underlying storage system on release from its volume group. - VolumeGroupContentDelete VolumeGroupDeletionPolicy = "Delete" + // VolumeGroupSnapshotContentDelete means the group snapshot will be deleted from the + // underlying storage system on release from its volume group snapshot. + VolumeGroupSnapshotContentDelete VolumeGroupSnapshotDeletionPolicy = "Delete" - // VolumeGroupContentRetain means the group will be left in its current - // state on release from its volume group. - VolumeGroupContentRetain VolumeGroupDeletionPolicy = "Retain" + // VolumeGroupSnapshotContentRetain means the group snapshot will be left in its current + // state on release from its volume group snapshot. + VolumeGroupSnapshotContentRetain VolumeGroupSnapshotDeletionPolicy = "Retain" ) + ``` -#### VolumeGroup +#### VolumeGroupSnapshot ``` -// VolumeGroup is a user's request for a group of volumes -type VolumeGroup struct { - metav1.TypeMeta +// VolumeGroupSnapshot is a user's request for taking a group snapshot. +type VolumeGroupSnapshot struct { + metav1.TypeMeta `json:",inline"` + // Standard object's metadata. // +optional - metav1.ObjectMeta + metav1.ObjectMeta `json:"metadata,omitempty" protobuf:"bytes,1,opt,name=metadata"` - // Spec defines the volume group requested by a user - Spec VolumeGroupSpec + // Spec defines the desired characteristics of a group snapshot requested by a user. 
+ Spec VolumeGroupSnapshotSpec `json:"spec" protobuf:"bytes,2,opt,name=spec"` - // Status represents the current information about a volume group + // Status represents the latest observed state of the group snapshot // +optional - Status *VolumeGroupStatus + Status *VolumeGroupSnapshotStatus `json:"status,omitempty" protobuf:"bytes,3,opt,name=status"` } -// VolumeGroupSpec describes the common attributes of group storage devices -// and allows a Source for provider-specific attributes -Type VolumeGroupSpec struct { +// VolumeGroupSnapshotSpec describes the common attributes of a group snapshot +type VolumeGroupSnapshotSpec struct { // +optional - VolumeGroupClassName *string + VolumeSnapshotClassName *string - // Source has the information about where the group is created from. - // Required. - Source VolumeGroupSource -} + // A label query over persistent volume claims to be grouped together + // for snapshotting. + // This labelSelector will be used to match the label added to a PVC. + Selector *metav1.LabelSelector -// VolumeGroupSource contains several options. -// OneOf the options must be defined. -Type VolumeGroupSource struct { + // VolumeGroupSnapshotSecretRef is a reference to the secret object containing + // sensitive information to pass to the CSI driver to complete the CSI + // calls for VolumeGroupSnapshots. + // This field is optional, and may be empty if no secret is required. If the + // secret object contains more than one secret, all secrets are passed. // +optional - // Pre-provisioned VolumeGroup - VolumeGroupContentName *string + VolumeGroupSnapshotSecretRef *SecretReference + // A list of VolumeSecrets + // This field is only needed if per volume secret is different from + // VolumeGroupSnapshotSecretRef // +optional - // Dynamically provisioned VolumeGroup - // A label query over persistent volume claims to be added to the volume group. - // This labelSelector will be used to match the label added to a PVC. 
- // In Phase 1, when the label is added to PVC, the PVC will be added to the matching group. - // In Phase 2, this labelSelector will be used to find all PVCs with matching label and add them to the group when the group is being created. - Selector *metav1.LabelSelector + VolumeSecretRefList []VolumeSecret +} - // Phase 2 - // +optional - // Dynamically provisioned VolumeGroup - // This field specifies the source of a volume group. (this is for restore) - // Supported Kind is VolumeGroupSnapshot or VolumeGroup - // GroupDataSource *TypedLocalObjectReference - } +type VolumeSecret struct { + // Name of a PVC + Name string -type VolumeGroupStatus struct { + // VolumeSecretRef is a reference to the secret object containing + // sensitive information to pass to the CSI driver to complete the CSI + // calls for VolumeGroupSnapshots. + // This field is optional, and may be empty if no secret is required. If the + // secret object contains more than one secret, all secrets are passed. + VolumeSecretRef *SecretReference +} + +type VolumeGroupSnapshotStatus struct { // +optional - BoundVolumeGroupContentName *string + BoundVolumeGroupSnapshotContentName *string + // ReadyToUse becomes true when ReadyToUse on all individual snapshots becomes true // +optional - GroupCreationTime *metav1.Time + ReadyToUse *bool - // A list of persistent volume claims // +optional - PVCList []PersistentVolumeClaim + CreationTime *metav1.Time // +optional - Ready *bool + Error *VolumeGroupSnapshotError - // Last error encountered during group creation - // +optional - Error *VolumeGroupError + // List of volume snapshots + // +optional + SnapshotList []VolumeSnapshot } -// Describes an error encountered on the group -type VolumeGroupError struct { - // time is the timestamp when the error was encountered.
- // +optional - Time *metav1.Time - - // message details the encountered error - // +optional - Message *string +// Describes an error encountered on the group snapshot +type VolumeGroupSnapshotError struct { + // time is the timestamp when the error was encountered. + // +optional + Time *metav1.Time + + // message details the encountered error + // +optional + Message *string } ``` -#### VolumeGroupContent - -``` -// VolumeGroupContent represents a group of volumes on the storage backend -type VolumeGroupContent struct { - metav1.TypeMeta - // +optional - metav1.ObjectMeta - - // Spec defines the volume group requested by a user - Spec VolumeGroupContentSpec - - // Status represents the current information about a volume group - // +optional - Status *VolumeGroupContentStatus -} - -// VolumeGroupContentSpec -Type VolumeGroupContentSpec struct { - // +optional - VolumeGroupClassName *string - - // +optional - // VolumeGroupRef is part of a bi-directional binding between VolumeGroup and VolumeGroupContent. - VolumeGroupRef *core_v1.ObjectReference - - // +optional - Source *VolumeGroupContentSource - - // +optional - VolumeGroupDeletionPolicy *VolumeGroupDeletionPolicy - - // This field specifies whether group snapshot is supported. - // The default is false. - // +optional - SupportVolumeGroupSnapshot *bool - - // VolumeGroupSecretRef is a reference to the secret object containing - // sensitive information to pass to the CSI driver to complete the CSI - // calls for VolumeGroups. - // This field is optional, and may be empty if no secret is required. If the - // secret object contains more than one secret, all secrets are passed. - // +optional - VolumeGroupSecretRef *SecretReference -} - -// VolumeGroupContentSource -Type VolumeGroupContentSource struct { - // Required - Driver string - - // VolumeGroupHandle is the unique volume group name returned by the - // CSI volume plugin’s CreateVolumeGroup to refer to the volume group on - // all subsequent calls. 
- // Required. - VolumeGroupHandle string - - // +optional - // Attributes of the volume group to publish. - VolumeGroupAttributes map[string]string -} - -type VolumeGroupContentStatus struct { - // +optional - GroupCreationTime *metav1.Time - - // A list of persistent volumes - // +optional - PVList []PersistentVolume - - // +optional - Ready *bool - - // Last error encountered during group creation - // +optional - Error *VolumeGroupError -} -``` - -#### VolumeGroupSnapshotClass - -``` -type VolumeGroupSnapshotClass struct { - metav1.TypeMeta - // +optional - metav1.ObjectMeta - - // Driver is the driver expected to handle this VolumeGroupSnapshotClass. - // This value may not be empty. - Driver string - - // Parameters hold parameters for the driver. - // These values are opaque to the system and are passed directly - // to the driver. - // +optional - Parameters map[string]string - - // +optional - VolumeGroupSnapshotDeletionPolicy *VolumeGroupSnapshotDeletionPolicy -} - -// VolumeGroupSnapshotDeletionPolicy describes a policy for end-of-life maintenance of -// volume group snapshot contents -type VolumeGroupSnapshotDeletionPolicy string - -const ( - // VolumeGroupSnapshotContentDelete means the group snapshot will be deleted from the - // underlying storage system on release from its volume group snapshot. - VolumeGroupSnapshotContentDelete VolumeGroupSnapshotDeletionPolicy = "Delete" - - // VolumeGroupSnapshotContentRetain means the group snapshot will be left in its current - // state on release from its volume group snapshot. - VolumeGroupSnapshotContentRetain VolumeGroupSnapshotDeletionPolicy = "Retain" -) - -``` - -#### VolumeGroupSnapshot - -``` -// VolumeGroupSnapshot is a user's request for taking a group snapshot. -type VolumeGroupSnapshot struct { - metav1.TypeMeta `json:",inline"` - // Standard object's metadata. 
- // +optional - metav1.ObjectMeta `json:"metadata,omitempty" protobuf:"bytes,1,opt,name=metadata"` - - // Spec defines the desired characteristics of a group snapshot requested by a user. - Spec VolumeGroupSnapshotSpec `json:"spec" protobuf:"bytes,2,opt,name=spec"` - - // Status represents the latest observed state of the group snapshot - // +optional - Status *VolumeGroupSnapshotStatus `json:"status,omitempty" protobuf:"bytes,3,opt,name=status"` -} - -// VolumeGroupSnapshotSpec describes the common attributes of a group snapshot -type VolumeGroupSnapshotSpec struct { - // +optional - VolumeSnapshotClassName *string - - // Source has the information about where the group snapshot is created from. - // Required. - Source VolumeGroupSnapshotSource - - // VolumeGroupSnapshotSecretRef is a reference to the secret object containing - // sensitive information to pass to the CSI driver to complete the CSI - // calls for VolumeGroupSnapshots. - // This field is optional, and may be empty if no secret is required. If the - // secret object contains more than one secret, all secrets are passed. 
- // +optional - VolumeGroupSnapshotSecretRef *SecretReference -} - -// OneOf VolumeGroupName or VolumeGroupSnapshotContentName -Type VolumeGroupSnapshotSource struct { - // +optional - // Dynamically provisioned VolumeGroupSnapshot - VolumeGroupName *string - - // +optional - // Pre-provisioned VolumeGroupSnapshot - VolumeGroupSnapshotContentName *string -} - -Type VolumeGroupSnapshotStatus struct { - // +optional - BoundVolumeGroupSnapshotContentName *string - - // ReadyToUse becomes true when ReadyToUse on all individual snapshots become true - // +optional - ReadyToUse *bool - - // +optional - CreationTime *metav1.Time - - // +optional - Error *VolumeGroupSnapshotError - - // List of volume snapshots - // +optional - SnapshotList []VolumeSnapshot -} - -// Describes an error encountered on the group snapshot -type VolumeGroupSnapshotError struct { - // time is the timestamp when the error was encountered. - // +optional - Time *metav1.Time - - // message details the encountered error - // +optional - Message *string -} -``` - -#### VolumeGroupSnapshotContent +#### VolumeGroupSnapshotContent ``` // VolumeGroupSnapshotContent @@ -724,7 +423,7 @@ type VolumeGroupSnapshotContentSpec struct { // Required // VolumeGroupSnapshotRef specifies the VolumeGroupSnapshot object // to which this VolumeGroupSnapshotContent object is bound. - VolumeGroupSnapshotRef core_v1.ObjectReference + VolumeGroupSnapshotRef core_v1.ObjectReference // Required VolumeGroupSnapshotDeletionPolicy VolumeGroupSnapshotDeletionPolicy @@ -774,10 +473,6 @@ Type VolumeGroupSnapshotContentStatus struct { } ``` -#### PersistentVolumeClaim and PersistentVolume - -For PersistentVolumeClaim, the user can request it to be added to a VolumeGroup by adding the same label specified by the labelSelector in the VolumeGroup. In the initial phase, no changes will be proposed to PersistentVolumeClaim and PersistentVolume API objects. Before moving to Beta, we will re-evaluate this. 
- #### VolumeSnapshot and VolumeSnapshotContent For VolumeSnapshot, we cannot request a VolumeSnapshot to be added to be VolumeGroupSnapshot, therefore VolumeGroupSnapshotName is only in the Status but not in the Spec. @@ -800,54 +495,6 @@ type VolumeSnapshotContentStatus struct{ ### Example Yaml Files -#### Create Volume Group - -Example yaml files to create a VolumeGroupClass and a VolumeGroup are in the following. - -Create a VolumeGroupClass that supports volumeGroupSnapshot: -``` -apiVersion: volumegroup.storage.k8s.io/v1alpha1 -kind: VolumeGroupClass -metadata: - name: volumeGroupClass1 -spec: - parameters: - …... - supportVolumeGroupSnapshot: true -``` - -Create a VolumeGroup belongs to this VolumeGroupClass: -``` -apiVersion: volumegroup.storage.k8s.io/v1alpha1 -kind: VolumeGroup -metadata: - Name: volumeGroup1 -spec: - volumeGroupClassName: volumeGroupClass1 -``` - -#### Add PVC to VolumeGroup - -Create a PVC that belongs to the volume group which supports volumeGroupSnapshot: -``` -apiVersion: v1 -kind: PersistentVolumeClaim -metadata: - name: pvc1 - labels: - volumegroup:myApp -spec: - accessModes: - - ReadWriteOnce - dataSource: null - resources: - requests: - storage: 1Gi - storageClassName: storageClass1 - volumeMode: Filesystem - volumeGroupNames: [volumeGroup1] -``` - #### Create VolumeGroupSnapshot Create a VolumeGroupSnapshotClass: @@ -861,108 +508,47 @@ spec: …... ``` -A VolumeGroupSnapshot taken from the VolumeGroup dynamically: +A VolumeGroupSnapshot taken from multiple volumes dynamically: ``` apiVersion: volumegroup.storage.k8s.io/v1alpha1 kind: VolumeGroupSnapshot metadata: name: my-group-snapshot spec: - source: - volumeGroupName: volumeGroup1 + selector: + myapp: postgresql volumeGroupSnapshotClassName: volumeGroupSnapshotClass1 ``` -A new external VolumeGroup controller will handle VolumeGroupClass, VolumeGroup, and VolumeGroupContent resources. 
We may need to split this into two controllers, one common controller that handles common functions such as binding, and one sidecar controller that calls the CSI driver. - -External provisioner will be modified to read information from volume groups (through volumeGroupNames) and pass them down to the CSI driver. - A new external VolumeGroupSnapshot controller will handle VolumeGroupSnapshotClass, VolumeGroupSnapshot, and VolumeGroupSnapshotContent resources. We may need to split this into two controllers, one common controller that handles common functions such as binding, and one sidecar controller that calls the CSI driver. -Snapshot controller will be modified to update VolumeSnapshot status. External snapshotter sidecar will be modified to update VolumeSnapshotContent status. +Snapshot controller will be modified so that it will not delete an individual VolumeSnapshot that is part of a VolumeGroupSnapshot. External snapshotter sidecar will be modified so that it will not delete an individual VolumeSnapshotContent that is part of a VolumeGroupSnapshotContent. ### CSI Changes #### CSI Capabilities -New controller capabilities CREATE_DELETE_VOLUME_GROUP, VOLUME_GROUP_ADD_REMOVE_EXISTING_VOLUME, CREATE_DELETE_GROUP_SNAPSHOT, INDIVIDUAL_SNAPSHOT_RESTORE, GET_VOLUME_GROUP, GET_VOLUME_GROUP_SNAPSHOT, LIST_VOLUME_GROUPS, LIST_VOLUME_GROUP_SNAPSHOTS will be added. +A new group controller service will be added with a new controller capability CREATE_DELETE_GET_VOLUME_GROUP_SNAPSHOT. -* CREATE_DELETE_VOLUME_GROUP: - Indicates that the controller plugin supports creating and deleting a volume group. +* CREATE_DELETE_GET_VOLUME_GROUP_SNAPSHOT: + Indicates that the controller plugin supports creating, deleting, and getting details of a snapshot of + multiple volumes. -* VOLUME_GROUP_ADD_REMOVE_EXISTING_VOLUME: - Indicates that the controller plugin supports adding an existing volume to a volume - group and removing a volume from a volume group without deleting it.
- -* CREATE_DELETE_GROUP_SNAPSHOT: - Indicates that the controller plugin supports creating a snapshot of all volumes - in a volume group. - -* INDIVIDUAL_SNAPSHOT_RESTORE: - Indicates whether the controller plugin supports creating a volume from an - individual volume snapshot if the volume snapshot is part of a - VolumeGroupSnapshot. Use cases: selective restore, advanced recovery, etc. - Note: In Phase 1, this is the only way to restore after taking a group snapshot. - User can create a volume from a volume snapshot for all the individual snapshots - created along the group snapshot. - -* GET_VOLUME_GROUP: - Indicates that the controller plugin supports getting details of a volume group. - -* GET_VOLUME_GROUP_SNAPSHOT: - Indicates that the controller plugin supports getting details of a volume group snapshot. - -* LIST_VOLUME_GROUPS: - Indicates that the controller plugin supports getting details of a list of volume groups. - -* LIST_VOLUME_GROUP_SNAPSHOTS: - Indicates that the controller plugin supports getting details of a list of volume group snapshots. 
- -#### CSI Controller RPC +#### CSI Group Controller RPC ``` service Controller { … - rpc CreateVolumeGroup(CreateVolumeGroupRequest) - returns (CreateVolumeGroupResponse) { - option (alpha_method) = true; - } - rpc CreateVolumeGroupSnapshot(CreateVolumeGroupSnapshotRequest) returns (CreateVolumeGroupSnapshotResponse) { option (alpha_method) = true; } - rpc ModifyVolumeGroup(ModifyVolumeGroupRequest) - returns (ModifyVolumeGroupResponse) { - option (alpha_method) = true; - } - - rpc DeleteVolumeGroup(DeleteVolumeGroupRequest) - returns (DeleteVolumeGroupResponse) } - option (alpha_method) = true; - } - rpc DeleteVolumeGroupSnapshot(DeleteVolumeGroupSnapshotRequest) returns (DeleteVolumeGroupSnapshotResponse) { option (alpha_method) = true; } - rpc ListVolumeGroups(ListVolumeGroupsRequest) - returns (ListVolumeGroupsResponse) { - option (alpha_method) = true; - } - - rpc ListVolumeGroupSnapshots(ListVolumeGroupSnapshotsRequest) - returns (ListVolumeGroupSnapshotsResponse) { - option (alpha_method) = true; - } - - rpc GetVolumeGroup(GetVolumeGroupRequest) - returns (GetVolumeGroupResponse) { - option (alpha_method) = true; - } - rpc GetVolumeGroupSnapshot(GetVolumeGroupSnapshotRequest) returns (GetVolumeGroupSnapshotResponse) { option (alpha_method) = true; @@ -971,254 +557,59 @@ service Controller { } ``` -#### CreateVolumeGroup - -This RPC will be called by the CO to create a new volume group on behalf of a user. -This operation MUST be idempotent. If a volume group corresponding to the specified volume group name already exists, is compatible with the specified parameters in the CreateVolumeGroupRequest, the Plugin MUST reply 0 OK with the corresponding CreateVolumeGroupResponse. -CSI Plugins MAY create the following types of volume groups: - -* Create a new empty volume group. - * After the empty group is created, create a new volume, specifying the group name in the volume. 
-* At restore time, create a single volume from individual snapshot and then join an existing group. - * Create an empty group. - * Create a volume from snapshot, specifying the group name in the volume. -* Phase 2: Create a new volume group and add a list of existing volumes to the group by querying a label on PVCs. The label is specified by the labelSelector in the volume group. -* Phase 2: Create a new volume group from a source group snapshot or another group. - -The following is non-goal: -* Non goal: Create a new group and at the same time create a list of new volumes in the group. +#### CreateVolumeGroupSnapshot -In `VolumeGroupSnapshot` message, `snapshots` is an optional field while `group_snapshot_id` is a required field. It is fine to only specify `group_snapshot_id` but not `snapshots` in `VolumeGroupSnapshot` message at restore time. -However, the Plugin MUST return a list of volumes that are restored in `CreateVolumeGroupResponse`. +The purpose of this call is to request the creation of a multi-volume snapshot. Group snapshots can be created from multiple volumes. Note that calls to this function must be idempotent - the function may be called multiple times for the same name - the group snapshot must only be created once. ``` -message CreateVolumeGroupRequest { +message CreateVolumeGroupSnapshotRequest { option (alpha_message) = true; - // suggested name for volume group (required for idempotency) - // This field is REQUIRED. + // The suggested name for the group snapshot. This field is REQUIRED + // for idempotency. + // Any Unicode string that conforms to the length limit is allowed + // except those containing the following banned characters: + // U+0000-U+0008, U+000B, U+000C, U+000E-U+001F, U+007F-U+009F. + // (These are control characters other than commonly used whitespace.) string name = 1; - // params passed from VolumeGroupClass - // This field is OPTIONAL. 
- map<string, string> parameters = 2; + // volume ids of the source volumes to be snapshotted together. + // This field is REQUIRED. + repeated string source_volume_ids = 2; - // Secrets required by plugin to complete volume group creation request. + // Secrets required by plugin to complete + // ControllerCreateVolumeGroupSnapshot request. // This field is OPTIONAL. Refer to the `Secrets Requirements` // section on how to use this field. + // The secrets provided in this field SHOULD be the same as + // the secrets provided in ControllerDeleteVolumeGroupSnapshot + // and ControllerGetVolumeGroupSnapshot requests for the same + // group snapshot unless secrets are rotated after the + // group snapshot is created. map<string, string> secrets = 3 [(csi_secret) = true]; - // Phase 2 - // If specified, a volume group will be created from the source group snapshot. - // This field is OPTIONAL. - // VolumeGroupSnapshot source_volume_group_snapshot = 4; - - // Phase 2 - // If specified, a volume group will be created from a list of existing volumes. - // This field is OPTIONAL. - // repeated string volume_id = 5; -} - -message CreateVolumeGroupResponse { - option (alpha_message) = true; - - // Contains all attributes of the newly created volume group. - // This field is REQUIRED. - VolumeGroup volume_group = 1; -} - -message VolumeGroup { - option (alpha_message) = true; - - // The identifier for this volume group, generated by the plugin. - // This field is REQUIRED. - string volume_group_id = 1; - - // Opaque static properties of the volume group. + // Volume secrets required by plugin to complete volume group + // snapshot creation request. This field is needed in case the + // volume level secrets are different from the above secrets + // for the group snapshot. // This field is OPTIONAL. - map<string, string> volume_group_context = 2; - - // Underlying volumes in this group. The same definition in CSI Volume. - // This field is REQUIRED. - // To support the creation of an empty group, this list can be empty.
- // However, this field is not empty in the following cases: - // - Response from ListVolumeGroups or GetVolumeGroup if the VolumeGroup is not empty. - // - Response from ModifyVolumeGroup if the VolumeGroup is not empty after modification. - // - Phase 2: Create a new volume group from a source group snapshot. - // - Phase 2: Create a new volume group and add a list of existing volumes to the group. - repeated .csi.v1.Volume volumes = 3; -} -``` - -#### CreateVolume - -1. When a new volume is created with a volume group id parameter, the volume will be created and added to the existing volume group. -2. A new volume can also be created without a volume group id parameter. It can be added to a volume group later through the ModifyVolumeGroup RPC. - -Note that for filesystems based storage systems, only option 1 can be supported. For block based storage systems. Both option 1 and 2 may be supported. However there is a possibility that option 2 will not work for consistency groups as the volume is created without the consideration of which group the volume will be placed in. CSI Spec does not determine whether a group is consistent or not. It is up to the storage provider to decide whether a consistent group can be supported or not and clarify that in vendor specific documentation. - -``` -message CreateVolumeRequest { - string name = 1; - … - repeated string volume_group_id = 8 [(alpha_field) = true]; -} -``` - -#### DeleteVolumeGroup - -``` -message DeleteVolumeGroupRequest { - option (alpha_message) = true; - - // The ID of the volume group to be deprovisioned. - // This field is REQUIRED. - string volume_group_id = 1; + repeated VolumeSecret volume_secrets = 4; - // Secrets required by plugin to complete volume group deletion request. - // This field is OPTIONAL. Refer to the `Secrets Requirements` - // section on how to use this field. 
- map secrets = 2 [(csi_secret) = true]; -} - -message DeleteVolumeGroupResponse { - option (alpha_message) = true; - // Intentionally empty. + // Plugin specific parameters passed in as opaque key-value pairs. + // This field is OPTIONAL. The Plugin is responsible for parsing and + // validating these parameters. COs will treat these as opaque. + map parameters = 5; } -``` - -#### ModifyVolumeGroup - -This RPC will be called by the CO to modify an existing volumegroup on behalf of a user. volume_ids provided in the ModifyVolumeGroupRequest will be compared to the ones in the existing VolumeGroup. New volume_ids in the modified VolumeGroup will be added to the VolumeGroup. Existing volume_ids not in the modified VolumeGroup will be removed from the VolumeGroup. If volume_ids is empty, the VolumeGroup will be removed of all existing volumes. This operation MUST be idempotent. - -To support ModifyVolumeGroup, the Kubernetes VolumeGroup controller will be implemented to have a desired state of the world and an actual state of the world. The desired state of the world contains VolumeGroups with the desired PVCList while the actual state of the world contains VolumeGroups with the actual PVCList. The controller will try to reconcile the two by handling adding and removing multiple PVCs through a single CSI RPC call each time. - -Note that filesystems based storage systems may not be able to support this RPC. For block based storage systems, this is a very convenient method. However, it may not satisfy the requirement for consistency as the volume is created without the knowledge of which group it is placed in. It is out of the scope of the CSI spec to determine whether a group is consistent or not. It is up to the storage provider to clarify that in the vendor specific documentation. - -CSI drivers supporting VOLUME_GROUP_ADD_REMOVE_EXISTING_VOLUME MUST implement ModifyVolumeGroup RPC. 
- -``` -message ModifyVolumeGroupRequest { - option (alpha_message) = true; - // The ID of the volume group to be modified. +message VolumeSecret { + // ID of the volume whose secrets are provided. // This field is REQUIRED. - string volume_group_id = 1; - - // Specify volume_ids that will be in the modified volume group. - // This list will be compared with the volume_ids in the existing group. - // New ones will be added and missing ones will be removed. - // If no volume_ids are provided, all existing volumes will - // be removed from the group. - // This field is OPTIONAL. - repeated string volume_ids = 2; + string volume_id = 1; - // Secrets required by plugin to complete volume group modification request. - // This field is OPTIONAL. Refer to the `Secrets Requirements` + // Secrets required by plugin for a volume operation. + // This field is REQUIRED. Refer to the `Secrets Requirements` // section on how to use this field. - map secrets = 3 [(csi_secret) = true]; -} - -message ModifyVolumeGroupResponse { - option (alpha_message) = true; - - // Contains all attributes of the modified volume group. - // This field is REQUIRED. - VolumeGroup volume_group = 1; -} -``` - -#### ControllerGetVolumeGroup - -``` -message ControllerGetVolumeGroupRequest { - option (alpha_message) = true; - - // The ID of the volume group to fetch current volume group information for. - // This field is REQUIRED. - string volume_group_id = 1; -} - -message ControllerGetVolumeGroupResponse { - option (alpha_message) = true; - - // This field is REQUIRED - VolumeGroup volume_group = 1; -} -``` - -#### ListVolumeGroups - -``` -message ListVolumeGroupsRequest { - option (alpha_message) = true; - - // If specified (non-zero value), the Plugin MUST NOT return more - // entries than this number in the response. 
If the actual number of - // entries is more than this number, the Plugin MUST set `next_token` - // in the response which can be used to get the next page of entries - // in the subsequent `ListVolumeGroups` call. This field is OPTIONAL. If - // not specified (zero value), it means there is no restriction on the - // number of entries that can be returned. - // The value of this field MUST NOT be negative. - int32 max_entries = 1; - - // A token to specify where to start paginating. Set this field to - // `next_token` returned by a previous `ListVolumeGroups` call to get the - // next page of entries. This field is OPTIONAL. - // An empty string is equal to an unspecified field value. - string starting_token = 2; -} - -message ListVolumeGroupsResponse { - option (alpha_message) = true; - - message Entry { - // This field is REQUIRED - VolumeGroup volume_group = 1; - } - - repeated Entry entries = 1; - - // This token allows you to get the next page of entries for - // `ListVolumeGroups` request. If the number of entries is larger than - // `max_entries`, use the `next_token` as a value for the - // `starting_token` field in the next `ListVolumeGroups` request. This - // field is OPTIONAL. - // An empty string is equal to an unspecified field value. - string next_token = 2; -} -``` - -#### CreateVolumeGroupSnapshot - -The purpose of this call is to request the creation of a multi-volume snapshot. Group snapshots can be created from existing volume group. Note that calls to this function must be idempotent - the function may be called multiple times for the same name - the group snapshot must only be created once. - -``` -message CreateVolumeGroupSnapshotRequest { - option (alpha_message) = true; - - // suggested name for a group snapshot (required for idempotent) - // This field is REQUIRED. - string name = 1; - - // identifier indicates which volume group is used to take - // group snapshot - // This field is REQUIRED. 
- string source_volume_group_id = 2; - - // volume ids of the volumes in the source group. This field is REQUIRED. - // This is needed because some storage systems does not have a group persisted - // on the storage system until the time to take a group snapshot - repeated string volume_ids = 3; - - // secrets required for snapshot creation (pulled from VolumeSnapshotClass) - // This field is OPTIONAL. - map secrets = 4 [(.csi.v1.csi_secret) = true]; - - // params passed from VolumeSnapshotClass - // This field is OPTIONAL. - map parameters = 5; + map secrets = 2 [(.csi.v1.csi_secret) = true]; } message CreateVolumeGroupSnapshotResponse { @@ -1233,48 +624,23 @@ message VolumeGroupSnapshot { option (alpha_message) = true; // The identifier for this group snapshot, generated by the plugin. + // This field MUST contain enough information to uniquely identify + // this specific snapshot vs all other group snapshots supported by + // this plugin. + // This field SHALL be used by the CO in subsequent calls to refer to + // this group snapshot. + // The SP is NOT responsible for global uniqueness of + // group_snapshot_id across multiple SPs. // This field is REQUIRED. string group_snapshot_id = 1; - // A list of snapshots created. Snapshot is the same - // definition as Snapshot definition used in CSI. - // This field is OPTIONAL. - repeated .csi.v1.Snapshot snapshots = 2; - - // Identity information for the source volume group. Currently, only - // support the case that source is volume group. This field is REQUIRED. - string source_volume_group_id = 3; - - // Indicates if a list of group snapshots are ready. - // This field is REQUIRED. - bool ready_to_use = 4; - - // Timestamp when the point-in-time consistency group snapshot is taken. + // A list of snapshots created. // This field is REQUIRED. - .google.protobuf.Timestamp creation_time = 5; - - // Complete total size of the snapshots in group in bytes. 
The purpose of - // this field is to give CO guidance on how much space is needed to restore - // volumes from all snapshots in group. This field is OPTIONAL. - int64 size_bytes = 6; -} -``` + repeated Snapshot snapshots = 2; -#### CreateSnapshot - -``` -message CreateSnapshotRequest { - // The ID of the source volume to be snapshotted. + // Timestamp when the volume group snapshot is taken. // This field is REQUIRED. - string source_volume_id = 1; - … - string group_snapshot_name = 2 [(alpha_field) = true]; -} - -message CreateSnapshotResponse { - Snapshot snapshot = 1; - … - string group_snapshot_id = 2 [(alpha_field) = true]; + .google.protobuf.Timestamp creation_time = 3; } ``` @@ -1284,14 +650,26 @@ message CreateSnapshotResponse { message DeleteVolumeGroupSnapshotRequest { option (alpha_message) = true; - // The ID of the group snapshot to be deprovisioned. + // The ID of the group snapshot to be deleted. // This field is REQUIRED. string group_snapshot_id = 1; - // Secrets required by plugin to complete group snapshot deletion request. + // A list of snapshot ids that are part of this group snapshot. + // Some SPs require this list to delete the snapshots in the group. + // This field is REQUIRED. + repeated string snapshot_ids = 2; + + // Secrets required by plugin to complete group snapshot deletion + // request. // This field is OPTIONAL. Refer to the `Secrets Requirements` // section on how to use this field. - map secrets = 2 [(csi_secret) = true]; + // The secrets provided in this field SHOULD be the same as + // the secrets provided in ControllerCreateVolumeGroupSnapshot + // request for the same group snapshot unless if secrets are rotated + // after the group snapshot is created. + // The secrets provided in the field SHOULD be passed to both + // the group snapshot and the individual snapshot members if needed. 
+ map secrets = 3 [(csi_secret) = true]; } message DeleteVolumeGroupSnapshotResponse { @@ -1305,9 +683,20 @@ message DeleteVolumeGroupSnapshotResponse { message ControllerGetVolumeGroupSnapshotRequest { option (alpha_message) = true; - // The ID of the group snapshot to fetch current group snapshot information for. + // The ID of the group snapshot to fetch current group snapshot + // information for. // This field is REQUIRED. string group_snapshot_id = 1; + + // Secrets required by plugin to complete + // ControllerGetVolumeGroupSnapshot request. + // This field is OPTIONAL. Refer to the `Secrets Requirements` + // section on how to use this field. + // The secrets provided in this field SHOULD be the same as + // the secrets provided in ControllerCreateVolumeGroupSnapshot + // request for the same group snapshot unless if secrets are rotated + // after the group snapshot is created. + map secrets = 2 [(csi_secret) = true]; } message ControllerGetVolumeGroupSnapshotResponse { @@ -1318,49 +707,6 @@ message ControllerGetVolumeGroupSnapshotResponse { } ``` -#### ListVolumeGroupSnapshots - -``` -message ListVolumeGroupSnapshotsRequest { - option (alpha_message) = true; - - // If specified (non-zero value), the Plugin MUST NOT return more - // entries than this number in the response. If the actual number of - // entries is more than this number, the Plugin MUST set `next_token` - // in the response which can be used to get the next page of entries - // in the subsequent `ListVolumeGroupSnapshots` call. This field is OPTIONAL. If - // not specified (zero value), it means there is no restriction on the - // number of entries that can be returned. - // The value of this field MUST NOT be negative. - int32 max_entries = 1; - - // A token to specify where to start paginating. Set this field to - // `next_token` returned by a previous `ListVolumeGroupSnapshots` call to get the - // next page of entries. This field is OPTIONAL. 
- // An empty string is equal to an unspecified field value. - string starting_token = 2; -} - -message ListVolumeGroupSnapshotsResponse { - option (alpha_message) = true; - - message Entry { - // This field is REQUIRED - VolumeGroupSnapshot group_snapshot = 1; - } - - repeated Entry entries = 1; - - // This token allows you to get the next page of entries for - // `ListVolumeGroupSnapshots` request. If the number of entries is larger than - // `max_entries`, use the `next_token` as a value for the - // `starting_token` field in the next `ListVolumeGroupSnapshots` request. This - // field is OPTIONAL. - // An empty string is equal to an unspecified field value. - string next_token = 2; -} -``` - ## Production Readiness Review Questionnaire ### Feature enablement and rollback @@ -1370,7 +716,7 @@ _This section must be completed when targeting alpha to a release._ * **How can this feature be enabled / disabled in a live cluster?** - [x] Other - Describe the mechanism: - The external volume group and group snapshot controllers do not have a + The external volume group snapshot controllers do not have a feature gate because they are out of tree. It is enabled when these external controller sidecars are deployed with the CSI driver. - Will enabling / disabling the feature require downtime of the control @@ -1381,20 +727,20 @@ _This section must be completed when targeting alpha to a release._ No. * **Does enabling the feature change any default behavior?** - Yes. Enabling the feature can allow a new PVC to be created and added to a VolumeGroup. Enabling the feature can also allow a VolumeSnapshot to be created as part of the VolumeSnapshotGroup. + Yes. Enabling the feature can allow a VolumeSnapshot to be created as part of a VolumeGroupSnapshot. * **Can the feature be disabled once it has been enabled (i.e. can we rollback the enablement)?** - Yes.
In order to disable this feature once it has been enabled, we first need to make sure that all VolumeGroup and VolumeGroupSnapshot API objects are deleted. Then the new controllers for VolumeGroup and VolumeGroupSnapshot can be stopped/removed, and external-provisioner sidecar and external-snapshotter controller/sidecar can be downgraded to a version without this feature. + Yes. In order to disable this feature once it has been enabled, we first need to make sure that all VolumeGroupSnapshot API objects are deleted. Then the new controllers for VolumeGroupSnapshot can be stopped/removed, and external-snapshotter controller/sidecar can be downgraded to a version without this feature. -If we don't delete the VolumeGroup and VolumeGroupSnapshot API objects and CRDs but just uninstall the VolumeGroup and VolumeGroupSnapshot controllers and downgrade the other sidecars, the API objects continue to exist in the API server. User may delete an individual PVC that is part of a VolumeGroup or delete an individual VolumeSnapshot that is associated with a VolumeGroupSnapshot. After that if the user starts the controllers/sidecars again and try to use the pre-existing VolumeGroup and VolumeGroupSnapshot, they are no longer in sync with the storage system. Assume the VolumeGroup has 3 PVCs initially, but 1 got removed by user but the VolumeGroup status is not updated so it still has a record of 3. If the user now takes a group snapshot from the VolumeGroup, the storage system will return an error due to the mismatch. We could add logic to reconcile what is in the K8s API object VolumeGroup and what is on the storage system, but before the reconcile completes, the call to create a VolumeGroup will fail. +If we don't delete the VolumeGroupSnapshot API objects and CRDs but just uninstall the VolumeGroupSnapshot controllers and downgrade the other sidecars, the API objects continue to exist in the API server. 
A user may delete an individual VolumeSnapshot that is associated with a VolumeGroupSnapshot. If the user then starts the controllers/sidecars again, the pre-existing VolumeGroupSnapshot still lists the deleted individual VolumeSnapshots in its status, so it is out of sync with the storage system and provides outdated information to the user. The user can still restore individual PVCs from individual VolumeSnapshots that are not deleted, but cannot restore PVCs from the deleted VolumeSnapshots. -If the API objects and VolumeGroup and GroupSnapshot controllers are running, but the provisioner/snapshotter sidecars are downgraded to a lower version that does not support this feature, creating a PVC and adding it to the group will not work as one step. Basically the provisioner sidecar will create a new PV but ignoring the part that adds it to the group. If the CSI driver also supports VOLUME_GROUP_ADD_REMOVE_EXISTING_VOLUME capability, the VolumeGroup controller will detect an existing PVC with a matching label and will try to add the PVC to the group. If the CSI driver does not support VOLUME_GROUP_ADD_REMOVE_EXISTING_VOLUME capability, the PVC will not be added to the group. In this case, PV will still work but it is not in the group so user's request isn't fully satisfied. There won't be errors in this case. The provisioner doesn't support the new feature so it ignores the label on PVC and won't add it to the group. VolumeGroup controller won't pick it up because the CSI driver does not have the capability to support adding an existing volume to a group. +If the API objects and VolumeGroupSnapshot controllers are running, but the snapshotter sidecars are downgraded to a lower version that does not support this feature, it should be fine as individual snapshots that are part of a group snapshot will be created and deleted by the VolumeGroupSnapshot controller.
-If the external-provisioner and external-snapshotter sidecars which support this feature are running but VolumeGroup/GroupSnapshot controllers are not (CRDs are still installed), creating VolumeGroup or creating VolumeGroupSnapshot will not be successfully. Ready status in VolumeGroup or VolumeGroupSnapshot API objects will be false until those controllers are running again. +If the external-snapshotter sidecar which supports this feature is running but the VolumeGroupSnapshot controller is not (CRDs are still installed), creating a VolumeGroupSnapshot will not be successful. The Ready status in VolumeGroupSnapshot API objects will be false until that controller is running again. * **What happens if we reenable the feature if it was previously rolled back?** - We will be able to create new VolumeGroup and VolumeGroupSnapshot API objects again. + We will be able to create new VolumeGroupSnapshot API objects again. * **Are there any tests for feature enablement/disablement?** Since there is no feature gate for this feature on the external controller side and the only way to @@ -1674,3 +1020,669 @@ message ModifyVolumeRequest { } ``` External-provisioner will be modified so that modifying PVC by adding VolumeGroupName will trigger a ModifyVolume call (a new CSI controller RPC) to CSI driver. + +### VolumeGroup API Definitions + +In an earlier version of this KEP, a VolumeGroup API was introduced to group volumes together. The VolumeGroup was removed from the KEP in favor of a simpler design that supports group snapshots. + +#### Use cases for the VolumeGroup + +* A VolumeGroup allows users to manage multiple volumes belonging to the same application together and therefore it is very useful in general. For example, it can be used to group all volumes in the same StatefulSet together and we can take a group snapshot of all the volumes in this StatefulSet. + +* For some storage systems, volumes are always managed in a group.
For these storage systems, they will have to create a group for a single volume if they need to implement a create volume function in Kubernetes. Volume snapshotting, cloning, expansion, and deletion, etc. are all performed at a group level. Providing a VolumeGroup API will be very convenient for them. + +* Instead of taking individual snapshots one after another, VolumeGroup can be used as a source for taking a snapshot of all the volumes in the same volume group. This may be a storage level consistent group snapshot if the storage system supports it. For this use case, we will introduce another CRD VolumeGroupSnapshot. + +* VolumeGroup can also be used together with application snapshot. It can be a resource managed by the ApplicationSnapshot CRD. + +* Some applications may not want to use ApplicationSnapshot CRD because they don’t use Kubernetes workload APIs such as StatefulSet, Deployment, etc. Instead, they have developed their own operators. In this case it is more convenient to use VolumeGroup to manage persistent volumes used in those applications. + +* Application quiesce is time consuming. Some users may not want to do application quiesce very frequently for that reason. For example, a user may want to run weekly backups with application quiesce and nightly backups without application quiesce but with consistency group support which provides crash consistency across all volumes in the group. + +#### Future use cases for the VolumeGroup + +* VolumeGroup can be used to manage group replication or consistency group replication if the storage system supports it. Note replication is out of scope for this proposal. It is mentioned here as a potential future use case. + +* VolumeGroup can be used to manage volume placement to either spread the volumes across storage pools or stack the volumes on the same storage pool. 
Related KEPs proposing the concept of storage pool for volume placement are as follows: + https://github.com/kubernetes/enhancements/pull/1353 + https://github.com/kubernetes/enhancements/pull/1347 +We may not really need a VolumeGroup for this use case. A StoragePool is probably enough. This is to be determined. + +#### Proposal for VolumeGroup and VolumeGroupSnapshot + +This proposal introduces new CRDs VolumeGroupSnapshot, VolumeGroupSnapshotContent, and VolumeGroupSnapshotClass. + +##### Create VolumeGroup + +A new VolumeGroup can be created in several ways: + +Phase 1 (Note: only Phase 1 will be covered in this KEP which is targeting Alpha in K8s v1.26): +1. Create an empty group first, then create a new PVC with the group name. This will create a new volume and add that volume to the already created group. When deleting this volume group, all volumes in the group will be deleted together with the group. A CSI driver supporting CREATE_DELETE_VOLUME_GROUP controller capability MUST implement this feature. +2. Create an empty group first, then add existing PVCs to the group one by one. A CSI driver supporting VOLUME_GROUP_ADD_REMOVE_EXISTING_VOLUME MUST implement this feature. + +Phase 2 (After v1.26): +1. Create a new volume group by querying a label on existing persistent volume claims and adding them to the volume group. +2. Create a new volume group from an existing group snapshot or another volume group in one step. Design details will be added in a future KEP. + +Non-goal: Create a new empty group and at the same time create new empty PVCs and add them to the new group. + +##### Delete VolumeGroup and PVC + +Deleting a volume group will delete the volume group along with all the PVCs in the group. + +An individual PVC needs to be removed from the group first before it can be deleted. A finalizer or webhook will be added that prevents an individual PVC in a group from being deleted. + +##### Modify VolumeGroup + +Modify an existing VolumeGroup: +1.
Creating a new volume with an existing VolumeGroup name creates the volume and adds it to the group. Option 1 of creating VolumeGroup above falls into this case. As mentioned earlier, a CSI driver supporting CREATE_DELETE_VOLUME_GROUP MUST implement this feature. +2. Add an existing volume to an existing VolumeGroup or remove a volume from a VolumeGroup. Option 2 of creating VolumeGroup above falls into this case. As mentioned earlier, a CSI driver supporting VOLUME_GROUP_ADD_REMOVE_EXISTING_VOLUME MUST implement this feature. + +##### Create and Modify VolumeGroup + +VolumeGroups can be created and/or modified in several ways, as described below. + +###### Create new PVC and add to the VolumeGroup + +* Admin creates a VolumeGroupClass, with the SupportVolumeGroupSnapshot boolean flag set to true. +* User creates a new empty VolumeGroup, specifying the above VolumeGroupClass. As a result, a new empty VolumeGroupContent will also be created and bound to the VolumeGroup. +* User creates a new PVC with the name of the VolumeGroup created above. As a result, the new PVC is created and added to the VolumeGroup. The VolumeGroup is modified so that its Status has this new PVC in PVCList. +* External-provisioner will be modified so that VolumeGroupName will be passed to the CSI driver when creating a volume. + +Only CSI drivers supporting CREATE_DELETE_VOLUME_GROUP capability can support the volume group this way. + +When a new PVC is created with the existing VolumeGroup name, the VolumeGroup will be modified and the PVC will be added to PVCList in the Status, and the VolumeGroupContent will also be modified and the PV will be added to the PVList in the Status. + +The same PVC can belong to different groups, i.e., different types of groups or different groups of the same type, if the storage system supports it. The storage system decides whether to support this. If it does not support it, an INVALID_ARGUMENT error code should be returned with a message explaining why.
We don't prevent it in the API or controller directly. + +###### Modify VolumeGroup with existing PVCs + +We can add an existing PVC to the group or remove a PVC from the group without deleting it. A VOLUME_GROUP_ADD_REMOVE_EXISTING_VOLUME capability will be added to CSI Spec. Only CSI drivers supporting both CREATE_DELETE_VOLUME_GROUP and VOLUME_GROUP_ADD_REMOVE_EXISTING_VOLUME capabilities can support the volume group this way. + +* Admin creates a VolumeGroupClass, with the SupportVolumeGroupSnapshot boolean flag set to true. +* User creates a new empty VolumeGroup, specifying the above VolumeGroupClass. A new empty VolumeGroupContent will also be created and bound to the VolumeGroup. +* Add an existing PVC to an existing VolumeGroup (VolumeGroup can be empty to start with or it can have other PVCs already) by adding a label specified by the labelSelector in the VolumeGroup to the PVC. + * The VolumeGroup name is added by user to each PVC, not by the VolumeGroup controller. The VolumeGroup controller watches PVCs and reacts to the PVC updated with a VolumeGroup name event as described in the following step. +* VolumeGroup is modified so the existing PVC is added to the PVCList in the Status. VolumeGroupContent is also modified so the PV is added to the PVList in the Status. + * Note: The VolumeGroup controller will be implemented to have a desired state + of the world and an actual state of the world. The desired state of the world + contains VolumeGroups with the desired PVCList while the actual state of the + world contains VolumeGroups with the actual PVCList. The controller will try + to reconcile the two by handling adding and removing multiple PVCs through a + single CSI ModifyVolumeGroup RPC call each time. +* External-provisioner will be modified to update the status of PVC and PV. +* VolumeGroup controller will be triggered to update the VolumeGroup Status and VolumeGroupContent Status. 
+* If a volume fails to be added to the group, the failure should not affect the volume if it is in use by a pod, but error messages will be reported. +* Removing a PVC from a VolumeGroup will trigger the external-provisioner and the VolumeGroup controller as well. + +###### Phase 2: Create VolumeGroup from VolumeGroupSnapshot or another VolumeGroup + +This is in Phase 2 so won't be discussed in detail here. Creating a new volume group from an existing group snapshot will be supported in Phase 2 if the CSI driver supports VOLUME_GROUP_FROM_GROUP_SNAPSHOT capability. As a result, PVCs will be created from source snapshots and placed in a new volume group. + +###### Pre-provisioned VolumeGroup + +Admin can create a VolumeGroupContent, specifying an existing VolumeGroupHandle in the storage system and specifying a VolumeGroup name and namespace. Then the user creates a VolumeGroup that points to the VolumeGroupContent name. + +Admin will retrieve all volumeHandles in the VolumeGroup from the storage system, create PVs pointing to the volumeHandles. Then the user creates PVCs pointing to the PVs. + +##### Create VolumeGroupSnapshot + +A VolumeGroupSnapshot can be created with a VolumeGroup as the source if the CSI driver supports the CREATE_DELETE_GROUP_SNAPSHOT capability. + +###### Dynamic provisioning + +* Admin creates a VolumeGroupSnapshotClass. +* User creates a VolumeGroupSnapshot with a VolumeGroup as the source. +* This will trigger the VolumeGroupSnapshot controller to create a VolumeGroupSnapshotContent API object, call the CreateVolumeGroupSnapshot CSI function, and create multiple VolumeSnapshot API objects with the VolumeGroupSnapshot name parameter in each VolumeSnapshot Status. This will trigger the creation of VolumeSnapshotContent API objects in the snapshot controller and calls to the CreateSnapshot CSI function in the CSI snapshotter sidecar. The CSI snapshotter sidecar will pass the new group_snapshot_name parameter to the CSI Driver when calling CreateSnapshot.
+* When CSI driver receives CreateSnapshot request for individual snapshots with a VolumeGroupSnapshot name: + * If it knows how to create a group snapshot on the storage system, it returns (nil, nil), and leaves it to the CreateVolumeGroupSnapshot function to handle the snapshot creation. +* CreateVolumeGroupSnapshot CSI function response + * The CreateVolumeGroupSnapshot CSI function should return a list of snapshots (Snapshot message defined in CSI Spec) in its response. The VolumeGroupSnapshot controller can use the returned list of snapshots to update corresponding individual VolumeSnapshotContents, wait for VolumeSnapshots and VolumeSnapshotContents to be bound, and update SnapshotList in the VolumeGroupSnapshot Status and SnapshotContentList in the VolumeGroupSnapshotContent Status. + +``` +apiVersion: snapshot.storage.k8s.io/v1 +kind: VolumeSnapshot +metadata: + name: snapshot1 +spec: + volumeSnapshotClassName: snapClass1 + source: + persistentVolumeClaimName: pvc1 +status: + volumeGroupSnapshotName: groupSnapshot1 +``` + +* An admission controller or finalizer should be added to prevent an individual snapshot that belongs to a VolumeGroupSnapshot from being deleted. +* Since some storage systems require individual snapshots while others can only return a single group snapshot but not individual snapshots, we propose a two-phase solution. + * In Phase 1, since we do not support creating a VolumeGroup directly from a VolumeGroupSnapshot, it is required for individual snapshots to be returned along with the group snapshot. + * In Phase 2, we plan to support creating a VolumeGroup directly from a VolumeGroupSnapshot. We propose the following solution for Phase 2: + * In VolumeGroupSnapshotStatus, if ReadyToUse is true and SnapshotList is empty, the VolumeGroupSnapshot Controller assumes the storage system does not return individual snapshots.
+ * If ReadyToUse is true and SnapshotList is not empty, the VolumeGroupSnapshot Controller knows there are individual snapshots created for this group. Those individual snapshots may be used as readonly, but they cannot be removed from the VolumeGroupSnapshot. + * In the CSI Spec, this means repeated .csi.v1.Snapshot snapshots in VolumeGroupSnapshot message from CreateVolumeGroupSnapshotResponse should be optional, not required. + * How to use the VolumeGroupSnapshot if individual snapshots are not returned? How can we create a volume from a snapshot if there are no individual snapshots? `snapshots` is optional while `group_snapshot_id` is required in VolumeGroupSnapshot message in CSI so it is fine to only specify `group_snapshot_id` not `snapshots` when creating a VolumeGroup from a VolumeGroupSnapshot. However, CSI Driver MUST return a list of `volumes` that are restored in `CreateVolumeGroupResponse`. + +###### Pre-provisioned VolumeGroupSnapshot + +Admin can create a VolumeGroupSnapshotContent, specifying an existing VolumeGroupSnapshotHandle in the storage system and specifying a VolumeGroupSnapshot name and namespace. Then the user creates a VolumeGroupSnapshot that points to the VolumeGroupSnapshotContent name. + +Admin will retrieve all volumeSnapshotHandles in the Volume Group Snapshot from the storage system, create VolumeSnapshotContents pointing to the volumeSnapshotHandles. Then the user can create VolumeSnapshots pointing to the VolumeSnapshotContents. + +##### Delete VolumeGroupSnapshot + +A VolumeGroupSnapshot can be deleted if the CSI driver supports the CREATE_DELETE_GROUP_SNAPSHOT capability. +* When a VolumeGroupSnapshot is deleted, the VolumeGroupSnapshot controller will call the DeleteVolumeGroupSnapshot CSI function as well as DeleteSnapshot CSI functions. + * Since CSI driver handles individual snapshot creation in CreateVolumeGroupSnapshot, it should handle individual snapshot deletion in DeleteVolumeGroupSnapshot. 
+* DeleteSnapshot on a single snapshot that belongs to a group snapshot is not allowed. + +##### Restore + +Restore can be done as follows: + +Phase 1: + +* A new empty volume group can be created first, and then a new volume can be created from a snapshot one by one and added to the volume group. This can be repeated for all the snapshots in the VolumeGroupSnapshot. + +Phase 2: + +* A VolumeGroup can be created from a VolumeGroupSnapshot or VolumeGroup source in one step. This is the same as what is described in the section `Create VolumeGroup from VolumeGroupSnapshot or another VolumeGroup`. + +API definitions are as follows: + +#### VolumeGroupClass + +``` +type VolumeGroupClass struct { + metav1.TypeMeta + // +optional + metav1.ObjectMeta + + // Driver is the driver expected to handle this VolumeGroupClass. + // This value may not be empty. + Driver string + + // Parameters hold parameters for the driver. + // These values are opaque to the system and are passed directly + // to the driver. + // +optional + Parameters map[string]string + + // +optional + VolumeGroupDeletionPolicy *VolumeGroupDeletionPolicy + + // This field specifies whether group snapshot is supported. + // The default is false. + // +optional + SupportVolumeGroupSnapshot *bool +} + +// VolumeGroupDeletionPolicy describes a policy for end-of-life maintenance of +// volume group contents +type VolumeGroupDeletionPolicy string + +const ( + // VolumeGroupContentDelete means the group will be deleted from the + // underlying storage system on release from its volume group. + VolumeGroupContentDelete VolumeGroupDeletionPolicy = "Delete" + + // VolumeGroupContentRetain means the group will be left in its current + // state on release from its volume group. 
+ VolumeGroupContentRetain VolumeGroupDeletionPolicy = "Retain" +) +``` + +#### VolumeGroup + +``` +// VolumeGroup is a user's request for a group of volumes +type VolumeGroup struct { + metav1.TypeMeta + // +optional + metav1.ObjectMeta + + // Spec defines the volume group requested by a user + Spec VolumeGroupSpec + + // Status represents the current information about a volume group + // +optional + Status *VolumeGroupStatus +} + +// VolumeGroupSpec describes the common attributes of group storage devices +// and allows a Source for provider-specific attributes +type VolumeGroupSpec struct { + // +optional + VolumeGroupClassName *string + + // Source has the information about where the group is created from. + // Required. + Source VolumeGroupSource +} + +// VolumeGroupSource contains several options. +// OneOf the options must be defined. +type VolumeGroupSource struct { + // +optional + // Pre-provisioned VolumeGroup + VolumeGroupContentName *string + + // +optional + // Dynamically provisioned VolumeGroup + // A label query over persistent volume claims to be added to the volume group. + // This labelSelector will be used to match the label added to a PVC. + // In Phase 1, when the label is added to PVC, the PVC will be added to the matching group. + // In Phase 2, this labelSelector will be used to find all PVCs with matching label and add them to the group when the group is being created. + Selector *metav1.LabelSelector + + // Phase 2 + // +optional + // Dynamically provisioned VolumeGroup + // This field specifies the source of a volume group. 
(this is for restore) + // Supported Kind is VolumeGroupSnapshot or VolumeGroup + // GroupDataSource *TypedLocalObjectReference +} + +type VolumeGroupStatus struct { + // +optional + BoundVolumeGroupContentName *string + + // +optional + GroupCreationTime *metav1.Time + + // A list of persistent volume claims + // +optional + PVCList []PersistentVolumeClaim + + // +optional + Ready *bool + + // Last error encountered during group creation + // +optional + Error *VolumeGroupError +} + +// Describes an error encountered on the group +type VolumeGroupError struct { + // time is the timestamp when the error was encountered. + // +optional + Time *metav1.Time + + // message details the encountered error + // +optional + Message *string +} +``` + +#### VolumeGroupContent + +``` +// VolumeGroupContent represents a group of volumes on the storage backend +type VolumeGroupContent struct { + metav1.TypeMeta + // +optional + metav1.ObjectMeta + + // Spec defines the volume group requested by a user + Spec VolumeGroupContentSpec + + // Status represents the current information about a volume group + // +optional + Status *VolumeGroupContentStatus +} + +// VolumeGroupContentSpec +type VolumeGroupContentSpec struct { + // +optional + VolumeGroupClassName *string + + // +optional + // VolumeGroupRef is part of a bi-directional binding between VolumeGroup and VolumeGroupContent. + VolumeGroupRef *core_v1.ObjectReference + + // +optional + Source *VolumeGroupContentSource + + // +optional + VolumeGroupDeletionPolicy *VolumeGroupDeletionPolicy + + // This field specifies whether group snapshot is supported. + // The default is false. + // +optional + SupportVolumeGroupSnapshot *bool + + // VolumeGroupSecretRef is a reference to the secret object containing + // sensitive information to pass to the CSI driver to complete the CSI + // calls for VolumeGroups. + // This field is optional, and may be empty if no secret is required. 
If the + // secret object contains more than one secret, all secrets are passed. + // +optional + VolumeGroupSecretRef *SecretReference +} + +// VolumeGroupContentSource +type VolumeGroupContentSource struct { + // Required + Driver string + + // VolumeGroupHandle is the unique volume group name returned by the + // CSI volume plugin’s CreateVolumeGroup to refer to the volume group on + // all subsequent calls. + // Required. + VolumeGroupHandle string + + // +optional + // Attributes of the volume group to publish. + VolumeGroupAttributes map[string]string +} + +type VolumeGroupContentStatus struct { + // +optional + GroupCreationTime *metav1.Time + + // A list of persistent volumes + // +optional + PVList []PersistentVolume + + // +optional + Ready *bool + + // Last error encountered during group creation + // +optional + Error *VolumeGroupError +} +``` + +#### VolumeGroupSnapshotClass + +``` +type VolumeGroupSnapshotClass struct { + metav1.TypeMeta + // +optional + metav1.ObjectMeta + + // Driver is the driver expected to handle this VolumeGroupSnapshotClass. + // This value may not be empty. + Driver string + + // Parameters hold parameters for the driver. + // These values are opaque to the system and are passed directly + // to the driver. + // +optional + Parameters map[string]string + + // +optional + VolumeGroupSnapshotDeletionPolicy *VolumeGroupSnapshotDeletionPolicy +} + +// VolumeGroupSnapshotDeletionPolicy describes a policy for end-of-life maintenance of +// volume group snapshot contents +type VolumeGroupSnapshotDeletionPolicy string + +const ( + // VolumeGroupSnapshotContentDelete means the group snapshot will be deleted from the + // underlying storage system on release from its volume group snapshot. + VolumeGroupSnapshotContentDelete VolumeGroupSnapshotDeletionPolicy = "Delete" + + // VolumeGroupSnapshotContentRetain means the group snapshot will be left in its current + // state on release from its volume group snapshot. 
+ VolumeGroupSnapshotContentRetain VolumeGroupSnapshotDeletionPolicy = "Retain" +) + +``` + +#### VolumeGroupSnapshot + +``` +// VolumeGroupSnapshot is a user's request for taking a group snapshot. +type VolumeGroupSnapshot struct { + metav1.TypeMeta `json:",inline"` + // Standard object's metadata. + // +optional + metav1.ObjectMeta `json:"metadata,omitempty" protobuf:"bytes,1,opt,name=metadata"` + + // Spec defines the desired characteristics of a group snapshot requested by a user. + Spec VolumeGroupSnapshotSpec `json:"spec" protobuf:"bytes,2,opt,name=spec"` + + // Status represents the latest observed state of the group snapshot + // +optional + Status *VolumeGroupSnapshotStatus `json:"status,omitempty" protobuf:"bytes,3,opt,name=status"` +} + +// VolumeGroupSnapshotSpec describes the common attributes of a group snapshot +type VolumeGroupSnapshotSpec struct { + // +optional + VolumeSnapshotClassName *string + + // Source has the information about where the group snapshot is created from. + // Required. + Source VolumeGroupSnapshotSource + + // VolumeGroupSnapshotSecretRef is a reference to the secret object containing + // sensitive information to pass to the CSI driver to complete the CSI + // calls for VolumeGroupSnapshots. + // This field is optional, and may be empty if no secret is required. If the + // secret object contains more than one secret, all secrets are passed. 
+ // +optional + VolumeGroupSnapshotSecretRef *SecretReference +} + +// OneOf VolumeGroupName or VolumeGroupSnapshotContentName +type VolumeGroupSnapshotSource struct { + // +optional + // Dynamically provisioned VolumeGroupSnapshot + VolumeGroupName *string + + // +optional + // Pre-provisioned VolumeGroupSnapshot + VolumeGroupSnapshotContentName *string +} + +type VolumeGroupSnapshotStatus struct { + // +optional + BoundVolumeGroupSnapshotContentName *string + + // ReadyToUse becomes true when ReadyToUse on all individual snapshots becomes true + // +optional + ReadyToUse *bool + + // +optional + CreationTime *metav1.Time + + // +optional + Error *VolumeGroupSnapshotError + + // List of volume snapshots + // +optional + SnapshotList []VolumeSnapshot +} + +// Describes an error encountered on the group snapshot +type VolumeGroupSnapshotError struct { + // time is the timestamp when the error was encountered. + // +optional + Time *metav1.Time + + // message details the encountered error + // +optional + Message *string +} +``` + +#### VolumeGroupSnapshotContent + +``` +// VolumeGroupSnapshotContent +type VolumeGroupSnapshotContent struct { + metav1.TypeMeta `json:",inline"` + // Standard object's metadata. + // +optional + metav1.ObjectMeta `json:"metadata,omitempty" protobuf:"bytes,1,opt,name=metadata"` + + // Spec defines the desired characteristics of a group snapshot content + Spec VolumeGroupSnapshotContentSpec `json:"spec" protobuf:"bytes,2,opt,name=spec"` + + // Status represents the latest observed state of the group snapshot content + // +optional + Status *VolumeGroupSnapshotContentStatus `json:"status,omitempty" protobuf:"bytes,3,opt,name=status"` +} + +// VolumeGroupSnapshotContentSpec describes the common attributes of a group snapshot content +type VolumeGroupSnapshotContentSpec struct { + // Required + // VolumeGroupSnapshotRef specifies the VolumeGroupSnapshot object + // to which this VolumeGroupSnapshotContent object is bound. 
+ VolumeGroupSnapshotRef core_v1.ObjectReference + + // Required + VolumeGroupSnapshotDeletionPolicy VolumeGroupSnapshotDeletionPolicy + + // Required + Driver string + + // +optional + VolumeGroupSnapshotClassName *string + + // Required + Source VolumeGroupSnapshotContentSource +} + +// OneOf +type VolumeGroupSnapshotContentSource struct { + // Dynamic provisioning of VolumeGroupSnapshot + // +optional + VolumeGroupHandle *string + + // Pre-provisioned VolumeGroupSnapshot + // +optional + VolumeGroupSnapshotHandle *string +} + +type VolumeGroupSnapshotContentStatus struct { + // VolumeGroupSnapshotHandle is a unique id returned by the CSI driver + // to identify the VolumeGroupSnapshot on the storage system. + // If a storage system does not provide such an id, the + // CSI driver can choose to return the VolumeGroupSnapshot name. + // +optional + VolumeGroupSnapshotHandle *string + + // ReadyToUse becomes true when ReadyToUse on all individual snapshots becomes true + // +optional + ReadyToUse *bool + + // +optional + CreationTime *int64 + + // +optional + Error *VolumeGroupSnapshotError + + // List of volume snapshot contents + // +optional + VolumeSnapshotContentList []VolumeSnapshotContent +} +``` + +#### PersistentVolumeClaim and PersistentVolume + +For PersistentVolumeClaim, the user can request it to be added to a VolumeGroup by adding the same label specified by the labelSelector in the VolumeGroup. In the initial phase, no changes will be proposed to PersistentVolumeClaim and PersistentVolume API objects. Before moving to Beta, we will re-evaluate this. + +#### VolumeSnapshot and VolumeSnapshotContent + +For VolumeSnapshot, a user cannot request that a VolumeSnapshot be added to a VolumeGroupSnapshot; therefore VolumeGroupSnapshotName appears only in the Status, not in the Spec. + +``` +type VolumeSnapshotStatus struct{ + ...... + // +optional + VolumeGroupSnapshotName *string + ...... +} + +type VolumeSnapshotContentStatus struct{ + ...... 
+ // +optional + VolumeGroupSnapshotContentName *string + ...... +} +``` + +### Example Yaml Files + +#### Create Volume Group + +Example YAML files for creating a VolumeGroupClass and a VolumeGroup are shown below. + +Create a VolumeGroupClass that supports volumeGroupSnapshot: +``` +apiVersion: volumegroup.storage.k8s.io/v1alpha1 +kind: VolumeGroupClass +metadata: + name: volumeGroupClass1 +spec: + parameters: + …... + supportVolumeGroupSnapshot: true +``` + +Create a VolumeGroup that belongs to this VolumeGroupClass: +``` +apiVersion: volumegroup.storage.k8s.io/v1alpha1 +kind: VolumeGroup +metadata: + name: volumeGroup1 +spec: + volumeGroupClassName: volumeGroupClass1 +``` + +#### Add PVC to VolumeGroup + +Create a PVC that belongs to the volume group which supports volumeGroupSnapshot: +``` +apiVersion: v1 +kind: PersistentVolumeClaim +metadata: + name: pvc1 + labels: + volumegroup: myApp +spec: + accessModes: + - ReadWriteOnce + dataSource: null + resources: + requests: + storage: 1Gi + storageClassName: storageClass1 + volumeMode: Filesystem + volumeGroupNames: [volumeGroup1] +``` + +#### Create VolumeGroupSnapshot + +Create a VolumeGroupSnapshotClass: +``` +apiVersion: volumegroup.storage.k8s.io/v1alpha1 +kind: VolumeGroupSnapshotClass +metadata: + name: volumeGroupSnapshotClass1 +spec: + parameters: + …... +``` + +A VolumeGroupSnapshot taken from the VolumeGroup dynamically: +``` +apiVersion: volumegroup.storage.k8s.io/v1alpha1 +kind: VolumeGroupSnapshot +metadata: + name: my-group-snapshot +spec: + source: + volumeGroupName: volumeGroup1 + volumeGroupSnapshotClassName: volumeGroupSnapshotClass1 +``` + +A new external VolumeGroup controller will handle VolumeGroupClass, VolumeGroup, and VolumeGroupContent resources. We may need to split this into two controllers, one common controller that handles common functions such as binding, and one sidecar controller that calls the CSI driver. 
+ +External provisioner will be modified to read information from volume groups (through volumeGroupNames) and pass them down to the CSI driver. + +A new external VolumeGroupSnapshot controller will handle VolumeGroupSnapshotClass, VolumeGroupSnapshot, and VolumeGroupSnapshotContent resources. We may need to split this into two controllers, one common controller that handles common functions such as binding, and one sidecar controller that calls the CSI driver. + +Snapshot controller will be modified to update VolumeSnapshot status. External snapshotter sidecar will be modified to update VolumeSnapshotContent status. diff --git a/keps/sig-storage/3476-volume-group/kep.yaml b/keps/sig-storage/3476-volume-group-snapshot/kep.yaml similarity index 91% rename from keps/sig-storage/3476-volume-group/kep.yaml rename to keps/sig-storage/3476-volume-group-snapshot/kep.yaml index 3c4b2be2fac..7dc299900f7 100644 --- a/keps/sig-storage/3476-volume-group/kep.yaml +++ b/keps/sig-storage/3476-volume-group-snapshot/kep.yaml @@ -1,4 +1,4 @@ -title: Volume Group and Group Snapshot +title: Volume Group Snapshot kep-number: 3476 authors: - "@xing-yang" From 6035f538863d384c8b306eca194060b02c14414b Mon Sep 17 00:00:00 2001 From: xing-yang Date: Mon, 30 Jan 2023 09:45:30 -0500 Subject: [PATCH 14/19] Make changes based on CSI spec changes --- .../3476-volume-group-snapshot/README.md | 66 ++++++------------- 1 file changed, 19 insertions(+), 47 deletions(-) diff --git a/keps/sig-storage/3476-volume-group-snapshot/README.md b/keps/sig-storage/3476-volume-group-snapshot/README.md index ba9eef2a86f..32ee8eb5ab5 100644 --- a/keps/sig-storage/3476-volume-group-snapshot/README.md +++ b/keps/sig-storage/3476-volume-group-snapshot/README.md @@ -40,7 +40,7 @@ - [CSI Group Controller RPC](#csi-group-controller-rpc) - [CreateVolumeGroupSnapshot](#createvolumegroupsnapshot) - [DeleteVolumeGroupSnapshot](#deletevolumegroupsnapshot) - - 
[ControllerGetVolumeGroupSnapshot](#controllergetvolumegroupsnapshot) + - [GetVolumeGroupSnapshot](#getvolumegroupsnapshot) - [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire) - [Feature enablement and rollback](#feature-enablement-and-rollback) - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning) @@ -349,24 +349,6 @@ type VolumeGroupSnapshotSpec struct { // secret object contains more than one secret, all secrets are passed. // +optional VolumeGroupSnapshotSecretRef *SecretReference - - // A list of VolumeSecrets - // This field is only needed if per volume secret is different from - // VolumeGroupSnapshotSecretRef - // +optional - VolumeSecretRefList []VolumeSecret -} - -Type VolumeSecret { - // Name of a PVC - Name string - - // VolumeSecretRef is a reference to the secret object containing - // sensitive information to pass to the CSI driver to complete the CSI - // calls for VolumeGroupSnapshots. - // This field is optional, and may be empty if no secret is required. If the - // secret object contains more than one secret, all secrets are passed. - VolumeSecretRef *SecretReference } Type VolumeGroupSnapshotStatus struct { @@ -581,35 +563,14 @@ message CreateVolumeGroupSnapshotRequest { // ControllerCreateVolumeGroupSnapshot request. // This field is OPTIONAL. Refer to the `Secrets Requirements` // section on how to use this field. - // The secrets provided in this field SHOULD be the same as - // the secrets provided in ControllerDeleteVolumeGroupSnapshot - // and ControllerGetVolumeGroupSnapshot requests for the same - // group snapshot unless if secrets are rotated after the - // group snapshot is created. + // The secrets provided in this field SHOULD be the same for + // all group snapshot operations on the same group snapshot. map secrets = 3 [(csi_secret) = true]; - // Volume secrets required by plugin to complete volume group - // snapshot creation request. 
This field is needed in case the - // volume level secrets are different from the above secrets - // for the group snapshot. - // This field is OPTIONAL. - repeated VolumeSecret volume_secrets = 4; - // Plugin specific parameters passed in as opaque key-value pairs. // This field is OPTIONAL. The Plugin is responsible for parsing and // validating these parameters. COs will treat these as opaque. - map parameters = 5; -} - -message VolumeSecret { - // ID of the volume whose secrets are provided. - // This field is REQUIRED. - string volume_id = 1; - - // Secrets required by plugin for a volume operation. - // This field is REQUIRED. Refer to the `Secrets Requirements` - // section on how to use this field. - map secrets = 2 [(.csi.v1.csi_secret) = true]; + map parameters = 4; } message CreateVolumeGroupSnapshotResponse { @@ -641,6 +602,17 @@ message VolumeGroupSnapshot { // Timestamp when the volume group snapshot is taken. // This field is REQUIRED. .google.protobuf.Timestamp creation_time = 3; + + // Indicates if all individual snapshots in the group snapshot + // are ready to use as a `volume_content_source` in a + // `CreateVolumeRequest`. The default value is false. + // If any snapshot in the list of snapshots in this message have + // ready_to_use set to false, the SP MUST set this field to false. + // If all of the snapshots in the list of snapshots in this message + // have ready_to_use set to true, the SP SHOULD set this field to + // true. + // This field is REQUIRED. 
+ bool ready_to_use = 4; } ``` @@ -677,10 +649,10 @@ message DeleteVolumeGroupSnapshotResponse { } ``` -#### ControllerGetVolumeGroupSnapshot +#### GetVolumeGroupSnapshot ``` -message ControllerGetVolumeGroupSnapshotRequest { +message GetVolumeGroupSnapshotRequest { option (alpha_message) = true; // The ID of the group snapshot to fetch current group snapshot @@ -689,7 +661,7 @@ message ControllerGetVolumeGroupSnapshotRequest { string group_snapshot_id = 1; // Secrets required by plugin to complete - // ControllerGetVolumeGroupSnapshot request. + // GetVolumeGroupSnapshot request. // This field is OPTIONAL. Refer to the `Secrets Requirements` // section on how to use this field. // The secrets provided in this field SHOULD be the same as @@ -699,7 +671,7 @@ message ControllerGetVolumeGroupSnapshotRequest { map secrets = 2 [(csi_secret) = true]; } -message ControllerGetVolumeGroupSnapshotResponse { +message GetVolumeGroupSnapshotResponse { option (alpha_message) = true; // This field is REQUIRED From 9d51ef1bab001878d1d0f566c0dea216d0238ff8 Mon Sep 17 00:00:00 2001 From: xing-yang Date: Wed, 1 Feb 2023 20:04:31 -0500 Subject: [PATCH 15/19] Address comments --- .../3476-volume-group-snapshot/README.md | 29 +++++++++---------- 1 file changed, 13 insertions(+), 16 deletions(-) diff --git a/keps/sig-storage/3476-volume-group-snapshot/README.md b/keps/sig-storage/3476-volume-group-snapshot/README.md index 32ee8eb5ab5..e81e461c783 100644 --- a/keps/sig-storage/3476-volume-group-snapshot/README.md +++ b/keps/sig-storage/3476-volume-group-snapshot/README.md @@ -149,11 +149,14 @@ This proposal introduces new CRDs VolumeGroupSnapshot, VolumeGroupSnapshotConten A VolumeGroupSnapshot can be created from multiple PVCs with a label on the PVCs specified by the labelSelector in the VolumeGroupSnapshot if the CSI driver supports the CREATE_DELETE_GET_VOLUME_GROUP_SNAPSHOT capability. 
+Note: In the following, we will use VolumeGroupSnapshot Controller to refer to the control logic for VolumeGroupSnapshot. This is not a new controller. It will be new control logic added to the existing Snapshot Controller and the csi-snapshotter sidecar. + #### Dynamic provisioning * Admin creates a VolumeGroupSnapshotClass. * User creates a VolumeGroupSnapshot with label selector that matches the label applied to all PVCs to be snapshotted together. -* This will trigger the VolumeGroupSnapshot controller to create a VolumeGroupSnapshotContent API object, and also call the CreateVolumeGroupSnapshot CSI function. It will also create multiple VolumeSnapshot API objects with volumeGroupSnapshotName in the status and the corresponding VolumeSnapshotContents with the snapshot handle. The VolumeSnapshot and VolumeSnapshotContent will point to each other before these objects are created in the API server to avoid triggering the VolumeSnapshot controller to create new individual objects. The CSI snapshotter sidecar will not call CSI driver in this case. If needed, GetVolumeGroupSnapshot CSI function will be called to retrieve individual snapshot statuses until all snapshots are ready to use. +* This will trigger the VolumeGroupSnapshot controller to create a VolumeGroupSnapshotContent API object, and also call the CreateVolumeGroupSnapshot CSI function. +* The controller will retrieve all volumeSnapshotHandles in the Volume Group Snapshot from the CSI CreateVolumeGroupSnapshotResponse, create VolumeSnapshotContents pointing to the volumeSnapshotHandles. Then the controller will create VolumeSnapshots pointing to the VolumeSnapshotContents. * CreateVolumeGroupSnapshot CSI function response * The CreateVolumeGroupSnapshot CSI function should return a list of snapshots (Snapshot message defined in CSI Spec) in its response. 
The VolumeGroupSnapshot controller can use the returned list of snapshots to construct corresponding individual VolumeSnapshotContents and VolumeSnapshots, wait for VolumeSnapshots and VolumeSnapshotContents to be bound, and update SnapshotList in the VolumeGroupSnapshot Status and SnapshotContentList in the VolumeGroupSnapshotContent Status. @@ -176,7 +179,7 @@ status: Admin can create a VolumeGroupSnapshotContent, specifying an existing VolumeGroupSnapshotHandle in the storage system and specifying a VolumeGroupSnapshot name and namespace. Then the user creates a VolumeGroupSnapshot that points to the VolumeGroupSnapshotContent name. -Admin will retrieve all volumeSnapshotHandles in the Volume Group Snapshot from the storage system, create VolumeSnapshotContents pointing to the volumeSnapshotHandles. Then the user can create VolumeSnapshots pointing to the VolumeSnapshotContents. +The controller will retrieve all volumeSnapshotHandles in the Volume Group Snapshot from the storage system, create VolumeSnapshotContents pointing to the volumeSnapshotHandles. Then the controller will create VolumeSnapshots pointing to the VolumeSnapshotContents. ### Delete VolumeGroupSnapshot @@ -502,9 +505,9 @@ spec: volumeGroupSnapshotClassName: volumeGroupSnapshotClass1 ``` -A new external VolumeGroupSnapshot controller will handle VolumeGroupSnapshotClass, VolumeGroupSnapshot, and VolumeGroupSnapshotContent resources. We may need to split this into two controllers, one common controller that handles common functions such as binding, and one sidecar controller that calls the CSI driver. +The new VolumeGroupSnapshot logic will be added to the Snapshot Controller and the csi-snapshotter sidecar to handle VolumeGroupSnapshotClass, VolumeGroupSnapshot, and VolumeGroupSnapshotContent resources accordingly. -Snapshot controller will be modified so that it will not delete an indiviual VolumeSnapshot that is part of a VolumeGroupSnapshot. 
External snapshotter sidecar will be modified so that it will not delete an individual VolumeSnapshotContent that is part of a VolumeGroupSnapshotContent. +Snapshot controller will also be modified so that it will not delete an indiviual VolumeSnapshot that is part of a VolumeGroupSnapshot. External snapshotter sidecar will be modified so that it will not delete an individual VolumeSnapshotContent that is part of a VolumeGroupSnapshotContent. ### CSI Changes @@ -688,9 +691,9 @@ _This section must be completed when targeting alpha to a release._ * **How can this feature be enabled / disabled in a live cluster?** - [x] Other - Describe the mechanism: - The external volume group snapshot controllers do not have a - feature gate because they are out of tree. - It is enabled when these external controller sidecars are deployed with the CSI driver. + We don't have a feature gate because this feature is out of tree. + We will use a flag called enable-volume-group-snapshot to enable this + feature when the snapshot controller and csi-snapshotter sidecar are started. - Will enabling / disabling the feature require downtime of the control plane? From the controller side, it only affects the external controller sidecars. @@ -703,21 +706,15 @@ _This section must be completed when targeting alpha to a release._ * **Can the feature be disabled once it has been enabled (i.e. can we rollback the enablement)?** - Yes. In order to disable this feature once it has been enabled, we first need to make sure that all VolumeGroupSnapshot API objects are deleted. Then the new controllers for VolumeGroupSnapshot can be stopped/removed, and external-snapshotter controller/sidecar can be downgraded to a version without this feature. - -If we don't delete the VolumeGroupSnapshot API objects and CRDs but just uninstall the VolumeGroupSnapshot controllers and downgrade the other sidecars, the API objects continue to exist in the API server. 
User may delete an individual VolumeSnapshot that is associated with a VolumeGroupSnapshot. After that if the user starts the controllers/sidecars again, the pre-existing VolumeGroupSnapshot still has the deleted individual VolumeSnapshots in its status so it is out of sync with the storage system and provides out-dated information to the user. User can still restore individual PVCs from individual VolumeSnapshots that are not deleted, but they cannot restore PVCs from the deleted VolumeSnapshots. - -If the API objects and VolumeGroupSnapshot controllers are running, but the snapshotter sidecars are downgraded to a lower version that does not support this feature, it should be fine as individual snapshots that are part of a group snapshot will be created and deleted by the VolumeGroupSnapshot controller. + Yes. In order to disable this feature once it has been enabled, we first need to make sure that all VolumeGroupSnapshot API objects are deleted. Then external-snapshotter controller/sidecar can be restarted without the feature flag. -If the external-snapshotter sidecar which supports this feature is running but VolumeGroupSnapshot controller is not (CRDs are still installed), creating VolumeGroupSnapshot will not be successful. Ready status in VolumeGroupSnapshot API objects will be false until those controllers are running again. +If we don't delete the VolumeGroupSnapshot API objects and CRDs but just disable the feature and restart Snapshot controller and the csi-snapshotter sidecar, the API objects continue to exist in the API server. User may delete an individual VolumeSnapshot that is associated with a VolumeGroupSnapshot. After that if the user enables the feature again, the pre-existing VolumeGroupSnapshot still has the deleted individual VolumeSnapshots in its status so it is out of sync with the storage system and provides out-dated information to the user. 
User can still restore individual PVCs from individual VolumeSnapshots that are not deleted, but they cannot restore PVCs from the deleted VolumeSnapshots. * **What happens if we reenable the feature if it was previously rolled back?** We will be able to create new VolumeGroupSnapshot API objects again. * **Are there any tests for feature enablement/disablement?** - Since there is no feature gate for this feature on the external controller side and the only way to - enable or disable this feature is to install or unistall the sidecar, we cannot write - tests for feature enablement/disablement. + Unit tests will be added with or without the feature flag enabled. ### Rollout, Upgrade and Rollback Planning From e1bcf82bd5f175f121bc8745b9e6fe87bddc3092 Mon Sep 17 00:00:00 2001 From: xing-yang Date: Wed, 1 Feb 2023 20:29:17 -0500 Subject: [PATCH 16/19] Changed VolumeGroupHandle to PersistentVolumeNames. --- keps/sig-storage/3476-volume-group-snapshot/README.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/keps/sig-storage/3476-volume-group-snapshot/README.md b/keps/sig-storage/3476-volume-group-snapshot/README.md index e81e461c783..ea200ea7ce3 100644 --- a/keps/sig-storage/3476-volume-group-snapshot/README.md +++ b/keps/sig-storage/3476-volume-group-snapshot/README.md @@ -186,7 +186,9 @@ The controller will retrieve all volumeSnapshotHandles in the Volume Group Snaps A VolumeGroupSnapshot can be deleted if the CSI driver supports the CREATE_DELETE_GET_VOLUME_GROUP_SNAPSHOT capability. * When a VolumeGroupSnapshot is deleted, the VolumeGroupSnapshot controller will call the DeleteVolumeGroupSnapshot CSI function which will delete individual snapshots as well. * Since CSI driver handles individual snapshot creation in CreateVolumeGroupSnapshot, it should handle individual snapshot deletion in DeleteVolumeGroupSnapshot as well. DeleteSnapshot CSI function will not be called. 
+ * When DeleteVolumeGroupSnapshot CSI function returns success, it is assumed that all individual snapshots on the storage system have been deleted. VolumeGroupSnapshot controller should remove all the finalizers and delete the VolumeSnapshot and VolumeSnapshotContent API objects. * DeleteSnapshot on a single snapshot that belongs to a group snapshot is not allowed. +* The Snapshot Controller and csi-snapshotter sidecar will be modified to skip the handling of VolumeSnapshot and VolumeSnapshotContent deletion if they are part of a group. ### Restore @@ -426,8 +428,9 @@ type VolumeGroupSnapshotContentSpec struct { // OneOf type VolumeGroupSnapshotContentSource struct { // Dynamical provisioning of VolumeGroupSnapshot + // A list of PersistentVolume names to be snapshotted together // +optional - VolumeGroupHandle *string + PersistentVolumeNames []string // Pre-provisioned VolumeGroupSnapshot // +optional From 7c7e865495c0d169f39e0de00e8889016af52a46 Mon Sep 17 00:00:00 2001 From: xing-yang Date: Mon, 6 Feb 2023 22:41:35 -0500 Subject: [PATCH 17/19] Address comments --- .../3476-volume-group-snapshot/README.md | 63 +++++++++---------- 1 file changed, 28 insertions(+), 35 deletions(-) diff --git a/keps/sig-storage/3476-volume-group-snapshot/README.md b/keps/sig-storage/3476-volume-group-snapshot/README.md index ea200ea7ce3..9848e4b274a 100644 --- a/keps/sig-storage/3476-volume-group-snapshot/README.md +++ b/keps/sig-storage/3476-volume-group-snapshot/README.md @@ -159,9 +159,13 @@ Note: In the following, we will use VolumeGroupSnapshot Controller to refer to t * The controller will retrieve all volumeSnapshotHandles in the Volume Group Snapshot from the CSI CreateVolumeGroupSnapshotResponse, create VolumeSnapshotContents pointing to the volumeSnapshotHandles. Then the controller will create VolumeSnapshots pointing to the VolumeSnapshotContents. 
* CreateVolumeGroupSnapshot CSI function response * The CreateVolumeGroupSnapshot CSI function should return a list of snapshots (Snapshot message defined in CSI Spec) in its response. The VolumeGroupSnapshot controller can use the returned list of snapshots to construct corresponding individual VolumeSnapshotContents and VolumeSnapshots, wait for VolumeSnapshots and VolumeSnapshotContents to be bound, and update SnapshotList in the VolumeGroupSnapshot Status and SnapshotContentList in the VolumeGroupSnapshotContent Status. + * Individual VolumeSnapshots will be named in this format: + * snap++ + (truncate if it exceeds the max length) + * If the exact same name already exists, append a "1" at the end. If that still exists, replace the suffix "1" with "2", and so on. -apiVersion: snapshot.storage.k8s.io/v1 ``` +apiVersion: snapshot.storage.k8s.io/v1 kind: VolumeSnapshot metadata: name: snapshot1 @@ -172,14 +176,14 @@ status: volumeGroupSnapshotName: groupSnapshot1 ``` -* An admissions controller or finalizer should be added to prevent an individual snapshot from being deleted that belongs to a VolumeGroupSnapshot. +* An admission controller or finalizer should be added to prevent an individual snapshot that belongs to a VolumeGroupSnapshot from being deleted. Note that there is a [KEP](https://github.com/kubernetes/enhancements/pull/2840/files) proposing the Liens feature, which could potentially be used for this purpose. * In the CSI spec, it is specified that it is required for individual snapshots to be returned along with the group snapshot. #### Pre-provisioned VolumeGroupSnapshot Admin can create a VolumeGroupSnapshotContent, specifying an existing VolumeGroupSnapshotHandle in the storage system and specifying a VolumeGroupSnapshot name and namespace. Then the user creates a VolumeGroupSnapshot that points to the VolumeGroupSnapshotContent name. 
-The controller will retrieve all volumeSnapshotHandles in the Volume Group Snapshot from the storage system, create VolumeSnapshotContents pointing to the volumeSnapshotHandles. Then the controller will create VolumeSnapshots pointing to the VolumeSnapshotContents. +The controller will call the CSI GetVolumeGroupSnapshot method to retrieve all volumeSnapshotHandles in the Volume Group Snapshot from the storage system, create VolumeSnapshotContents pointing to the volumeSnapshotHandles. Then the controller will create VolumeSnapshots pointing to the VolumeSnapshotContents. ### Delete VolumeGroupSnapshot @@ -339,12 +343,19 @@ type VolumeGroupSnapshot struct { // VolumeGroupSnapshotSpec describes the common attributes of a group snapshot type VolumeGroupSnapshotSpec struct { + // VolumeGroupSnapshotClassName may be left nil to indicate that + // the default class will be used. + // Empty string is not allowed for this field. // +optional - VolumeSnapshotClassName *string + VolumeGroupSnapshotClassName *string // A label query over persistent volume claims to be grouped together // for snapshotting. // This labelSelector will be used to match the label added to a PVC. + // Note that if volumes are added/removed from the label after a group snapshot + // is created, the existing snapshots won't be modified. + // Once a VolumeGroupSnapshotContent is created and the sidecar starts to process it, + // the volume list will not change with retries. Selector *metav1.LabelSelector // VolumeGroupSnapshotSecretRef is a reference to the secret object containing @@ -368,22 +379,12 @@ Type VolumeGroupSnapshotStatus struct { CreationTime *metav1.Time // +optional - Error *VolumeGroupSnapshotError - - // List of volume snapshots - // +optional - SnapshotList []VolumeSnapshot -} - -// Describes an error encountered on the group snapshot -type VolumeGroupSnapshotError struct { - // time is the timestamp when the error was encountered. 
- // +optional - Time *metav1.Time + Error *VolumeSnapshotError - // message details the encountered error + // List of volume snapshot refs + // The max number of snapshots in the group is 100 // +optional - Message *string + VolumeSnapshotRefList []core_v1.ObjectReference } ``` @@ -418,6 +419,8 @@ type VolumeGroupSnapshotContentSpec struct { // Required Driver string + // This field may be unset for pre-provisioned snapshots. + // For dynamic provisioning, this field must be set. // +optional VolumeGroupSnapshotClassName *string @@ -427,7 +430,7 @@ type VolumeGroupSnapshotContentSpec struct { // OneOf type VolumeGroupSnapshotContentSource struct { - // Dynamical provisioning of VolumeGroupSnapshot + // Dynamic provisioning of VolumeGroupSnapshot // A list of PersistentVolume names to be snapshotted together // +optional PersistentVolumeNames []string @@ -453,11 +456,12 @@ Type VolumeGroupSnapshotContentStatus struct { CreationTime *int64 // +optional - Error *VolumeGroupSnapshotError + Error *VolumeSnapshotError - // List of volume group snapshot contents + // List of volume snapshot content refs + // The max number of snapshots in a group is 100 // +optional - VolumeSnapshotContentList []VolumeSnapshotContent + VolumeSnapshotContentRefList []core_v1.ObjectReference } ``` @@ -1460,23 +1464,12 @@ Type VolumeGroupSnapshotStatus struct { CreationTime *metav1.Time // +optional - Error *VolumeGroupSnapshotError + Error *VolumeSnapshotError // List of volume snapshots // +optional SnapshotList []VolumeSnapshot } - -// Describes an error encountered on the group snapshot -type VolumeGroupSnapshotError struct { - // time is the timestamp when the error was encountered. 
- // +optional - Time *metav1.Time - - // message details the encountered error - // +optional - Message *string -} ``` #### VolumeGroupSnapshotContent @@ -1544,7 +1537,7 @@ Type VolumeGroupSnapshotContentStatus struct { CreationTime *int64 // +optional - Error *VolumeGroupSnapshotError + Error *VolumeSnapshotError // List of volume group snapshot contents // +optional From d68d08d56c59781987c814a1c6f5c59c64c791c6 Mon Sep 17 00:00:00 2001 From: xing-yang Date: Tue, 7 Feb 2023 20:14:51 -0500 Subject: [PATCH 18/19] Moved alternative VolumeGroup proposal out --- .../3476-volume-group-snapshot/README.md | 678 +----------------- 1 file changed, 3 insertions(+), 675 deletions(-) diff --git a/keps/sig-storage/3476-volume-group-snapshot/README.md b/keps/sig-storage/3476-volume-group-snapshot/README.md index 9848e4b274a..2bf8caceb9c 100644 --- a/keps/sig-storage/3476-volume-group-snapshot/README.md +++ b/keps/sig-storage/3476-volume-group-snapshot/README.md @@ -54,28 +54,6 @@ - [Immutable VolumeGroup](#immutable-volumegroup) - [ModifyVolume](#modifyvolume) - [VolumeGroup API Definitions](#volumegroup-api-definitions) - - [Use cases for the VolumeGroup](#use-cases-for-the-volumegroup) - - [Future use cases for the VolumeGroup](#future-use-cases-for-the-volumegroup) - - [Proposal for VolumeGroup and VolumeGroupSnapshot](#proposal-for-volumegroup-and-volumegroupsnapshot) - - [Create VolumeGroup](#create-volumegroup) - - [Delete VolumeGroup and PVC](#delete-volumegroup-and-pvc) - - [Modify VolumeGroup](#modify-volumegroup) - - [Create and Modify VolumeGroup](#create-and-modify-volumegroup) - - [Create VolumeGroupSnapshot](#create-volumegroupsnapshot-2) - - [Delete VolumeGroupSnapshot](#delete-volumegroupsnapshot-1) - - [Restore](#restore-1) - - [VolumeGroupClass](#volumegroupclass) - - [VolumeGroup](#volumegroup) - - [VolumeGroupContent](#volumegroupcontent) - - [VolumeGroupSnapshotClass](#volumegroupsnapshotclass-1) - - [VolumeGroupSnapshot](#volumegroupsnapshot-1) - - 
[VolumeGroupSnapshotContent](#volumegroupsnapshotcontent-1) - - [PersistentVolumeClaim and PersistentVolume](#persistentvolumeclaim-and-persistentvolume) - - [VolumeSnapshot and VolumeSnapshotContent](#volumesnapshot-and-volumesnapshotcontent-1) - - [Example Yaml Files](#example-yaml-files-1) - - [Create Volume Group](#create-volume-group) - - [Add PVC to VolumeGroup](#add-pvc-to-volumegroup) - - [Create VolumeGroupSnapshot](#create-volumegroupsnapshot-3) ## Release Signoff Checklist @@ -160,9 +138,8 @@ Note: In the following, we will use VolumeGroupSnapshot Controller to refer to t * CreateVolumeGroupSnapshot CSI function response * The CreateVolumeGroupSnapshot CSI function should return a list of snapshots (Snapshot message defined in CSI Spec) in its response. The VolumeGroupSnapshot controller can use the returned list of snapshots to construct corresponding individual VolumeSnapshotContents and VolumeSnapshots, wait for VolumeSnapshots and VolumeSnapshotContents to be bound, and update SnapshotList in the VolumeGroupSnapshot Status and SnapshotContentList in the VolumeGroupSnapshotContent Status. * Individual VolumeSnapshots will be named in this format: - * snap++ - (truncate if it exceeds the max length) - * If the exact same name already exists, append a "1" at the end. If that still exists, replace the suffix "1" with "2", and so on. + * - + * A label with VolumeGroupSnapshot name will also be added to the VolumeSnapshot ``` apiVersion: snapshot.storage.k8s.io/v1 @@ -1001,653 +978,4 @@ External-provisioner will be modified so that modifying PVC by adding VolumeGrou In an earlier version of this KEP, a VolumeGroup API is introduced to group volumes together. The VolumeGroup is removed from the KEP for a simpler design that supports group snapshot. -#### Use cases for the VolumeGroup - -* A VolumeGroup allows users to manage multiple volumes belonging to the same application together and therefore it is very useful in general. 
For example, it can be used to group all volumes in the same StatefulSet together and we can take a group snapshot of all the volumes in this StatefulSet. - -* For some storage systems, volumes are always managed in a group. For these storage systems, they will have to create a group for a single volume if they need to implement a create volume function in Kubernetes. Volume snapshotting, cloning, expansion, and deletion, etc. are all performed at a group level. Providing a VolumeGroup API will be very convenient for them. - -* Instead of taking individual snapshots one after another, VolumeGroup can be used as a source for taking a snapshot of all the volumes in the same volume group. This may be a storage level consistent group snapshot if the storage system supports it. For this use case, we will introduce another CRD VolumeGroupSnapshot. - -* VolumeGroup can also be used together with application snapshot. It can be a resource managed by the ApplicationSnapshot CRD. - -* Some applications may not want to use ApplicationSnapshot CRD because they don’t use Kubernetes workload APIs such as StatefulSet, Deployment, etc. Instead, they have developed their own operators. In this case it is more convenient to use VolumeGroup to manage persistent volumes used in those applications. - -* Application quiesce is time consuming. Some users may not want to do application quiesce very frequently for that reason. For example, a user may want to run weekly backups with application quiesce and nightly backups without application quiesce but with consistency group support which provides crash consistency across all volumes in the group. - -#### Future use cases for the VolumeGroup - -* VolumeGroup can be used to manage group replication or consistency group replication if the storage system supports it. Note replication is out of scope for this proposal. It is mentioned here as a potential future use case. 
- -* VolumeGroup can be used to manage volume placement to either spread the volumes across storage pools or stack the volumes on the same storage pool. Related KEPs proposing the concept of storage pool for volume placement is as follows: - https://github.com/kubernetes/enhancements/pull/1353 - https://github.com/kubernetes/enhancements/pull/1347 -We may not really need a VolumeGroup for this use case. A StoragePool is probably enough. This is to be determined. - -#### Proposal for VolumeGroup and VolumeGroupSnapshot - -This proposal introduces new CRDs VolumeGroupSnapshot, VolumeGroupSnapshotContent, and VolumeGroupSnapshotClass. - -##### Create VolumeGroup - -Create new VolumeGroup can be done in several ways: - -Phase 1 (Note: only Phase 1 will be covered in this KEP which is targeting Alpha in K8s v1.26): -1. Create an empty group first, then create a new PVC with the group name. This will create a new volume and add that volume to the already created group. When deleting this volume group, all volumes in the group will be deleted together with the group. A CSI driver supporting CREATE_DELETE_VOLUME_GROUP controller capability MUST implement this feature. -2. Create an empty group first, then add an existing PVC to the group one by one. A CSI driver supporting VOLUME_GROUP_ADD_REMOVE_EXISTING_VOLUME MUST implement this feature. - -Phase 2 (After v1.26): -1. Create a new volume group by querying a label on existing persistent volume claims and adding them to the volume group. -2. Create a new volume group from an existing group snapshot or another volume group in one step. Design details will be added in a future KEP. - -Non-goal: Create a new empty group and in the same time create new empty PVCs and add to the new group. - -##### Delete VolumeGroup and PVC - -Deleting a volume group will delete the volume group along with all the PVCs in the group. - -An individual PVC needs to be removed from the group first before it can be deleted. 
A finalizer or webhook will be added that prevents an individual PVC in a group from being deleted. - -##### Modify VolumeGroup - -Modify an existing VolumeGroup: -1. Create a new volume with an existing VolumeGroup name will create a new volume and add it to the group. Option 1 of creating VolumeGroup above falls into this case. As mentioned earlier, a CSI driver supporting CREATE_DELETE_VOLUME_GROUP MUST implement this feature. -2. Add an existing volume to an existing VolumeGroup or remove a volume from a VolumeGroup. Option 2 of creating VolumeGroup above falls into this case. As mentioned earlier, a CSI driver supporting VOLUME_GROUP_ADD_REMOVE_EXISTING_VOLUME MUST implement this feature. - -##### Create and Modify VolumeGroup - -VolumeGroups can be created and/or modified in several ways as described in the following. - -###### Create new PVC and add to the VolumeGroup - -* Admin creates a VolumeGroupClass, with the SupportVolumeGroupSnapshot boolean flag set to true. -* User creates a new empty VolumeGroup, specifying the above VolumeGroupClass. As a result, a new empty VolumeGroupContent will also be created and bound to the VolumeGroup. -* User creates a new PVC with an existing VolumeGroup name created above. As a result, a new PVC is created and added to VolumeGroup. VolumeGroup is modified so Status has this new PVC in PVCList. -* External-provisioner will be modified so that VolumeGroupName will be passed to the CSI driver when creating a volume. - -Only CSI drivers supporting CREATE_DELETE_VOLUME_GROUP capability can support the volume group this way. - -When a new PVC is created with the existing VolumeGroup name, the VolumeGroup will be modified and the PVC will be added to PVCList in the Status, and the VolumeGroupContent will also be modified and the PV will be added to the PVList in the Status. - -The same PVC can belong to different groups, i.e., different types of groups or different groups of the same type, if the storage system supports it. 
Storage system will decide whether to support this or not. If it does not support it, an INVALID_ARGUMENT error code should be returned with a message explaining why. We don't prevent it in the API or controller directly. - -###### Modify VolumeGroup with existing PVCs - -We can add an existing PVC to the group or remove a PVC from the group without deleting it. A VOLUME_GROUP_ADD_REMOVE_EXISTING_VOLUME capability will be added to CSI Spec. Only CSI drivers supporting both CREATE_DELETE_VOLUME_GROUP and VOLUME_GROUP_ADD_REMOVE_EXISTING_VOLUME capabilities can support the volume group this way. - -* Admin creates a VolumeGroupClass, with the SupportVolumeGroupSnapshot boolean flag set to true. -* User creates a new empty VolumeGroup, specifying the above VolumeGroupClass. A new empty VolumeGroupContent will also be created and bound to the VolumeGroup. -* Add an existing PVC to an existing VolumeGroup (VolumeGroup can be empty to start with or it can have other PVCs already) by adding a label specified by the labelSelector in the VolumeGroup to the PVC. - * The VolumeGroup name is added by user to each PVC, not by the VolumeGroup controller. The VolumeGroup controller watches PVCs and reacts to the PVC updated with a VolumeGroup name event as described in the following step. -* VolumeGroup is modified so the existing PVC is added to the PVCList in the Status. VolumeGroupContent is also modified so the PV is added to the PVList in the Status. - * Note: The VolumeGroup controller will be implemented to have a desired state - of the world and an actual state of the world. The desired state of the world - contains VolumeGroups with the desired PVCList while the actual state of the - world contains VolumeGroups with the actual PVCList. The controller will try - to reconcile the two by handling adding and removing multiple PVCs through a - single CSI ModifyVolumeGroup RPC call each time. -* External-provisioner will be modified to update the status of PVC and PV. 
-* VolumeGroup controller will be triggered to update the VolumeGroup Status and VolumeGroupContent Status. -* If one volume fails to be added, it should not affect it if it is used by a pod, but there will be error messages. -* Removing a PVC from a VolumeGroup will trigger the external-provisioner and the VolumeGroup controller as well. - -###### Phase 2: Create VolumeGroup from VolumeGroupSnapshot or another VolumeGroup - -This is in Phase 2 so won't be discussed in detail here. Creating a new volume group from an existing group snapshot will be supported in Phase 2 if the CSI driver supports VOLUME_GROUP_FROM_GROUP_SNAPSHOT capability. As a result, PVCs will be created from source snapshots and placed in a new volume group. - -###### Pre-provisioned VolumeGroup - -Admin can create a VolumeGroupContent, specifying an existing VolumeGroupHandle in the storage system and specifying a VolumeGroup name and namespace. Then the user creates a VolumeGroup that points to the VolumeGroupContent name. - -Admin will retrieve all volumeHandles in the VolumeGroup from the storage system, create PVs pointing to the volumeHandles. Then the user creates PVCs pointing to the PVs. - -##### Create VolumeGroupSnapshot - -A VolumeGroupSnapshot can be created with a VolumeGroup as the source if the CSI driver supports the CREATE_DELETE_GROUP_SNAPSHOT capability. - -###### Dynamic provisioning - -* Admin creates a VolumeGroupSnapshotClass. -* User creates a VolumeGroupSnapshot with a VolumeGroup as the source. -* This will trigger the VolumeGroupSnapshot controller to create a VolumeGroupSnapshotContent API object, and also call the CreateVolumeGroupSnapshot CSI function and also create multiple VolumeSnapshot API objects with VolumeGroupSnapshot name parameter in each VolumeSnapshot Status. This will trigger the creation of VolumeSnapshotContent API objects in the snapshot controller and calls to the CreateSnapshot CSI function in the CSI snapshotter sidecar. 
The CSI snapshotter sidecar will pass the new group_snapshot_name parameter to the CSI Driver when calling CreatSnapshot. -* When CSI driver receives CreateSnapshot request for individual snapshots with a VolumeGroupSnapshot name: - * If it knows how to create a group snapshot on the storage system, it returns (nil, nil), and leaves it to the CreateVolumeGroupSnapshot function to handle the snapshot creation. -* CreateVolumeGroupSnapshot CSI function response - * The CreateVolumeGroupSnapshot CSI function should return a list of snapshots (Snapshot message defined in CSI Spec) in its response. The VolumeGroupSnapshot controller can use the returned list of snapshots to update corresponding individual VolumeSnapshotContents, wait for VolumeSnapshots and VolumeSnapshotContents to be bound, and update SnapshotList in the VolumeGroupSnapshot Status and SnapshotContentList in the VolumeGroupSnapshotContent Status. - -apiVersion: snapshot.storage.k8s.io/v1 -``` -kind: VolumeSnapshot -metadata: - name: snapshot1 -spec: - volumeSnapshotClassName: snapClass1 - source: - persistentVolumeClaimName: pvc1 -status: - volumeGroupSnapshotName: groupSnapshot1 -``` - -* An admissions controller or finalizer should be added to prevent an individual snapshot from being deleted that belongs to a VolumeGroupSnapshot. -* Since some storage systems require individual snapshots while others can only return a single group snapshot but not individual snapshots, we propose a two phase solution. - * In Phase 1, since we do not support creating a VolumeGroup directly from a VolumeGroupSnapshot, it is required for individual snapshots to be returned along with the group snapshot. - * In Phase 2, we plan to support creating a VolumeGroup directly from a VolumeGroupSnapshot. 
We propose the following solution for Phase 2: - * In VolumeGroupSnapshotStatus, if ReadyToUse is true and SnapshotList is empty, the VolumeGroupSnapshot Controller assumes the storage system does not return individual snapshots. - * If ReadyToUse is true and SnapshotList is not empty, the VolumeGroupSnapshot Controller knows there are individual snapshots created for this group. Those individual snapshots may be used as readonly, but they cannot be removed from the VolumeGroupSnapshot. - * In the CSI Spec, this means repeated .csi.v1.Snapshot snapshots in VolumeGroupSnapshot message from CreateVolumeGroupSnapshotResponse should be optional, not required. - * How to use the VolumeGroupSnapshot if individual snapshots are not returned? How can we create a volume from a snapshot if there are no individual snapshots? `snapshots` is optional while `group_snapshot_id` is required in VolumeGroupSnapshot message in CSI so it is fine to only specify `group_snapshot_id` not `snapshots` when creating a VolumeGroup from a VolumeGroupSnapshot. However, CSI Driver MUST return a list of `volumes` that are restored in `CreateVolumeGroupResponse`. - -###### Pre-provisioned VolumeGroupSnapshot - -Admin can create a VolumeGroupSnapshotContent, specifying an existing VolumeGroupSnapshotHandle in the storage system and specifying a VolumeGroupSnapshot name and namespace. Then the user creates a VolumeGroupSnapshot that points to the VolumeGroupSnapshotContent name. - -Admin will retrieve all volumeSnapshotHandles in the Volume Group Snapshot from the storage system, create VolumeSnapshotContents pointing to the volumeSnapshotHandles. Then the user can create VolumeSnapshots pointing to the VolumeSnapshotContents. - -##### Delete VolumeGroupSnapshot - -A VolumeGroupSnapshot can be deleted if the CSI driver supports the CREATE_DELETE_GROUP_SNAPSHOT capability. 
-* When a VolumeGroupSnapshot is deleted, the VolumeGroupSnapshot controller will call the DeleteVolumeGroupSnapshot CSI function as well as DeleteSnapshot CSI functions. - * Since CSI driver handles individual snapshot creation in CreateVolumeGroupSnapshot, it should handle individual snapshot deletion in DeleteVolumeGroupSnapshot. -* DeleteSnapshot on a single snapshot that belongs to a group snapshot is not allowed. - -##### Restore - -Restore can be done as follows: - -Phase 1: - -* A new empty volume group can be created first, and then a new volume can be created from a snapshot one by one and added to the volume group. This can be repeated for all the snapshots in the VolumeGroupSnapshot. - -Phase 2: - -* A VolumeGroup can be created from a VolumeGroupSnapshot or VolumeGroup source in one step. This is the same as what is described in the section `Create VolumeGroup from VolumeGroupSnapshot or another VolumeGroup`. - -API definitions are as follows: - -#### VolumeGroupClass - -``` -type VolumeGroupClass struct { - metav1.TypeMeta - // +optional - metav1.ObjectMeta - - // Driver is the driver expected to handle this VolumeGroupClass. - // This value may not be empty. - Driver string - - // Parameters hold parameters for the driver. - // These values are opaque to the system and are passed directly - // to the driver. - // +optional - Parameters map[string]string - - // +optional - VolumeGroupDeletionPolicy *VolumeGroupDeletionPolicy - - // This field specifies whether group snapshot is supported. - // The default is false. - // +optional - SupportVolumeGroupSnapshot *bool -} - -// VolumeGroupDeletionPolicy describes a policy for end-of-life maintenance of -// volume group contents -type VolumeGroupDeletionPolicy string - -const ( - // VolumeGroupContentDelete means the group will be deleted from the - // underlying storage system on release from its volume group. 
- VolumeGroupContentDelete VolumeGroupDeletionPolicy = "Delete" - - // VolumeGroupContentRetain means the group will be left in its current - // state on release from its volume group. - VolumeGroupContentRetain VolumeGroupDeletionPolicy = "Retain" -) -``` - -#### VolumeGroup - -``` -// VolumeGroup is a user's request for a group of volumes -type VolumeGroup struct { - metav1.TypeMeta - // +optional - metav1.ObjectMeta - - // Spec defines the volume group requested by a user - Spec VolumeGroupSpec - - // Status represents the current information about a volume group - // +optional - Status *VolumeGroupStatus -} - -// VolumeGroupSpec describes the common attributes of group storage devices -// and allows a Source for provider-specific attributes -Type VolumeGroupSpec struct { - // +optional - VolumeGroupClassName *string - - // Source has the information about where the group is created from. - // Required. - Source VolumeGroupSource -} - -// VolumeGroupSource contains several options. -// OneOf the options must be defined. -Type VolumeGroupSource struct { - // +optional - // Pre-provisioned VolumeGroup - VolumeGroupContentName *string - - // +optional - // Dynamically provisioned VolumeGroup - // A label query over persistent volume claims to be added to the volume group. - // This labelSelector will be used to match the label added to a PVC. - // In Phase 1, when the label is added to PVC, the PVC will be added to the matching group. - // In Phase 2, this labelSelector will be used to find all PVCs with matching label and add them to the group when the group is being created. - Selector *metav1.LabelSelector - - // Phase 2 - // +optional - // Dynamically provisioned VolumeGroup - // This field specifies the source of a volume group. 
(this is for restore) - // Supported Kind is VolumeGroupSnapshot or VolumeGroup - // GroupDataSource *TypedLocalObjectReference - } - -type VolumeGroupStatus struct { - // +optional - BoundVolumeGroupContentName *string - - // +optional - GroupCreationTime *metav1.Time - - // A list of persistent volume claims - // +optional - PVCList []PersistentVolumeClaim - - // +optional - Ready *bool - - // Last error encountered during group creation - // +optional - Error *VolumeGroupError -} - -// Describes an error encountered on the group -type VolumeGroupError struct { - // time is the timestamp when the error was encountered. - // +optional - Time *metav1.Time - - // message details the encountered error - // +optional - Message *string -} -``` - -#### VolumeGroupContent - -``` -// VolumeGroupContent represents a group of volumes on the storage backend -type VolumeGroupContent struct { - metav1.TypeMeta - // +optional - metav1.ObjectMeta - - // Spec defines the volume group requested by a user - Spec VolumeGroupContentSpec - - // Status represents the current information about a volume group - // +optional - Status *VolumeGroupContentStatus -} - -// VolumeGroupContentSpec -Type VolumeGroupContentSpec struct { - // +optional - VolumeGroupClassName *string - - // +optional - // VolumeGroupRef is part of a bi-directional binding between VolumeGroup and VolumeGroupContent. - VolumeGroupRef *core_v1.ObjectReference - - // +optional - Source *VolumeGroupContentSource - - // +optional - VolumeGroupDeletionPolicy *VolumeGroupDeletionPolicy - - // This field specifies whether group snapshot is supported. - // The default is false. - // +optional - SupportVolumeGroupSnapshot *bool - - // VolumeGroupSecretRef is a reference to the secret object containing - // sensitive information to pass to the CSI driver to complete the CSI - // calls for VolumeGroups. - // This field is optional, and may be empty if no secret is required. 
If the - // secret object contains more than one secret, all secrets are passed. - // +optional - VolumeGroupSecretRef *SecretReference -} - -// VolumeGroupContentSource -Type VolumeGroupContentSource struct { - // Required - Driver string - - // VolumeGroupHandle is the unique volume group name returned by the - // CSI volume plugin’s CreateVolumeGroup to refer to the volume group on - // all subsequent calls. - // Required. - VolumeGroupHandle string - - // +optional - // Attributes of the volume group to publish. - VolumeGroupAttributes map[string]string -} - -type VolumeGroupContentStatus struct { - // +optional - GroupCreationTime *metav1.Time - - // A list of persistent volumes - // +optional - PVList []PersistentVolume - - // +optional - Ready *bool - - // Last error encountered during group creation - // +optional - Error *VolumeGroupError -} -``` - -#### VolumeGroupSnapshotClass - -``` -type VolumeGroupSnapshotClass struct { - metav1.TypeMeta - // +optional - metav1.ObjectMeta - - // Driver is the driver expected to handle this VolumeGroupSnapshotClass. - // This value may not be empty. - Driver string - - // Parameters hold parameters for the driver. - // These values are opaque to the system and are passed directly - // to the driver. - // +optional - Parameters map[string]string - - // +optional - VolumeGroupSnapshotDeletionPolicy *VolumeGroupSnapshotDeletionPolicy -} - -// VolumeGroupSnapshotDeletionPolicy describes a policy for end-of-life maintenance of -// volume group snapshot contents -type VolumeGroupSnapshotDeletionPolicy string - -const ( - // VolumeGroupSnapshotContentDelete means the group snapshot will be deleted from the - // underlying storage system on release from its volume group snapshot. - VolumeGroupSnapshotContentDelete VolumeGroupSnapshotDeletionPolicy = "Delete" - - // VolumeGroupSnapshotContentRetain means the group snapshot will be left in its current - // state on release from its volume group snapshot. 
-    VolumeGroupSnapshotContentRetain VolumeGroupSnapshotDeletionPolicy = "Retain"
-)
-
-```
-
-#### VolumeGroupSnapshot
-
-```
-// VolumeGroupSnapshot is a user's request for taking a group snapshot.
-type VolumeGroupSnapshot struct {
-    metav1.TypeMeta `json:",inline"`
-    // Standard object's metadata.
-    // +optional
-    metav1.ObjectMeta `json:"metadata,omitempty" protobuf:"bytes,1,opt,name=metadata"`
-
-    // Spec defines the desired characteristics of a group snapshot requested by a user.
-    Spec VolumeGroupSnapshotSpec `json:"spec" protobuf:"bytes,2,opt,name=spec"`
-
-    // Status represents the latest observed state of the group snapshot
-    // +optional
-    Status *VolumeGroupSnapshotStatus `json:"status,omitempty" protobuf:"bytes,3,opt,name=status"`
-}
-
-// VolumeGroupSnapshotSpec describes the common attributes of a group snapshot
-type VolumeGroupSnapshotSpec struct {
-    // +optional
-    VolumeSnapshotClassName *string
-
-    // Source has the information about where the group snapshot is created from.
-    // Required.
-    Source VolumeGroupSnapshotSource
-
-    // VolumeGroupSnapshotSecretRef is a reference to the secret object containing
-    // sensitive information to pass to the CSI driver to complete the CSI
-    // calls for VolumeGroupSnapshots.
-    // This field is optional, and may be empty if no secret is required. If the
-    // secret object contains more than one secret, all secrets are passed.
-    // +optional
-    VolumeGroupSnapshotSecretRef *SecretReference
-}
-
-// OneOf VolumeGroupName or VolumeGroupSnapshotContentName
-Type VolumeGroupSnapshotSource struct {
-    // +optional
-    // Dynamically provisioned VolumeGroupSnapshot
-    VolumeGroupName *string
-
-    // +optional
-    // Pre-provisioned VolumeGroupSnapshot
-    VolumeGroupSnapshotContentName *string
-}
-
-Type VolumeGroupSnapshotStatus struct {
-    // +optional
-    BoundVolumeGroupSnapshotContentName *string
-
-    // ReadyToUse becomes true when ReadyToUse on all individual snapshots become true
-    // +optional
-    ReadyToUse *bool
-
-    // +optional
-    CreationTime *metav1.Time
-
-    // +optional
-    Error *VolumeSnapshotError
-
-    // List of volume snapshots
-    // +optional
-    SnapshotList []VolumeSnapshot
-}
-```
-
-#### VolumeGroupSnapshotContent
-
-```
-// VolumeGroupSnapshotContent
-type VolumeGroupSnapshotContent struct {
-    metav1.TypeMeta `json:",inline"`
-    // Standard object's metadata.
-    // +optional
-    metav1.ObjectMeta `json:"metadata,omitempty" protobuf:"bytes,1,opt,name=metadata"`
-
-    // Spec defines the desired characteristics of a group snapshot content
-    Spec VolumeGroupSnapshotContentSpec `json:"spec" protobuf:"bytes,2,opt,name=spec"`
-
-    // Status represents the latest observed state of the group snapshot content
-    // +optional
-    Status *VolumeGroupSnapshotContentStatus `json:"status,omitempty" protobuf:"bytes,3,opt,name=status"`
-}
-
-// VolumeGroupSnapshotContentSpec describes the common attributes of a group snapshot content
-type VolumeGroupSnapshotContentSpec struct {
-    // Required
-    // VolumeGroupSnapshotRef specifies the VolumeGroupSnapshot object
-    // to which this VolumeGroupSnapshotContent object is bound.
-    VolumeGroupSnapshotRef core_v1.ObjectReference
-
-    // Required
-    VolumeGroupSnapshotDeletionPolicy VolumeGroupSnapshotDeletionPolicy
-
-    // Required
-    Driver string
-
-    // +optional
-    VolumeGroupSnapshotClassName *string
-
-    // Required
-    Source VolumeGroupSnapshotContentSource
-}
-
-// OneOf
-type VolumeGroupSnapshotContentSource struct {
-    // Dynamical provisioning of VolumeGroupSnapshot
-    // +optional
-    VolumeGroupHandle *string
-
-    // Pre-provisioned VolumeGroupSnapshot
-    // +optional
-    VolumeGroupSnapshotHandle *string
-}
-
-Type VolumeGroupSnapshotContentStatus struct {
-    // VolumeGroupSnapshotHandle is a unique id returned by the CSI driver
-    // to identify the VolumeGroupSnapshot on the storage system.
-    // If a storage system does not provide such an id, the
-    // CSI driver can choose to return the VolumeGroupSnapshot name.
-    // +optional
-    VolumeGroupSnapshotHandle *string
-
-    // ReadyToUse becomes true when ReadyToUse on all individual snapshots become true
-    // +optional
-    ReadyToUse *bool
-
-    // +optional
-    CreationTime *int64
-
-    // +optional
-    Error *VolumeSnapshotError
-
-    // List of volume group snapshot contents
-    // +optional
-    VolumeSnapshotContentList []VolumeSnapshotContent
-}
-```
-
-#### PersistentVolumeClaim and PersistentVolume
-
-For PersistentVolumeClaim, the user can request it to be added to a VolumeGroup by adding the same label specified by the labelSelector in the VolumeGroup. In the initial phase, no changes will be proposed to PersistentVolumeClaim and PersistentVolume API objects. Before moving to Beta, we will re-evaluate this.
-
-#### VolumeSnapshot and VolumeSnapshotContent
-
-For VolumeSnapshot, we cannot request a VolumeSnapshot to be added to be VolumeGroupSnapshot, therefore VolumeGroupSnapshotName is only in the Status but not in the Spec.
-
-```
-type VolumeSnapshotStatus struct{
-    ......
-    // +optional
-    VolumeGroupSnapshotName *string
-    ......
-}
-
-type VolumeSnapshotContentStatus struct{
-    ......
-    // +optional
-    VolumeGroupSnapshotContentName *string
-    ......
-}
-```
-
-### Example Yaml Files
-
-#### Create Volume Group
-
-Example yaml files to create a VolumeGroupClass and a VolumeGroup are in the following.
-
-Create a VolumeGroupClass that supports volumeGroupSnapshot:
-```
-apiVersion: volumegroup.storage.k8s.io/v1alpha1
-kind: VolumeGroupClass
-metadata:
-  name: volumeGroupClass1
-spec:
-  parameters:
-    …...
-  supportVolumeGroupSnapshot: true
-```
-
-Create a VolumeGroup belongs to this VolumeGroupClass:
-```
-apiVersion: volumegroup.storage.k8s.io/v1alpha1
-kind: VolumeGroup
-metadata:
-  Name: volumeGroup1
-spec:
-  volumeGroupClassName: volumeGroupClass1
-```
-
-#### Add PVC to VolumeGroup
-
-Create a PVC that belongs to the volume group which supports volumeGroupSnapshot:
-```
-apiVersion: v1
-kind: PersistentVolumeClaim
-metadata:
-  name: pvc1
-  labels:
-    volumegroup:myApp
-spec:
-  accessModes:
-    - ReadWriteOnce
-  dataSource: null
-  resources:
-    requests:
-      storage: 1Gi
-  storageClassName: storageClass1
-  volumeMode: Filesystem
-  volumeGroupNames: [volumeGroup1]
-```
-
-#### Create VolumeGroupSnapshot
-
-Create a VolumeGroupSnapshotClass:
-```
-apiVersion: volumegroup.storage.k8s.io/v1alpha1
-kind: VolumeGroupSnapshotClass
-metadata:
-  name: volumeGroupSnapshotClass1
-spec:
-  parameters:
-    …...
-```
-
-A VolumeGroupSnapshot taken from the VolumeGroup dynamically:
-```
-apiVersion: volumegroup.storage.k8s.io/v1alpha1
-kind: VolumeGroupSnapshot
-metadata:
-  name: my-group-snapshot
-spec:
-  source:
-    volumeGroupName: volumeGroup1
-  volumeGroupSnapshotClassName: volumeGroupSnapshotClass1
-```
-
-A new external VolumeGroup controller will handle VolumeGroupClass, VolumeGroup, and VolumeGroupContent resources. We may need to split this into two controllers, one common controller that handles common functions such as binding, and one sidecar controller that calls the CSI driver.
-
-External provisioner will be modified to read information from volume groups (through volumeGroupNames) and pass them down to the CSI driver.
-
-A new external VolumeGroupSnapshot controller will handle VolumeGroupSnapshotClass, VolumeGroupSnapshot, and VolumeGroupSnapshotContent resources. We may need to split this into two controllers, one common controller that handles common functions such as binding, and one sidecar controller that calls the CSI driver.
-
-Snapshot controller will be modified to update VolumeSnapshot status. External snapshotter sidecar will be modified to update VolumeSnapshotContent status.
+For details of the VolumeGroup API design, see [here](https://docs.google.com/document/d/1VlrJGLr6YZvMrhyeQ3mJ-2Kuet9goUyzfg4Bq-NdymE/edit#).

From 564a00b2f611fc6584ba1344a103edebba2ac386 Mon Sep 17 00:00:00 2001
From: xing-yang
Date: Wed, 8 Feb 2023 14:34:23 -0500
Subject: [PATCH 19/19] Address comments

---
 keps/sig-storage/3476-volume-group-snapshot/README.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/keps/sig-storage/3476-volume-group-snapshot/README.md b/keps/sig-storage/3476-volume-group-snapshot/README.md
index 2bf8caceb9c..fcb4167cc30 100644
--- a/keps/sig-storage/3476-volume-group-snapshot/README.md
+++ b/keps/sig-storage/3476-volume-group-snapshot/README.md
@@ -136,7 +136,7 @@ Note: In the following, we will use VolumeGroupSnapshot Controller to refer to t
   * This will trigger the VolumeGroupSnapshot controller to create a VolumeGroupSnapshotContent API object, and also call the CreateVolumeGroupSnapshot CSI function.
   * The controller will retrieve all volumeSnapshotHandles in the Volume Group Snapshot from the CSI CreateVolumeGroupSnapshotResponse, create VolumeSnapshotContents pointing to the volumeSnapshotHandles. Then the controller will create VolumeSnapshots pointing to the VolumeSnapshotContents.
 * CreateVolumeGroupSnapshot CSI function response
-  * The CreateVolumeGroupSnapshot CSI function should return a list of snapshots (Snapshot message defined in CSI Spec) in its response. The VolumeGroupSnapshot controller can use the returned list of snapshots to construct corresponding individual VolumeSnapshotContents and VolumeSnapshots, wait for VolumeSnapshots and VolumeSnapshotContents to be bound, and update SnapshotList in the VolumeGroupSnapshot Status and SnapshotContentList in the VolumeGroupSnapshotContent Status.
+  * The CreateVolumeGroupSnapshot CSI function should return a list of snapshots (Snapshot message defined in CSI Spec) in its response. The VolumeGroupSnapshot controller can use the returned list of snapshots to construct corresponding individual VolumeSnapshotContents and VolumeSnapshots, wait for VolumeSnapshots and VolumeSnapshotContents to be bound, and update SnapshotRefList in the VolumeGroupSnapshot Status and SnapshotContentList in the VolumeGroupSnapshotContent Status.
   * Individual VolumeSnapshots will be named in this format:
     * -
   * A label with VolumeGroupSnapshot name will also be added to the VolumeSnapshot
@@ -146,6 +146,8 @@ apiVersion: snapshot.storage.k8s.io/v1
 kind: VolumeSnapshot
 metadata:
   name: snapshot1
+  labels:
+    volumeGroupSnapshotName: groupSnapshot1
 spec:
   source:
     persistentVolumeClaimName: vsc1