When Ceph CSI started maintaining its journal in RADOS, one of the concerns raised [1] was around the scale limits of a single RADOS object (which, as of this writing, is ~200k keys per object).
The CSI volumes directory is a single RADOS object per pool (the csi.volumes.<InstanceID> object) that maintains a key per currently in-use volume. At the per-object key limit, meaning ~200k images per pool (or per CephFS instance, as we use the CephFS metadata pool to store the same), we would need to shard this object to overcome the RADOS per-object key limits.
This is a future concern rather than a medium-term problem, but noting it down so that it is not missed as a reference when work is taken up on sharding.
Also noting this down due to a discussion around key counts in this PR with @dillaman
[1] Older discussion references on sharding the CSI directory object:
For the snapshot OMap directory, as there can be multiple snapshots per volume, we may reach the RADOS per-object key limits sooner.
There was a fleeting desire to create a per parent-image-UUID snapshot OMap directory, but that would not catch snapshot name collisions across different parent UUIDs, which the CSI plugins require.
For the snapshot OMap as well, we would need some form of sharding; @dillaman had some further thoughts on this that I am capturing verbatim below:
RGW "solved" this issue in the past by first using sharding across a
fixed number of objects (i.e. hash the name and pick the destination
omap index object by modulo the number of objects). The downside to
that approach was that it required the user to pick the expected
maximum number of objects prior to establishing the cluster (or do an
offline reshard). Since RGW needs to always be up and it can expect to
continuously grow, they then switched to dynamic sharding [1] to
permit growth into the hundreds of millions of indexed objects.
I realistically would never expect the CSI to need the extra
complexity of something like dynamic sharding. However, you could
implement a backwards compatible fixed sharding scheme in the future
where the CSI driver sets the shard object upper limit (power of two)
and it recursively searches for a hit in decreasing shard objects
upper limits until it finds a hit. If and when it finds a hit, it
should move it to the correct shard so that future accesses don't need
to perform the search and it helps to reduce pressure on the "older"
objects.
[1] https://docs.ceph.com/docs/mimic/radosgw/dynamicresharding/
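The backwards-compatible fixed sharding described above can be sketched roughly as follows. This is an illustrative sketch, not ceph-csi code: the names shardFor and lookupShard are hypothetical, an in-memory map stands in for the per-shard omap directory objects, and FNV-1a is an assumed choice of hash.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor picks the shard index for a key given a shard count
// (assumed to be a power of two): hash the name, modulo the count.
// Hypothetical helper, not from the ceph-csi code base.
func shardFor(key string, shards uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(key))
	return h.Sum32() % shards
}

// lookupShard searches for key starting at the current shard upper
// limit and halving the limit until a hit is found. On a hit under an
// older (smaller) limit, the key is migrated to its current shard so
// future accesses find it on the first probe.
func lookupShard(directory map[uint32]map[string]bool, key string, maxShards uint32) (uint32, bool) {
	for shards := maxShards; shards >= 1; shards /= 2 {
		idx := shardFor(key, shards)
		if directory[idx][key] {
			if shards != maxShards {
				// Hit in an "older" shard layout: move the key to the
				// shard implied by the current upper limit.
				newIdx := shardFor(key, maxShards)
				delete(directory[idx], key)
				if directory[newIdx] == nil {
					directory[newIdx] = map[string]bool{}
				}
				directory[newIdx][key] = true
				return newIdx, true
			}
			return idx, true
		}
	}
	return 0, false
}

func main() {
	// Simulate a key stored when the limit was 1 shard (the single
	// legacy directory object), then looked up after raising it to 8.
	dir := map[uint32]map[string]bool{
		0: {"csi-vol-example": true},
	}
	idx, ok := lookupShard(dir, "csi-vol-example", 8)
	fmt.Println(idx, ok)
}
```

In a real implementation each shard would be a separate RADOS omap object (e.g. one directory object per shard index) and the migration step would move the omap key between objects.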