-
Notifications
You must be signed in to change notification settings - Fork 580
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CORE-7061] sr/store: Deep copy schema_def iobufs at the interface #23114
[CORE-7061] sr/store: Deep copy schema_def iobufs at the interface #23114
Conversation
sharded_store's main purpose is to supply a consistent interface to an underlying (sharded) in-memory store. So to process a given request, it must first work out the home shard for the desired schema ID or subject, dispatching requests to that shard. Recently, schema registry was refactored to store schema definitions as iobufs (rather than contiguous strings) to avoid large allocations. To avoid semantically unnecessary _copies_, get_schema_definition was implemented to return a iobuf share across a shard boundary. This is an unsafe operation, as the reference counting and deletion machinery of the underlying temporary buffers is totally unsynchronized. This is UB in principle (data race) and leads to memory access faults (UAF, etc.) in practice. So this commit changes 'store::get_schema_definition' to return a copy of the stored schema definition (iobuf), eliminating any cross-shard sharing of the underlying temp buffers. This approach is rather blunt and carries some cost at deallocation time (seastar allocator must work out the appropriate site for memory reclamation for each component temp buffer of a given schema def), but it should be safe since the component temp buffers are never live on multiple shards simultaneously. Note that this path could be optimized somewhat by returning a shared iobuf managed by foreign_ptr and carrying out the necessary copy (a read-only operation) on the requesting shard. This commit also includes a "unit test" that is more of a memory access canary, of sorts. Reverting the change in 'store::gets_schema_definition' will cause the test to fail reliably, producing memory faults in line with the reasoning above. With the change in place, this test reliably runs to completion without issue. Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
@ballard26 - talked this over with @BenPope a bit, and we came to the conclusion that returning an iobuf copy across the shard boundary should be safe since it's temporaries all the way out to the call site and no data is shared. It's a pessimization, and I assume we could do something w/ foreign_ptr to push the copy operation out to the call site. But it doesn't seem strictly necessary here since the path isn't particularly hot. WDYT? I get the impression that the seastar allocator should do the right thing when it tears down the copy, but I don't exactly know. |
I agree that using a foreign_ptr isn't needed here. It's done in the fetch path as an optimization to defer the copy since we don't always include the read batches in the fetch reply. And in those cases deferring it allows the fetch to avoid the copy altogether. The seastar allocator will also be able to recognize that the memory being free'd doesn't belong to the current shard and will return it back to the proper shard so no worries there. The only real concern is whether two shards can hold references to one of the underlying temporary_buffers in parallel. Since I think that's the cause for most if not all the race conditions. As you said since the copied One gotcha may be that in debug mode we restrict certain iobuf operations to the shard it was originally allocated on. So if a dev tries to use any of those operations on the copied iobuf they may find debug-mode tests failing. That isn't a correctness issue though and easily discovered. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome
// Upsert a large(ish) number of schemas to the store, all with different | ||
// subject names and IDs, so they should hash to different shards. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same schema should end up with the same ID, so the comment is a little confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In real operation, yeah, but I thought that disambiguation happened somewhere else, like on the command apply path? it looks like when I directly upsert, sharded store is happy to shove an (id, def)
pair into the schema map in memory on shard hash(id)
as requested. I might be missing/misreading something.
/backport v24.2.x |
/backport v24.1.x |
/backport v23.3.x |
sharded_store's main purpose is to supply a consistent interface to an underlying (sharded) in-memory store. So to process a given request, it must first work out the home shard for the desired schema ID or subject, dispatching requests to that shard.
Recently, schema registry was refactored to store schema definitions as iobufs (rather than contiguous strings) to avoid large allocations. To avoid semantically unnecessary copies, get_schema_definition was implemented to return a iobuf share across a shard boundary. This is an unsafe operation, as the reference counting and deletion machinery of the underlying temporary buffers is totally unsynchronized. This is UB in principle (data race) and leads to memory access faults (UAF, etc.) in practice.
So this commit changes 'store::get_schema_definition' to return a copy of the stored schema definition (iobuf), eliminating any cross-shard sharing of the underlying temp buffers. This approach is rather blunt and carries some cost at deallocation time (seastar allocator must work out the appropriate site for memory reclamation for each component temp buffer of a given schema def), but it should be safe since the component temp buffers are never live on multiple shards simultaneously.
Note that this path could be optimized somewhat by returning a shared iobuf managed by foreign_ptr and carrying out the necessary copy (a read-only operation) on the requesting shard.
This commit also includes a "unit test" that is more of a memory access canary, of sorts. Reverting the change in 'store::gets_schema_definition' will cause the test to fail reliably, producing memory faults in line with the reasoning above. With the change in place, this test reliably runs to completion without issue.
Backports Required
Release Notes
Bug Fixes