Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Schema Registry: GET /schemas/ids/<id> should return references #11216

Merged

Conversation

BenPope
Copy link
Member

@BenPope BenPope commented Jun 5, 2023

Return references for GET /schemas/ids/<id>.

Fixes #11194

Notes for reviewer:

The store has two mappings:

  1. schema_map: Schema ID to Schema
  2. subject_map: Subject-Versions to Schema ID

Each of these mappings are then balanced across shards, coordinated by sharded_store.

Prior to this change, references were attached to the Subject-Versions.

This is an incorrect modelling; Schema should own the references to other schema. This commit does that refactoring by moving references from typed_schema to typed_schema_definition.

This allows returning the references when querying a schema by ID.

Additionally there is a fix to ignore references when a subject-version has been soft-deleted.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.1.x
  • v22.3.x
  • v22.2.x

Release Notes

Bug Fixes

  • Schema Registry: Return references for GET /schemas/ids/<id>.

This will make the next refactor slightly easier.

Signed-off-by: Ben Pope <ben@redpanda.com>
This will make the next refactor slightly easier to comprehend.

Signed-off-by: Ben Pope <ben@redpanda.com>
Signed-off-by: Ben Pope <ben@redpanda.com>
@BenPope BenPope self-assigned this Jun 5, 2023
@BenPope BenPope added area/schema-registry Schema Registry service within Redpanda and removed area/redpanda labels Jun 5, 2023
@BenPope BenPope force-pushed the schema_registr_schema_by_id_with_references branch from a42bfb9 to 7a03737 Compare June 5, 2023 22:42
@BenPope BenPope force-pushed the schema_registr_schema_by_id_with_references branch from 7a03737 to a408202 Compare June 5, 2023 22:46
@BenPope BenPope force-pushed the schema_registr_schema_by_id_with_references branch from a408202 to 05de998 Compare June 5, 2023 23:54
dotnwat
dotnwat previously approved these changes Jun 6, 2023
acc.insert(acc.end(), refs.begin(), refs.end());
return acc;
});
absl::c_sort(references);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is htis different than std::sort used above?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed by using absl::btree_set

Comment on lines 321 to 340
std::vector<schema_id>
subject_versions_with_any_of(const std::vector<schema_id>& ids) {
std::vector<schema_id> has_ids;
for (const auto& s : _subjects) {
for (const auto& r : s.second.versions) {
if (absl::c_binary_search(ids, r.id)) {
has_ids.push_back(r.id);
}
}
}
return has_ids;
}

bool subject_versions_has_any_of(const std::vector<schema_id>& ids) {
return absl::c_any_of(_subjects, [&ids](const auto& s) {
return absl::c_any_of(s.second.versions, [&ids](const auto& v) {
return absl::c_binary_search(ids, v.id);
});
});
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i assume these require the inputs to be sorted? probably worth a comment

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed by using absl::btree_set

This should be a pure refactoring.

The store has two mappings:
1) `schema_map`: Schema ID to Schema (sharded across cores)
2) `subject_map`: Subject-Versions to Schema ID

Each of these mappings are then split across shards, coordinated
by sharded_store.

Prior to this change, references were attached to the Subject-Versions.

This is an incorrect modelling; Schema should own the references
to other schema. This commit does that refactoring by moving references
from `typed_schema` to `typed_schema_definition`.

There are a couple of things that are not straightforward:
1) `ValidSchema` (`protobuf_schema_definition`, `avro_schema_definition`)
   are the respective library instantiations of the representation,
   and are unable to reproduce their references to form a
   `canonical_schema_definition`, so references are stored inside the
   wrappers.
2) `referenced_by` and `is_reference` now require two phases to
   obtain the references, due to the sharded and diconnected nature of
   the two mappings (schema, and subject-versions).
   a) Obtain a list of schema ids that reference the subject-version
   b) Filter that list by subject-versions that are not soft-deleted

Signed-off-by: Ben Pope <ben@redpanda.com>
Signed-off-by: Ben Pope <ben@redpanda.com>
`GET /schemas/ids/<id>` now returns its references.

Fixes redpanda-data#11194

Signed-off-by: Ben Pope <ben@redpanda.com>
@BenPope BenPope force-pushed the schema_registr_schema_by_id_with_references branch from 05de998 to 9298788 Compare June 6, 2023 11:10
@BenPope
Copy link
Member Author

BenPope commented Jun 6, 2023

Changes in force-push:

  • Refactor the store references code by using an absl::btree_set to simplify the logic and enforce the constraints.

@BenPope BenPope requested a review from dotnwat June 6, 2023 17:38
Copy link
Member

@dotnwat dotnwat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm.

just to confirm:

    schema_id_set referenced_by(const subject& sub, schema_version ver) {
        schema_id_set references;
        for (const auto& s : _schemas) {
            for (const auto& r : s.second.definition.refs()) {
                if (r.sub == sub && r.version == ver) {
                    references.emplace_back(s.first);
                    references.insert(s.first);
                }
            }
        }

The referenced_by helper doesn't need to integrate the soft-delete flag?

@BenPope
Copy link
Member Author

BenPope commented Jun 6, 2023

The referenced_by helper doesn't need to integrate the soft-delete flag?

Good question. I poked around and found no mechanism for it to be required.

@dotnwat dotnwat merged commit 82ed4e4 into redpanda-data:dev Jun 6, 2023
@vbotbuildovich
Copy link
Collaborator

/backport v23.1.x

@vbotbuildovich
Copy link
Collaborator

/backport v22.3.x

@vbotbuildovich
Copy link
Collaborator

/backport v22.2.x

@vbotbuildovich
Copy link
Collaborator

Failed to run cherry-pick command. I executed the commands below:

git checkout -b backport-pr-11216-v23.1.x-210 remotes/upstream/v23.1.x
git cherry-pick -x f9178757c31231ab2fa3896e8a9ac13a556a3c2f 4972bcc7f33802561db7ea44df077e7cbcf6f705 5d5072bd79603128768841b3dba98aa4e5b54dbd 2fc2dfee3b14b19ecc729c80d2b52bf344a7de73 c13d874f464da1b7d1981c0cdb626b7256986a76 92987882a32cdd6221e20a58d43092890fa5a178

Workflow run logs.

@vbotbuildovich
Copy link
Collaborator

Failed to run cherry-pick command. I executed the commands below:

git checkout -b backport-pr-11216-v22.2.x-354 remotes/upstream/v22.2.x
git cherry-pick -x f9178757c31231ab2fa3896e8a9ac13a556a3c2f 4972bcc7f33802561db7ea44df077e7cbcf6f705 5d5072bd79603128768841b3dba98aa4e5b54dbd 2fc2dfee3b14b19ecc729c80d2b52bf344a7de73 c13d874f464da1b7d1981c0cdb626b7256986a76 92987882a32cdd6221e20a58d43092890fa5a178

Workflow run logs.

@vbotbuildovich
Copy link
Collaborator

Failed to run cherry-pick command. I executed the commands below:

git checkout -b backport-pr-11216-v22.3.x-596 remotes/upstream/v22.3.x
git cherry-pick -x f9178757c31231ab2fa3896e8a9ac13a556a3c2f 4972bcc7f33802561db7ea44df077e7cbcf6f705 5d5072bd79603128768841b3dba98aa4e5b54dbd 2fc2dfee3b14b19ecc729c80d2b52bf344a7de73 c13d874f464da1b7d1981c0cdb626b7256986a76 92987882a32cdd6221e20a58d43092890fa5a178

Workflow run logs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/redpanda area/schema-registry Schema Registry service within Redpanda
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Missing references on Protobuf Schema when accessing /schemas/ids/{id}
3 participants