Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: rio parsing #13

Merged
merged 13 commits into from
Jun 19, 2024
Merged

refactor: rio parsing #13

merged 13 commits into from
Jun 19, 2024

Conversation

cmdoret
Copy link
Member

@cmdoret cmdoret commented Jun 18, 2024

This PR refactors the whole triple parsing and pseudonymization to use oxigraph's rio types.

It does the following:

  1. Add a Pseudonymizer trait to allow easy pseudonymization of subjects and objects in the future.
    It is implemented for MaskedTriple and the rio base types (Subject and Term).
    When calling pseudo() on a masked triple, it forwards the call to the masked components to call their respective pseudo() implementation (although they currently do nothing).
  2. Replace the main parsing loop with rio's low level parse_step() method, which takes care of reading the next triple and should support multiple formats, including when 1 line != 1 triple.

@cmdoret cmdoret self-assigned this Jun 18, 2024
@cmdoret cmdoret added the enhancement New feature or request label Jun 18, 2024
@cmdoret cmdoret linked an issue Jun 18, 2024 that may be closed by this pull request
@gabyx
Copy link
Contributor

gabyx commented Jun 18, 2024

@cmdoret : Wuu, now we are getting fancy. But I think its good. I have a look at it.

Copy link
Contributor

@gabyx gabyx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really nice I think.
Well done.

One thing: Read the comment with triple.pseudo(mask) first.
Might help, then the rest can be ignored.

src/crypto.rs Outdated Show resolved Hide resolved
src/model.rs Outdated Show resolved Hide resolved
src/model.rs Outdated Show resolved Hide resolved
src/model.rs Outdated Show resolved Hide resolved
src/model.rs Outdated Show resolved Hide resolved
src/model.rs Outdated Show resolved Hide resolved
src/pass_second.rs Show resolved Hide resolved
src/pass_second.rs Outdated Show resolved Hide resolved
src/pass_second.rs Outdated Show resolved Hide resolved
src/model.rs Outdated Show resolved Hide resolved
@cmdoret cmdoret requested a review from gabyx June 19, 2024 16:57
@cmdoret
Copy link
Member Author

cmdoret commented Jun 19, 2024

Thanks for the thorough review, you made very good points and I did quite a bit of refactoring in crate::model based on it, learning a lot in the process 😃 .
The code is definitely better, but I was hesitant between two alternatives:

We now have a TriplePart enum which can be cast to/from u8 and a TripleMask. Alternatively, we could have only TripleMask and implement many small methods has_subject, set_subject, has_predicate, ... which I found a bit redundant (i.e. something along these lines).

Not sure if there is even a better solution. At least in terms of performance it should be quite good; the mask is just a u8 and accessor/setter methods perform a bitwise operation 🚀 .

Copy link
Contributor

@gabyx gabyx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!
Way better now.

The bitflag thing of yours might be good too and actually nice when we could write a for loop in the pseudonimize loop. Without that there is not much benefit over using bitflags ! (rust macros are not evil: check codes ability to expand them rust-analyzer but jeah sometimes they make things less boilerplaty but also confusing.

You can choose.
Happy wit both!

src/model.rs Outdated Show resolved Hide resolved
src/model.rs Show resolved Hide resolved
src/model.rs Outdated Show resolved Hide resolved
src/model.rs Outdated Show resolved Hide resolved
src/model.rs Show resolved Hide resolved
@cmdoret cmdoret merged commit 294fea2 into main Jun 19, 2024
@cmdoret cmdoret deleted the fix/triples branch June 19, 2024 21:16
supermaxiste pushed a commit that referenced this pull request Jun 24, 2024
supermaxiste pushed a commit that referenced this pull request Jun 24, 2024
supermaxiste pushed a commit that referenced this pull request Jun 24, 2024
supermaxiste pushed a commit that referenced this pull request Jun 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement proper parser
2 participants