Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

filterx: remove fields with empty values #226

Open
jszigetvari opened this issue Jul 25, 2024 · 7 comments
Open

filterx: remove fields with empty values #226

jszigetvari opened this issue Jul 25, 2024 · 7 comments
Assignees
Labels

Comments

@jszigetvari
Copy link
Contributor

jszigetvari commented Jul 25, 2024

ability to remove certain fields with empty values:

  • need to specifiy multiple potential null values ("", N/A, NA, None, 0, Default)
  • alternative names: drop_fields/remove_fields/ ...
@jszigetvari
Copy link
Contributor Author

We should also add different modes of operations, or separate the functionality to two functions:

  • to only set the properties to empty string (to be used with CSV-like formatting)
  • to properly remove the properties (for KV-like formatting)

@pepov pepov added the filterx label Jul 30, 2024
@MrAnno
Copy link
Member

MrAnno commented Aug 2, 2024

A more general solution would be implementing group_unset() for filterx (probably with a better name), where lists and regexp patterns can be specified.
I think it's not a much bigger task, so we should do this instead.

For AxoRouter, we can always add an axo-remove-null-values() SCL that parameterizes a group_set call with the known null-values.

@alltilla
Copy link
Member

alltilla commented Aug 2, 2024

I would go even further in the generalization, with something like this:

  • create kv transformation callback functions, that can only be passed to some other functions, but not called on their own
  • the callbacks could have a result to either modify the key or the value, or to skip/unset the kv
  • make our parser functions accept a list of those callbacks, and apply them before adding a value to the dict they are creating
  • make a map()-like function that accepts a dict and a list of those callbacks and applies them on all the elements

I think this architecture would scale really good with our parsers and transformation logics.

@jszigetvari
Copy link
Contributor Author

@MrAnno

For AxoRouter, we can always add an axo-remove-null-values() SCL that parameterizes a group_set call with the known null-values.

Well, in that regard we would need something where we could specify somehow (maybe different functions, or through a parameter) whether we need to actually unset the attributes or set them to emty string. (This is important because of CSV-like data, where the order of values is important.)
On top of that, perhaps there should be a default set of known null/empty values, which the user could extend or override (through an argument) if necessary.

@MrAnno
Copy link
Member

MrAnno commented Aug 2, 2024

I don't fully see how this could scale well if we consider our original decision of not allowing to define custom functions/callbacks in filterx. This means we will have to implement a set of transformation functions and also allow them to be passed to almost all of our functions that "create" new sets of data.

Coupling these transformations with the functions that create new values seems unnecessary to me in such a language where we have functional-style building blocks and where were are closed to in-language extensions (defining our own functions).

It seems cleaner to me to provide general enough transformation functions that can work on their own.

that can only be passed to some other functions, but not called on their own

This was my first trigger to think that coupling transformations with the "source of creation" may not be the best idea (both from the user's and the C implementation's perspective).

@alltilla
Copy link
Member

alltilla commented Aug 2, 2024

Sure we can do everything by having separate functions for iterating through the dict and modifying its values, but it quickly gets resource intensive if we do it multiple times for the necessary transformations. I thought we could optimize it, but that might introduce some complications in our implementation.

@MrAnno
Copy link
Member

MrAnno commented Aug 2, 2024

Let's measure some filterx performance in real-world use cases, identify the bottlenecks, and do perf-related optimizations on places we are sure are worth the complexity compared to the actual numbers.

I'm not against adding some complexity when the end result is cleaner for the user, but in this case, this coupling seems comfortable, but it's not that clean if we think about it.

@pepov pepov changed the title filterx: unset_empties() improvements filterx: remove fields with empty values Aug 6, 2024
@pepov pepov assigned alltilla and bshifter and unassigned alltilla Aug 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants