off-topic - would you like to help create a package for symbolic representation of sets? #46

Closed
fkiraly opened this issue Apr 5, 2024 · 10 comments


@fkiraly

fkiraly commented Apr 5, 2024

Given your profile and obvious interest in the intersection of math, Python, and symbolic computation: would you be interested in creating a package for symbolic representation of sets?

I've been looking for something like this for:

  • domain and domain validation of parameters in scikit-learn-like packages
  • for symbolic representation of the domains of probability distributions, in skpro: https://github.com/sktime/skpro
@VascoSch92
Owner

Hey 👋

It seems really interesting.

Do you already have some code for what you would like, or a roadmap/project plan for how the package should look and which features/API you are expecting?

PS (off-topic): thanks for your other contributions (the PR and the suggestions). I will look into them in depth as soon as possible :-)

@fkiraly
Author

fkiraly commented Apr 6, 2024

> Do you already have some code for what you would like, or a roadmap/project plan for how the package should look and which features/API you are expecting?

Not in too much detail, but in essence I would like a symbolic representation language with eager computations for containedness, cardinality, and possibly other things like volume/mass and expectation-like integrals.

I'd imagine the python counterpart of:

Architecturally:

  • sets would be parametric objects, inheriting from scikit-base BaseObject,
  • skpro distributions get an attribute "domain", which is then sets-valued
  • optionally, skpro distributions get a "mass" method which takes sets and outputs the mass w.r.t. the distr
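
The three architectural points above can be illustrated with a minimal sketch. Everything here is hypothetical: `BaseSet`, `Interval`, and the toy `Uniform` class are made-up names, and a plain Python base class stands in for scikit-base's `BaseObject`, so this is not skpro's actual API.

```python
from dataclasses import dataclass


class BaseSet:
    """Hypothetical base class; the proposal would inherit from scikit-base's BaseObject."""

    def __contains__(self, x):
        raise NotImplementedError


@dataclass(frozen=True)
class Interval(BaseSet):
    """Closed real interval [lower, upper], described purely by its parameters."""

    lower: float
    upper: float

    def __contains__(self, x):
        return self.lower <= x <= self.upper


@dataclass(frozen=True)
class Uniform:
    """Toy uniform distribution on [lower, upper]; an assumption, not skpro's class."""

    lower: float
    upper: float

    @property
    def domain(self):
        # the set-valued "domain" attribute from the proposal
        return Interval(self.lower, self.upper)

    def mass(self, s):
        # probability mass that this distribution assigns to the Interval s
        overlap = min(s.upper, self.upper) - max(s.lower, self.lower)
        return max(overlap, 0.0) / (self.upper - self.lower)


d = Uniform(0.0, 4.0)
print(1.5 in d.domain)             # eager containedness check on the domain
print(d.mass(Interval(1.0, 2.0)))  # mass of a sub-interval under d
```

The set carries no data beyond its parameters, so containedness and mass are computed eagerly from them rather than through symbolic simplification.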

From a roadmap perspective, therefore, it would be crucial to cover first:

  • the reals, integers, and natural numbers
  • finite part of Borel algebra over reals
  • finite sets of integers and arbitrary objects
  • composition: Cartesian product sets, power sets
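
As a rough illustration of that roadmap, the sketch below covers the integers, finite sets, finite unions of intervals, and Cartesian products, with eager containedness, cardinality, and volume. All class and method names are assumptions made for this example, and power sets are left out for brevity.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Integers:
    """All integers; cardinality is reported as math.inf for simplicity."""

    def __contains__(self, x):
        return isinstance(x, int) and not isinstance(x, bool)

    def cardinality(self):
        return math.inf


@dataclass(frozen=True)
class FiniteSet:
    """Finite set of arbitrary hashable objects."""

    elements: frozenset

    def __contains__(self, x):
        return x in self.elements

    def cardinality(self):
        return len(self.elements)


@dataclass(frozen=True)
class IntervalUnion:
    """Finite union of closed intervals, a small fragment of the Borel sets."""

    intervals: tuple  # tuple of (lower, upper) pairs

    def __contains__(self, x):
        return any(lo <= x <= hi for lo, hi in self.intervals)

    def volume(self):
        # total length, counting overlapping stretches only once
        total, end = 0.0, -math.inf
        for lo, hi in sorted(self.intervals):
            if hi > end:
                total += hi - max(lo, end)
                end = hi
        return total


@dataclass(frozen=True)
class ProductSet:
    """Cartesian product of component sets."""

    components: tuple

    def __contains__(self, point):
        return len(point) == len(self.components) and all(
            x in s for x, s in zip(point, self.components)
        )


penalties = FiniteSet(frozenset({"l1", "l2"}))
grid = ProductSet((penalties, IntervalUnion(((0.0, 1.0),))))
print(("l1", 0.5) in grid)  # membership in a product of a finite set and an interval
```

A parameter domain such as "penalty in {l1, l2} and ratio in [0, 1]" then becomes a single `ProductSet` that a validator can query directly.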

@fkiraly
Author

fkiraly commented Apr 6, 2024

Another related location is sklearn.utils._param_validation, which has a primitive language to define sets for parameter constraints. This is probably closest to hypothesis.

@VascoSch92
Owner

It seems really interesting.

But why a package and not just a module in skpro? From what I see, the main purpose of this project will be to serve the skpro distributions, right?

If we are happy with it then, we could make it a proper package.

@fkiraly
Author

fkiraly commented Apr 7, 2024

> But why a package and not just a module in skpro? From what I see, the main purpose of this project will be to serve the skpro distributions, right?

Agreed, that might be a sensible starting point, and then one could spin it out, if other people are using it.

@VascoSch92
Owner

OK.

Can we create a branch and start working on it? Once we are happy with it, we can merge, OK?

Do you have a name in mind for the module/package?

@VascoSch92
Owner

I was reading the documentation about how SymPy implements sets (here). The API seems very interesting. Have you considered their implementation for skpro, or does it lack the functionality you want?
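
For reference, a short sketch of what the SymPy sets module offers for the use cases discussed in this thread. It only uses `Interval`, `FiniteSet`, `Union`, and the `S` singleton registry from SymPy; the variable names are made up for illustration, and whether this covers the skpro requirements is exactly the open question.

```python
from sympy import FiniteSet, Interval, S, Union

unit = Interval(0, 1)                           # closed interval [0, 1]
pieces = Union(Interval(0, 1), Interval(2, 4))  # finite union of intervals
small = FiniteSet(1, 2, 3)                      # finite set of numbers

print(S.Half in unit)           # containedness
print(unit.is_subset(S.Reals))  # subset relation against the reals
print(pieces.measure)           # total length (Lebesgue measure) of the union
print(len(small))               # cardinality of a finite set
print(-1 in S.Naturals)         # membership in the natural numbers
```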

@fkiraly
Author

fkiraly commented Apr 7, 2024

Very interesting point. I have not thought about this carefully, but perhaps the sympy sets module is sufficient for the use cases I have in mind.

There is even a statistics module with lots of distributions and support for composites: https://docs.sympy.org/latest/modules/stats.html
but that is not sufficient for the use case of tabular distribution representation and eager methods.

Perhaps some kind of a hybrid approach would work?

On the other hand, sympy is quite heavy as a dependency, for a use case where we do not need symbolic reasoning, simplification, evaluation, etc, but only representation mainly.

I think it's worth writing up the scope more precisely before we decide either way. I'll open an issue in skpro to discuss - given that you affirmed that you are interested, we can continue to scope options there?

@fkiraly
Author

fkiraly commented Apr 8, 2024

opened here: sktime/skpro#244

@VascoSch92
Owner

Closing, as the discussion has moved to the skpro repository.
