[ENH] Add polars version of dummy proba regressor #447

Status: Open (wants to merge 38 commits into base: main)


@julian-fong (Contributor) commented Aug 3, 2024

Completes #440 and is a polars extension of #437

Adds a polars version of the dummy probabilistic regressor, which also helps test the end-to-end (e2e) input flow of polars tables.

In this pull request:

  • Introduces a dummy probabilistic regressor that supports the `X` and `y` inner mtype `polars_eager_table`. However, since BaseDistribution objects do not currently support polars DataFrames, pandas DataFrames are used internally in both `_fit` and `_predict_proba`. Otherwise, the functionality should mirror [ENH] DummyProbaRegressor - probabilistic dummy regressor #437.
  • Introduces a `skpro.utils.polars` module containing two functions, `polars_split_index_values_frame` and `polars_combine_index_values_frame`. They split `__index__` columns out of, and combine them back into, the main polars DataFrame (when those columns were created by the `_convert` utilities from pandas to polars), so that fitting and prediction work properly.
  • Adds a simple polars e2e test in `test_polars.py`, as well as other polars tests related to the adapter enhancements in [ENH] Polars adapter enhancements #449.

@julian-fong (Contributor Author)

Tests in `check_estimator` should be failing for the `_predict_*` functions, i.e., quantile and proba...

@fkiraly (Collaborator) commented Aug 5, 2024

They are not failing because there are no tests covering the new code, i.e., feeding polars objects, is that not so?

@fkiraly added the labels enhancement, module:regression (probabilistic regression module), module:datatypes (datatypes module: data containers, checkers & converters), and implementing algorithms (Implementing algorithms, estimators, objects native to skpro) on Aug 5, 2024
@julian-fong

This comment was marked as resolved.

@fkiraly (Collaborator) commented Aug 5, 2024

That's strange. Could you kindly investigate?

@fkiraly (Collaborator) commented Aug 5, 2024

Oh, I think I've got it: the estimator is in a private module. The test framework skips estimators that are private and not publicly exported.

@julian-fong (Contributor Author)

@fkiraly How do we specify that the pandas DataFrame tests should be skipped for estimators whose `X`/`y` mtypes do not include a pandas mtype?

@fkiraly (Collaborator) commented Aug 12, 2024

We should not skip these; they should still work via conversion there and back.

fkiraly pushed a commit that referenced this pull request Aug 18, 2024
Adds index support as part of #440; this is used to sync up the polars
conversion utilities between skpro and sktime.

The corresponding sktime PR for the polars conversion utilities is
sktime/sktime#6455.

In this PR:

If a pandas DataFrame is the `from_type` and a polars frame is the `to_type`,
then during the conversion we save the index (assumed never to be a
MultiIndex) and insert it as an individual column named `__index__`.
The resulting pandas DataFrame is then converted to a polars DataFrame.

In the inverse function, when converting from a polars DataFrame to a
pandas DataFrame, if the column `__index__` exists in the pandas
DataFrame after conversion, that column is mapped to the index before
the pandas DataFrame is returned.

After this is merged, #447 will be implemented as a `polars`-only
estimator. Tests will also be written to check polars input end to end,
as well as pandas input and output through the polars estimator (i.e.,
pandas input into polars estimator -> polars predictions -> pandas output).
Projects: Status: Under review
2 participants