
Fix multitarget stratification #1308

Merged: 7 commits, Jul 2, 2024


DRMPN (Collaborator) commented Jun 30, 2024

This is a 🐛 bug fix.

Summary

Disable multitarget stratification when the classes are too unbalanced to stratify (some class has fewer than two samples).
Add a test for this scenario.
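A minimal sketch of such a guard and its test, assuming NumPy arrays and a float `test_size`. `can_stratify`, `split_with_safe_stratify`, and the test name are hypothetical, not FEDOT's actual code. Besides the minimum class count from the linked issue, scikit-learn's stratified split also requires the train and test parts to each hold at least as many samples as there are classes, so the sketch checks that as well.

```python
import math

import numpy as np
from sklearn.model_selection import train_test_split


def can_stratify(target, test_size: float) -> bool:
    """Check whether a stratified split of `target` is possible (hypothetical helper)."""
    target = np.asarray(target)
    if target.ndim == 2:
        # Multi-target case: every distinct row counts as a separate class
        _, counts = np.unique(target, axis=0, return_counts=True)
    else:
        _, counts = np.unique(target, return_counts=True)
    n_samples, n_classes = len(target), len(counts)
    # Same sizing rule train_test_split applies to a float test_size
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    # Every class needs 2+ members, and each part needs room for every class
    return bool(counts.min() >= 2 and n_train >= n_classes and n_test >= n_classes)


def split_with_safe_stratify(features, target, test_size: float = 0.2, random_state: int = 42):
    """Stratify when possible; otherwise fall back to a plain shuffled split."""
    stratify = target if can_stratify(target, test_size) else None
    return train_test_split(features, target, test_size=test_size,
                            random_state=random_state, stratify=stratify)


def test_split_sparse_multitarget_without_stratify():
    features = np.arange(20).reshape(10, 2)
    # Row [1, 1] occurs only once, so stratification must be switched off
    target = np.array([[0, 0]] * 5 + [[0, 1]] * 4 + [[1, 1]])
    assert not can_stratify(target, test_size=0.2)
    x_train, x_test, _, _ = split_with_safe_stratify(features, target, test_size=0.2)
    assert len(x_train) == 8 and len(x_test) == 2


if __name__ == "__main__":
    test_split_sparse_multitarget_without_stratify()
```

Falling back to a non-stratified split keeps the pipeline running at the cost of preserving class proportions; raising a clearer error instead would be the stricter alternative.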

Context

Fixes #1307

DRMPN requested review from nicl-nno and Lopa10ko on June 30, 2024 20:03
DRMPN self-assigned this on Jun 30, 2024

docu-mentor bot commented Jun 30, 2024

👋 Hi, I'm @docu-mentor, an LLM-powered GitHub app
powered by Anyscale Endpoints
that gives you actionable feedback on your writing.

Simply create a new comment in this PR that says:

@docu-mentor run

and I will start my analysis. I only look at what you changed
in this PR. If you only want me to look at specific files or folders,
you can specify them like this:

@docu-mentor run doc/ README.md

In this example, I'll have a look at all files contained in the "doc/"
folder and the file "README.md". All good? Let's get started!


pep8speaks commented Jun 30, 2024

Hello @DRMPN! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2024-07-02 08:45:21 UTC


github-actions bot commented Jun 30, 2024

All PEP8 errors have been fixed, thanks ❤️

Comment last updated at

DRMPN (Collaborator, Author) commented Jun 30, 2024

Inside train_test_split(), a method from the sklearn package performs one-hot encoding and turns the original 7 classes into 11:

[Screenshot]

Because of these "new" classes, some of them do not have enough samples for stratification, which causes the error. I propose checking for this in advance and disabling stratification in that case.
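The sketch below reproduces the mechanism on toy data, assuming a recent scikit-learn. When `stratify` is a 2-D array, `StratifiedShuffleSplit` (which `train_test_split` uses for stratification) joins each row into a string label, so every distinct combination of target values becomes its own class; two binary targets can already yield more classes than either target alone. The toy labels and helper names are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical two-target labels: each column has 2 classes,
# but the rows form 3 combinations and [1, 1] occurs only once.
target = np.array([[0, 0]] * 5 + [[0, 1]] * 4 + [[1, 1]])
features = np.arange(len(target)).reshape(-1, 1)


def combined_class_counts(target):
    """Count classes the way sklearn does for a 2-D y: one label per distinct row."""
    row_labels = np.array([" ".join(row.astype(str)) for row in target])
    labels, counts = np.unique(row_labels, return_counts=True)
    return dict(zip(labels.tolist(), counts.tolist()))


def stratified_split_fails(features, target):
    """Return True if a stratified train_test_split raises ValueError."""
    try:
        train_test_split(features, target, test_size=0.2, stratify=target, random_state=0)
    except ValueError:
        return True
    return False


print(combined_class_counts(target))  # {'0 0': 5, '0 1': 4, '1 1': 1}
print(stratified_split_fails(features, target))  # True
```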

DRMPN (Collaborator, Author) commented Jun 30, 2024

/fix-pep8

Lopa10ko (Collaborator) left a comment

Looks valid, provided it really prevents the stratified-split error, as shown in the screenshot.

test/unit/data/test_data_split.py (outdated, resolved)
Lopa10ko (Collaborator) commented Jul 2, 2024

/fix-pep8

DRMPN merged commit 03e4736 into master on Jul 2, 2024
6 checks passed
DRMPN deleted the DRMPN-fix-multitarget-stratify branch on July 2, 2024 09:00
Development

Successfully merging this pull request may close these issues.

[Bug]: Stratify ValueError: The least populated class in y has only 1 member, which is too few. [...]
4 participants