[Feature]: add random seed to dataset functions for reproducibility #3474

Closed
MattGPT-ai opened this issue Jun 21, 2024 · 0 comments
Labels
feature A new feature

Comments

@MattGPT-ai
Contributor

Problem statement

When randomly splitting or downsampling datasets, including when loading Flair datasets, there is no option to set a random seed for that specific call. You can set the random seed globally before the call, but this does not seem to give the same control and reproducibility as setting a distinct seed for each operation. I have had trouble controlling this when instantiating the datasets that can be imported from Flair.
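The coupling described above can be illustrated with a minimal sketch that uses only Python's standard `random` module, not Flair itself. The helper names `shuffle_indices_global` and `shuffle_indices_local` are hypothetical and exist only for this illustration. With a global seed, the outcome of an operation depends on every random draw made before it; with a private generator per call, it does not.

```python
import random


def shuffle_indices_global(n):
    """Shuffle indices using the module-level (global) random state."""
    indices = list(range(n))
    random.shuffle(indices)
    return indices


def shuffle_indices_local(n, seed):
    """Shuffle indices using a private generator, independent of global state."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return indices


# Global seeding: the shuffle consumes whatever state earlier calls left behind.
random.seed(0)
alone = shuffle_indices_global(10)

random.seed(0)
random.random()  # an unrelated draw, e.g. from an earlier downsampling step
after_other_call = shuffle_indices_global(10)

# Per-call seeding: unrelated draws in between have no effect.
local_a = shuffle_indices_local(10, seed=0)
random.random()
local_b = shuffle_indices_local(10, seed=0)

print("global seed, same order of calls:     ", alone)
print("global seed, one extra draw before it:", after_other_call)
print("per-call seed is stable:", local_a == local_b)
```

The two globally seeded results will generally differ even though the same seed was set, because the extra draw advanced the shared state. The per-call results are always identical.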

Solution

Ideally, functions like randomly_split_into_two_datasets and downsample would accept an optional random seed. Perhaps a random seed could also be passed to dataset objects such as flair.datasets.sequence_labeling.CONLL_03.
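A rough sketch of what such an option could look like, written against plain Python lists rather than Flair's dataset classes. The functions `split_into_two` and `downsample_items` and the `random_seed` parameter are assumptions made for illustration; they are not Flair's actual signatures.

```python
import random
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_into_two(
    items: Sequence[T], length_of_first: int, random_seed: Optional[int] = None
) -> Tuple[List[T], List[T]]:
    """Randomly split items into two lists (hypothetical helper).

    A private generator is used, so the global random state is never touched.
    With random_seed=None the generator is seeded from system entropy.
    """
    rng = random.Random(random_seed)
    indices = list(range(len(items)))
    rng.shuffle(indices)
    # Sort each half so the original item order is preserved within a split.
    first = sorted(indices[:length_of_first])
    second = sorted(indices[length_of_first:])
    return [items[i] for i in first], [items[i] for i in second]


def downsample_items(
    items: Sequence[T], proportion: float, random_seed: Optional[int] = None
) -> List[T]:
    """Keep a random fraction of items (hypothetical helper)."""
    rng = random.Random(random_seed)
    sample_size = round(len(items) * proportion)
    kept = sorted(rng.sample(range(len(items)), sample_size))
    return [items[i] for i in kept]


sentences = [f"sentence-{i}" for i in range(20)]

# Each operation gets its own seed, so each can be reproduced on its own.
train, dev = split_into_two(sentences, length_of_first=15, random_seed=42)
small_train = downsample_items(train, proportion=0.2, random_seed=7)

print(len(train), len(dev), len(small_train))
```

Because each call builds its own `random.Random` instance, changing the seed of one operation does not alter the result of another, and calling code that relies on the global random state is left undisturbed.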

Additional Context

No response

@MattGPT-ai MattGPT-ai added the feature A new feature label Jun 21, 2024
MattGPT-ai added a commit to MattGPT-ai/flair that referenced this issue Jun 21, 2024
alanakbik added a commit that referenced this issue Jul 2, 2024
…atasets

GH-3474: add random seed parameter to dataset splitting and downsampl…