Added iterate_episodes and made dataset an iterable #54

Merged · 8 commits into Farama-Foundation:main · Apr 16, 2023

Conversation

@Howuhh (Contributor) commented Apr 6, 2023

Description

Although the documentation says that Minari doesn't serve the purpose of creating replay buffers, the dataset itself only lets you sample random episodes, which is not very useful in practice for building custom replay buffers or dataloaders. It is much more useful to be able to iterate over the episodes, or to get an episode generator, without loading everything into memory. I am currently using Minari to build datasets for my tasks, and this kind of functionality is missing for convenient use, so I added the ability to iterate through the episodes.
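
A minimal usage sketch of what this enables, assuming a locally available dataset (the dataset name below is just an example; EpisodeData fields such as rewards follow Minari's episode format):

```python
import minari

# Any locally available Minari dataset works here; the name is illustrative.
dataset = minari.load_dataset("door-human-v0")

# Iterate over episodes lazily instead of sampling them at random;
# episodes are loaded one at a time, so the full dataset never has
# to be held in memory.
returns = []
for episode in dataset.iterate_episodes():
    returns.append(episode.rewards.sum())

# The PR also makes the dataset itself iterable, so this is equivalent:
for episode in dataset:
    pass
```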

Type of change

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

Checklist:

  • I have run the pre-commit checks with pre-commit run --all-files (see CONTRIBUTING.md instructions to set it up)
  • I have run pytest -v and no errors are present.
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I solved any possible warnings that pytest -v has generated that are related to my code to the best of my knowledge.
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@rodrigodelazcano (Member) commented:

Hey @Howuhh, thanks a lot for this PR! I'm wondering whether it would be possible to merge this new method, iterate_episodes, with sample_episodes: have a single generator, iterate_episodes, with batch_size and shuffle parameters. Let me know your thoughts on this.

I can merge this PR first, once the pre-commit is fixed, and open an issue for the single-generator implementation.
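
A hedged sketch of what that single generator could look like, written here as a standalone function rather than the actual MinariDataset method (the episodes argument stands in for the on-disk episode storage):

```python
import random
from typing import Iterator, List, Optional, Sequence


def iterate_episodes(
    episodes: Sequence[dict],
    episode_indices: Optional[List[int]] = None,
    batch_size: int = 1,
    shuffle: bool = False,
) -> Iterator[List[dict]]:
    """Yield batches of episodes, optionally in shuffled order."""
    if episode_indices is None:
        episode_indices = list(range(len(episodes)))
    indices = list(episode_indices)
    if shuffle:
        random.shuffle(indices)
    for start in range(0, len(indices), batch_size):
        # In the real dataset each episode would be read from disk here,
        # so memory use stays proportional to batch_size.
        yield [episodes[i] for i in indices[start : start + batch_size]]
```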

@rodrigodelazcano (Member) commented Apr 7, 2023

On second thought, I don't think batch_size is necessary, since what we are trying to do here is populate a buffer. @Howuhh, can you also remove sample_episodes in favor of iterate_episodes in this PR?
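
For context, a hedged sketch of the buffer-filling pattern this refers to (the list is a stand-in for a custom replay buffer, and flat observation/action arrays are assumed):

```python
import minari

dataset = minari.load_dataset("door-human-v0")  # dataset name illustrative

buffer = []  # stand-in for a custom replay buffer
for episode in dataset.iterate_episodes():
    # A single sequential pass per episode is enough to populate the
    # buffer, so no batch_size parameter is needed.
    for t in range(len(episode.rewards)):
        buffer.append(
            (episode.observations[t], episode.actions[t], episode.rewards[t])
        )
```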

@Howuhh (Contributor, Author) commented Apr 7, 2023

@rodrigodelazcano To be honest, I don't really understand the current issue with the pre-commit; what's more, all the checks pass locally without errors (with the setup described in CONTRIBUTING.md).

@rodrigodelazcano (Member) commented:

> @rodrigodelazcano To be honest, I don't really understand the current issue with the pre-commit; what's more, all the checks pass locally without errors (with the setup described in CONTRIBUTING.md).

Interesting, let me have a look at it.

@rodrigodelazcano (Member) commented:

@Howuhh Regarding the pre-commit errors, I'm also unable to reproduce them locally. Looking at the logs, the issue seems to come from pyright identifying the ndarray type as Unknown. I also think we can avoid using a numpy array to pass the list of indices; passing a plain list should be enough. Can you remove this type and see if that fixes the error?
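
For illustration, the kind of annotation change being discussed, shown on a simplified function rather than the actual PR code:

```python
from typing import Iterator, List, Optional

import numpy as np


# Before: indices passed as a numpy array; pyright resolved the ndarray
# type to Unknown in CI.
def iter_indices_before(episode_indices: Optional[np.ndarray] = None) -> Iterator[int]:
    if episode_indices is None:
        episode_indices = np.arange(0)
    for i in episode_indices:
        yield int(i)


# After: a plain list of ints carries the same information and
# type-checks cleanly, with no ndarray involved.
def iter_indices_after(episode_indices: Optional[List[int]] = None) -> Iterator[int]:
    if episode_indices is None:
        episode_indices = []
    for i in episode_indices:
        yield i
```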

@Howuhh (Contributor, Author) commented Apr 7, 2023

@rodrigodelazcano Yup, but then it will be slightly inconsistent with the type of episode_indices in the MinariDataset class.

@Howuhh (Contributor, Author) commented Apr 7, 2023

Also, the documentation has examples that use dataset.name, but as far as I can tell the .name attribute does not exist in the code.

@rodrigodelazcano I actually don't think it's necessary to remove sample_episodes; it could be useful to others as well. For example, we do exactly that in CORL for the Decision Transformer.

@younik (Member) left a comment:

Looks good to me; I just added a small comment

Comment on lines 234 to 236:

    for (
        episode_index
    ) in episode_indices:  # pyright: ignore [reportOptionalIterable]

@younik (Member) commented:

Is it possible to avoid the # pyright: ignore here and thus make it one line?

It seems pyright doesn't recognize at line 140 that episode_indices can't be None, but it should work with an assert.
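
A minimal sketch of the assert-based alternative (the class below is illustrative, not the actual MinariDataset code):

```python
from typing import Iterator, List, Optional


class EpisodeIndexHolder:
    """Illustrative stand-in for a dataset whose stored indices are Optional."""

    def __init__(self, episode_indices: Optional[List[int]] = None):
        self.episode_indices = episode_indices

    def iterate_episode_indices(self) -> Iterator[int]:
        # The assert narrows Optional[List[int]] to List[int], so pyright
        # no longer needs a per-line ignore and the for statement fits on
        # one line.
        assert self.episode_indices is not None
        for episode_index in self.episode_indices:
            yield episode_index
```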

@rodrigodelazcano (Member) commented:

@Howuhh can you edit this?

@Howuhh (Contributor, Author) commented:

@rodrigodelazcano Yup, I've been a little busy; I'll correct it.

@Howuhh (Contributor, Author) commented Apr 16, 2023:

Done, the checks are all green locally, so maybe this will work.

@rodrigodelazcano (Member) left a comment:

LGTM! I'll merge once @younik's review comments are addressed.

rodrigodelazcano merged commit a614223 into Farama-Foundation:main on Apr 16, 2023