Add demo for dataset generation. #7
As I'm unfamiliar with the dm ecosystem, I find this impressive yet a tad difficult to understand. It'll be very useful in the near future; for now, I'd vote for focusing on a simple MVP inspired by
The PR consists of two parts,
For 1, I think it's important to describe formally what the spec looks like. My proposal is to leverage the spec already defined by RLDS and store it as a flattened nested group of datasets in HDF5. Ideally, all datasets released by Kabuki should use the same format. I described a few things in D4RL that I hope we can avoid in future datasets, e.g., the omission of terminal observations, which can cause problems for some lines of offline RL research. If anything, I think future datasets should aim to capture lossless information whenever possible so that they can be used by a wider community. This also means that we should support storing episodes that
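As a rough illustration of the "flattened nested group" idea, the sketch below flattens a nested episode record into slash-separated paths, which is how nested HDF5 groups are addressed (e.g. `steps/observation/state`). The key names and the helpers `flatten_group` and `make_episode` are hypothetical, not a settled spec. The episode keeps T + 1 observations for T actions so the terminal observation is not dropped; RLDS itself represents this as an extra final step flagged `is_last`, so the layout here is just one alternative convention.

```python
from typing import Any, Dict, List


def flatten_group(nested: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into {"a/b/c": value}, mirroring HDF5 group paths."""
    flat: Dict[str, Any] = {}
    for key, value in nested.items():
        path = f"{prefix}/{key}" if prefix else key
        if isinstance(value, dict):
            # Sub-dicts become sub-groups: recurse with the extended path.
            flat.update(flatten_group(value, path))
        else:
            # Leaves become datasets addressed by their full path.
            flat[path] = value
    return flat


def make_episode(observations: List[Any], actions: List[Any], rewards: List[float],
                 terminated: bool, truncated: bool) -> Dict[str, Any]:
    """Build a hypothetical episode record that keeps the terminal observation."""
    # T actions produce T + 1 observations s_0 ... s_T, including the terminal one.
    if len(observations) != len(actions) + 1:
        raise ValueError("expected len(observations) == len(actions) + 1")
    if len(rewards) != len(actions):
        raise ValueError("expected one reward per action")
    return {
        "steps": {
            "observation": {"state": observations},
            "action": actions,
            "reward": rewards,
        },
        # Episode-level flags separate a true terminal state from a time-limit cut.
        "terminated": terminated,
        "truncated": truncated,
    }


episode = make_episode([[0.0], [0.5], [1.0]], [1, 0], [0.0, 1.0],
                       terminated=True, truncated=False)
for path, value in flatten_group(episode).items():
    print(path, value)
```

With h5py installed, each `(path, value)` pair could then be written with `f.create_dataset(path, data=value)`; h5py creates intermediate groups for slash-separated names, so the nesting survives in the file.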
I feel we could open a discussion about whether that suffices for a broad range of use cases and whether there are scenarios where it won't. Ideally, this would be the format that all future RL datasets in Kabuki use, so it's also important to consider the implications of
I agree that we should probably focus on an MVP for now, but here are some of my thoughts; I hope they are useful in one way or another. It would also be great if potential dataset contributors raised any specific questions they'd like clarified. Regarding the MDPDatasets, I believe @WillDudley is referring to #6. I briefly looked through that PR, and here is what I noticed.
Thanks for the detail! I'll comment on my PR first: my PR was, and still is, intended as a very rough prototype for an MVP. D3RL's MDPDataset was chosen because I've had some experience playing with MDPDatasets before, which made it relatively easy for me to copy it over to Kabuki. The fact that d3rl uses Cython wasn't a factor in choosing it, so I don't have much to say about that. My PR focuses more on the hosting side of things, namely
I fully agree that the more information recorded, the better. I've also been thinking that users should have the flexibility to use their own logger if they wish. Making it easy for users to convert their dataset/buffer into our format is important.
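To make the conversion point concrete, here is a minimal sketch that regroups a flat replay buffer of `(s, a, r, s', terminated, truncated)` transitions into per-episode records that keep the final next-state. The `Transition` tuple and the `buffer_to_episodes` function are hypothetical; it assumes transitions are stored in order and chain within an episode (the `s'` of one step is the `s` of the next).

```python
from typing import Any, Dict, List, NamedTuple


class Transition(NamedTuple):
    obs: Any
    action: Any
    reward: float
    next_obs: Any
    terminated: bool
    truncated: bool


def buffer_to_episodes(buffer: List[Transition]) -> List[Dict[str, Any]]:
    """Split an ordered list of transitions into per-episode records."""
    episodes: List[Dict[str, Any]] = []
    current: Dict[str, Any] = {}
    for t in buffer:
        if not current:
            # Start a new episode with its first observation.
            current = {"observations": [t.obs], "actions": [], "rewards": []}
        current["actions"].append(t.action)
        current["rewards"].append(t.reward)
        # Appending next_obs keeps the terminal observation when the episode ends.
        current["observations"].append(t.next_obs)
        if t.terminated or t.truncated:
            current["terminated"] = t.terminated
            current["truncated"] = t.truncated
            episodes.append(current)
            current = {}
    if current:
        # Assumption: a buffer that ends mid-episode is recorded as truncated.
        current["terminated"] = False
        current["truncated"] = True
        episodes.append(current)
    return episodes


buffer = [
    Transition(0, "left", 0.0, 1, False, False),
    Transition(1, "right", 1.0, 2, True, False),
    Transition(10, "left", 0.5, 11, False, True),
]
episodes = buffer_to_episodes(buffer)
print(len(episodes), episodes[0]["observations"])  # 2 [0, 1, 2]
```

Users with their own logger would only need to produce this ordered list of transitions; the regrouping step stays the same.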
For sure. Some datasets will likely be fairly large, though I can't say to what extent. The question is whether there are any considerations we need to take into account now to avoid scaling complications in the future.
Indeed. Users may get overwhelmed if there are too many columns in the datasets, but datasets should give as many different models as possible something to work with. Certain niche columns may need to be stored separately if they end up being fairly data-intensive.
I'm unfamiliar with best practices for storing time series data; I just used HDF5 in my PR because I'm familiar with it. Users wouldn't really interact with the file format directly, so my vote is for whatever's fast and reliable.
Oh, this is very useful! Please continue with this! I'm away tomorrow but back Monday evening :)
Another thing to consider is the security of the file format, but that should be fine as long as it's not pickle.
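To illustrate why pickle is a concern: unpickling can call an arbitrary callable chosen by whoever produced the file, via `__reduce__`. The harmless payload below makes `pickle.loads` call `len`; a malicious file could name something like `os.system` instead. Data-only formats such as JSON just yield values, and HDF5 read through h5py or `.npy` files read with `numpy.load(..., allow_pickle=False)` are generally in the same category. The `Payload` class and the metadata dict are illustrative only.

```python
import json
import pickle


class Payload:
    """An object whose unpickling runs a callable picked by the file's author."""

    def __reduce__(self):
        # pickle will call len("untrusted") at load time; this could be any callable.
        return (len, ("untrusted",))


blob = pickle.dumps(Payload())
# Loading does not rebuild a Payload: it executes len("untrusted") and returns 9.
result = pickle.loads(blob)
print(type(result).__name__, result)  # int 9

# A data-only format round-trips plain values without executing anything.
meta = json.loads(json.dumps({"env": "CartPole-v1", "num_episodes": 10}))
print(meta["num_episodes"])  # 10
```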
Hey, it would help if you could elaborate on which additional data fields d3rl would need, as you say "Providing a dataset loading where only transitions of the form (s, a, r, s') will not suffice for some offline methods such as Decision Transformers" without elaborating. The first and last states can be inferred from their positions. In any case, I'm adding term and trunc to the dataset to distinguish termination from truncation. The only information RLDS provides that d3rl doesn't is the discount factor, but that isn't really used anyway.
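The points in this exchange can be made concrete with a short sketch. Assuming flat per-step arrays plus an episode-end marker (a d3rl-style layout, assumed here), RLDS-style `is_first`/`is_last` flags fall out of position alone. Separating `term` from `trunc` matters for value targets: a truncated step still bootstraps from the next state, while a terminated one does not. Decision Transformers condition on return-to-go, which is a per-episode suffix sum and therefore needs episode boundaries rather than isolated `(s, a, r, s')` tuples. All function names and the `gamma` value are illustrative assumptions.

```python
from typing import List, Tuple


def first_last_flags(episode_ends: List[bool]) -> Tuple[List[bool], List[bool]]:
    """Infer RLDS-style is_first/is_last from per-step episode-end markers."""
    is_last = list(episode_ends)
    # A step is first if it is step 0 or directly follows an episode end.
    is_first = [i == 0 or episode_ends[i - 1] for i in range(len(episode_ends))]
    return is_first, is_last


def td_targets(rewards: List[float], next_values: List[float],
               terminated: List[bool], gamma: float = 0.99) -> List[float]:
    """One-step TD targets r + gamma * (1 - terminated) * V(s')."""
    # Truncated-but-not-terminated steps still bootstrap from V(s'), because the
    # episode was cut by a time limit rather than reaching a terminal state.
    return [r + gamma * (0.0 if term else 1.0) * v
            for r, v, term in zip(rewards, next_values, terminated)]


def returns_to_go(rewards: List[float]) -> List[float]:
    """Undiscounted return-to-go for a single episode (Decision Transformer input)."""
    rtg: List[float] = []
    total = 0.0
    for r in reversed(rewards):
        total += r
        rtg.append(total)
    return rtg[::-1]


is_first, is_last = first_last_flags([False, True, False, False, True])
print(is_first)  # [True, False, True, False, False]
print(td_targets([1.0, 1.0], [10.0, 10.0], [True, False], gamma=0.5))  # [1.0, 6.0]
print(returns_to_go([1.0, 2.0, 3.0]))  # [6.0, 5.0, 3.0]
```

With a single `done` flag, the two TD targets above would be computed identically, which is the practical argument for storing term and trunc separately.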
This PR includes a demonstration of logging environment interactions and generating datasets in HDF5 format.
Please refer to README.md for an explanation of the dataset specification.
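For readers who want a feel for what logging environment interactions involves, here is a minimal sketch of a recorder around a Gymnasium-style `step` that returns `(obs, reward, terminated, truncated, info)`. The `CountdownEnv` toy environment and the `record_episode` helper are hypothetical stand-ins, not the code in this PR or its README.

```python
from typing import Any, Callable, Dict, Tuple


class CountdownEnv:
    """Toy environment: the state counts down by the action and terminates at 0."""

    def __init__(self, start: int = 3, max_steps: int = 10) -> None:
        self.start = start
        self.max_steps = max_steps
        self.state = start
        self.t = 0

    def reset(self) -> Tuple[int, Dict[str, Any]]:
        self.state, self.t = self.start, 0
        return self.state, {}

    def step(self, action: int) -> Tuple[int, float, bool, bool, Dict[str, Any]]:
        self.state -= action
        self.t += 1
        terminated = self.state <= 0
        # A time-limit cut-off is reported separately from a true terminal state.
        truncated = not terminated and self.t >= self.max_steps
        return self.state, float(action), terminated, truncated, {}


def record_episode(env: CountdownEnv, policy: Callable[[int], int]) -> Dict[str, Any]:
    """Roll out one episode and log every interaction, including the final observation."""
    obs, _info = env.reset()
    episode: Dict[str, Any] = {"observations": [obs], "actions": [], "rewards": []}
    while True:
        action = policy(obs)
        obs, reward, terminated, truncated, _info = env.step(action)
        episode["actions"].append(action)
        episode["rewards"].append(reward)
        episode["observations"].append(obs)
        if terminated or truncated:
            episode["terminated"] = terminated
            episode["truncated"] = truncated
            return episode


done = record_episode(CountdownEnv(start=3), policy=lambda obs: 1)
print(done["observations"], done["terminated"])  # [3, 2, 1, 0] True
cut = record_episode(CountdownEnv(start=3, max_steps=2), policy=lambda obs: 1)
print(cut["observations"], cut["truncated"])  # [3, 2, 1] True
```

Swapping in a real environment should work as long as its `step` returns the same five-tuple; writing the resulting record to HDF5 would be a separate step.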