-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
What should we do with all our input files? #22
Comments
Agreed.
I also worry about the implicit endorsement that we give to certain input files. However, I think splitting the data files up among the examples would lead to a mess of duplicate and near-duplicate files. The
Agreed.
Agreed. I assume that the plan would be to add deprecation warnings to the affected |
For CHEMKIN input, I do think that there is a place in examples (rather than removing them): see proposed input example in Cantera/cantera#777 |
Regarding sample input files, one option would be to be explicit in the description fields that they are not meant for production. I drafted a description for Cantera/cantera#789 to provide input to this discussion. |
For example/sample input files, what about adding a folder: For example, instead of putting the example input files in I'm not sure how this would work for the other interfaces, though. Maybe just adding a I recognize this is basically a semantic difference, so I don't want to draw out a long discussion. Just trying to think of any simple alternatives, in addition to or in place of editing the files themselves with more context.
Yes, this was what I was thinking. |
Not having input files on the search path would likely be confusing to many people. An alternative 'samples' folder could be added to the search path (in case it's not recursive already), but there wouldn't be any difference in user handling. I honestly only see two alternatives:
Option (1) has the advantage that input files won't be available outside of the folder a sample resides in, whereas (2) is simple and easy to maintain (while assuming some user awareness). PS: adding a relative path to input files is a third option (3), which however reduces code legibility PPS: Option (1) also has the disadvantage to clutter folders containing multiple examples (e.g. in Python). |
Yes, I agree, and I think we should avoid doing that 😄 Let me clarify my suggestion (as it has become clearer to me in the last few hours). I am suggesting that we commit files like As another point, I think we should try to reduce the overall number of example-related input files in the repository. For instance, we have two different versions of Deutshmann's methane-on-Pt mechanism. I don't see a reason why the example using the older Last, for the record, the case that best represents what I really want from this change is that these types of input files go into Regarding Cantera/cantera#789, it is likely that the best place for that input file to go will be this new structure that we're discussing here, or else the sample should be converted to use a different input file (I don't recall which phases/thermo are specified in that example, so I'm not sure if we have an existing data file that would suffice). |
@bryanwweber ... I see what you’re trying to accomplish, but I am unsure that the differentiation won’t be lost on the users. If the search path is augmented, then there won’t be a difference in usage, while it will be more difficult for users to locate the actual data folder. I.e. I’m not sure that people will pick up on this suggested change in the intended way. Since your suggestion is closer to ( Overall, I don’t think that the storage location will make a huge difference, as I ultimately believe that a clarification in the yaml description should suffice to steer users away from using input files that are not intended for production. Further: As was mentioned elsewhere, even GRI30 is not up to par any longer. I guess adding a clarification to a |
I agree with @ischoegl that moving the files to a new directory but still including that directory on the default search path wouldn't be something most users notice. As an alternative, what if we moved sample data to the I think there's a lot of utility in having at least a few input files that can be used right out of the box for reasonable calculations, if not necessarily cutting-edge research, especially if they are from reputable sources, have descriptions of their origins, and are named appropriately (e.g. I think |
@speth ... now that you're writing it out, the relative path doesn't look as awkward as I anticipated. I'd still suggest loading things as As a likely controversial comment, I am wondering about the rationale between the split of sample locations for different platforms (i.e. C++ samples, Python samples, etc.)? Long-term, it may be simpler to just have one root folder holding all sample variants? In this case, there may be an obvious location for sample data that is shared for all sample versions (which may deviate from the current discussion). |
I very much like that solution (no pun intended). We'd just need to make sure that this doesn't hide a local
I agree, and I think @ischoegl suggestion to include information about that in the
I'm not sure what the rationale was at the time it was done 🤷♂ I think this does deviate significantly enough from the current discussion that it's worth a separate issue. |
@bryanwweber ... per your suggestion, I added (edit: and closed) a new issue #27 for discussion.
I think this is a very good idea. |
Sure, I was mainly attempting to describe the mechanics of the approach, not settle on the specific directory name. But I agree that
I don't think there's any risk of that. As I envision it, if your Cantera search path is:
If that wasn't sufficiently clear, I am in no way opposed to deduplication. One implementation issue, then, is how to handle moving sample data files from their current location to the |
@speth ... looks like an excellent solution (I assume you're not suggesting a hard-coded folder name, i.e. |
@speth I agree with everything you've got there. @ischoegl I'm not sure what you mean by "hard-coded", but I think the folder will be In terms of a tactical approach to this, I'd suggest having two pull requests: 1) Remove/move CK input files and deprecate YAML files that will be removed and 2) Move example input files and implement the necessary checks and changes to the examples. Hopefully that will limit the scope of each and make them easier to review... |
This provides the ability to organize data files included with Cantera and differentiate data which is useful for different purposes, e.g. sample data which is only meant to demonstrate capabilities and may not be physically meaningful. See Cantera/enhancements#22. Fixes Cantera#774
This provides the ability to organize data files included with Cantera and differentiate data which is useful for different purposes, e.g. sample data which is only meant to demonstrate capabilities and may not be physically meaningful. See Cantera/enhancements#22. Fixes Cantera#774
This provides the ability to organize data files included with Cantera and differentiate data which is useful for different purposes, e.g. sample data which is only meant to demonstrate capabilities and may not be physically meaningful. See Cantera/enhancements#22. Fixes Cantera#774
This provides the ability to organize data files included with Cantera and differentiate data which is useful for different purposes, e.g. sample data which is only meant to demonstrate capabilities and may not be physically meaningful. See Cantera/enhancements#22. Fixes Cantera#774
This provides the ability to organize data files included with Cantera and differentiate data which is useful for different purposes, e.g. sample data which is only meant to demonstrate capabilities and may not be physically meaningful. See Cantera/enhancements#22. Fixes #774
Some input of a regular user: I was trying to follow this example: Somewhere in these help pages, there should also be one (or several) links to where to find the yaml files. https://cantera.org/examples/index.html |
Given the goal of Cantera/cantera#1681 of converting the Jupyter notebook examples into Python examples in the main repo, there is the need to deal with the input files used by a couple of these examples. Specifically, stirred_reactor.ipynb is a nice example that makes comparisons between experimental and numerical simulation data using a mechanism from the literature which happens to be fairly large. The mechanism file, galway.yaml, has 1268 species and 5336 reactions, and weighs in 1.2 MB, more than all the input files in the main cantera Beyond just this one case, I think there's value in having more examples that make use of more up-to-date mechanisms than GRI 3.0, but I'd like to keep what's included in the base One option that comes to mind is to create a separate We can then make whether and how to install this data a SCons option, and configure it as we like for different packaging routes. For example, if we want to keep the PyPI package light, we can exclude this data (since the examples that need it aren't part of the PyPI package anyway). But for Conda or Ubuntu where there are finer-grained packaging controls, we could build an optional |
I like this suggestion quite a bit. Hopefully, we can convince researchers to contribute their own mechanisms so this will become 'community-maintained', although I'm not necessarily having high hopes for that to happen. In an ideal world ... . There are some community members who have created collections of mechanisms, but I'm not sure that this is an objective as we do not want to (implicitly) recommend some mechanisms over others? |
I've definitely had some concerns about endorsing the use of specific mechanisms, but the existing CollectionOfMechanisms repository that @jiweiqi created has been around for a while now and doesn't seem to have generated any controversy that I'm aware of. It has also seen at least a few contributions from other researchers. The suggestion I'm making here, though, is focused on data used in our examples, not a broader repository of potentially-useful mechanisms. However, building an easy-to-install package that would make a larger set of mechanisms available on the Cantera data path could be interesting as well. |
Another option is to use git LFS, although I see you've already set up the new repo so this may be a bit late 😊 |
I wouldn't say it's too late -- there's nothing stopping us from just deleting that repo. I did consider git LFS, but hesitated due to lack of experience with it and not having a good idea of whether there were any pitfalls to be aware of. If you've used it and found it easy to work with, that would be a good enough reason to try it out here. |
I have not found git LFS to be a good solution. More like a necessary evil when you have gigabyte-sized files. |
I would still be 👍 with the submodule (despite having used git LFS in the past without problems). My question would be whether some of the mechanisms in |
Thanks for the feedback on git LFS, @rwest. While adding this one file certainly wouldn't make a significant difference, what I'm hoping to get out of doing this is to relieve the tension between using a wider variety of reasonably up-to-date mechanisms in the examples and bloating the main repo.
Potentially, but most of the mechanisms in the
My thought with keeping the structure flat is that it would then let users load the mechanisms as |
I am not necessarily concerned about unit tests requiring a submodule (which I believe makes sense). The bigger concern I have is packaging: as an example, conda packages distribute Python samples. At the same time, it's pretty simple to copy things over during the installation/packaging process, which is probably the obvious solution here. |
I see that as a fairly clean place to draw a dividing line. The main repo should contain everything (save external dependencies) needed to compile and test Cantera. This is important for packaging systems such as Ubuntu/Launchpad where the expectation is to build from a tarball, not a Git repository. If we really want to clarify that some of the input files currently in the main
Yes, my original suggestion was to have SCons handle this as an option during installation, though in some cases it may end up falling to the scripts responsible for building each distinct package. |
Arguably, it may make sense to create a new |
I'm not sure I caught all the details, I've had the flu but wanted to add the following for consideration: So the plumed-nest model is what I would advocate for. In fact, I wanted to make something like that for this field of work even before plumed existed. A separate repository with something simpler than plumed-nest [a simpler contribution form without more than regex validations] could be a foundation for something like plumed-nest. |
Abstract
We have many input files that are just committed to the repository and others that are distributed with Cantera. We should discuss which of these should be deprecated and removed, which should be removed without deprecation, and where the files that are staying should live.
Motivation
In Cantera/cantera#768, we determined that several input files should be deprecated and/or removed. However, due to the size of that PR, specific discussion was deferred. This issue is where I propose to have the relevant discussion.
At present, the following input files are committed to the repository:
The list of files in the
data
folderThere are several "classes" of files here:
lithium_ion_battery.cti
)nasa_condensed.cti
,liquidvapor.yaml
)critProperties.xml
,elements.xml
)Some input files might be cross some of these lines (e.g.,
liquidvapor.yaml
), so the distinctions are a little fuzzy.Possible Solutions
To get the discussion started, I'll propose the following changes:
data
folder. They are (or will shortly be) no longer needed at build time, so they will become clutter. Select files should be moved totest/data
for tests of the conversion scripts.data
implies that the files therein are more important somehow, and moving the input files to an example-specific folder removes some of the implicit endorsement that we are giving to these files by distributing them.a. This is mildly complicated because several files are used in several examples across a few interfaces
elements.xml
,critProperties.xml
, andliquidvapor.yaml
remain in thedata
folderI'm not particularly attached to this arrangement, just throwing it out there to get something started. I'm also not particularly attached to this happening on any particular timeline, so discussion is more than welcome.
The text was updated successfully, but these errors were encountered: