Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add references field to yaml input standard #777

Closed
wants to merge 6 commits into from

Conversation

ischoegl
Copy link
Member

@ischoegl ischoegl commented Dec 19, 2019

  • There is a clear use-case for this code change
  • The commit message has a short title & references relevant issues
  • Build passes (scons build & scons test) and unit tests address code coverage
  • The pull request is ready for review

Please fill in the issue number this pull request is fixing

Fixes Cantera/enhancements#15, fixes #339

Changes proposed in this pull request

  • Add references field to YAML input standard
  • Entries are generated from BibTeX formatted files (passed as --bibtex= option to ck2yaml)
  • Add references to gri30.yaml for illustration:
references:
  gri30:  # @misc
    title: GRI-Mech 3.0
    author: Gregory P. Smith, David M. Golden, Michael Frenklach, Nigel
      W. Moriarty, Boris Eiteneer, Mikhail Goldenberg, C. Thomas Bowman,
      Ronald K. Hanson, Soonho Song, William C. Gardiner Jr., Vitali V.
      Lissianski and Zhiwei Qin
    year: 1999
    howpublished: |-
      \url{http://combustion.berkeley.edu/gri-mech/version30/text30.html}

@ischoegl ischoegl force-pushed the add-references-to-yaml branch 6 times, most recently from 5743959 to 7209405 Compare December 19, 2019 17:17
Copy link
Member

@bryanwweber bryanwweber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ischoegl for the ideas here! I'm not sure what intent you have with this PR. Is this intended to be an alternate to #768? As I said there, I think we should move away from generating the YAML files from CK input during build.

interfaces/cython/cantera/ck2yaml.py Show resolved Hide resolved
@ischoegl
Copy link
Member Author

ischoegl commented Dec 19, 2019

I'm not sure what intent you have with this PR.

Fixing #339. I believe the point it raised is valid.

Is this intended to be an alternate to #768? As I said there, I think we should move away from generating the YAML files from CK input during build.

The changed .inp are all I had in mind in my comments in #768. I attached it here as it's just a couple of lines and I can easily drop the separate commits. (argon.yaml was curious, as there was an unused argon.inp).

Please note that this pertains only to the ck2yaml route, where I believe it to be cleaner to keep the original files. This could likely be solved by creating a cantera.conf flag that loads YAML from another repo/location if ruamel.yaml isn't available. Regarding Python 2, I already opined elsewhere that it probably shouldn't be a limiting factor now that it has essentially reached end of life.

Moot (commits are dropped)

@ischoegl ischoegl mentioned this pull request Dec 19, 2019
3 tasks
@ischoegl ischoegl force-pushed the add-references-to-yaml branch 3 times, most recently from a129149 to b1ac013 Compare December 20, 2019 21:30
@ischoegl
Copy link
Member Author

ischoegl commented Dec 20, 2019

@bryanwweber ... I dropped the commits updating headers of .inp files.

I.e. this PR now just adds the references field.

@ischoegl
Copy link
Member Author

ischoegl commented Jan 7, 2020

@bryanwweber ... it looks like chemkin input files are no longer used in the build script, which eliminates the 'hook' I used for this PR. What do you suggest?

See SConstruct diff:

<<<<<<< HEAD
# Convert input files from Chemkin format to YAML
ck_sources = [
    dict(output='gri30.yaml', input='data/inputs/gri30.inp',
         thermo='data/thermo/gri30_thermo.dat',
         transport='data/transport/gri30_tran.dat',
         bibtex='data/bibtex/gri30.bib',
         phase='gri30'),
    dict(output='air.yaml', input='data/inputs/air.inp',
         transport='data/transport/gri30_tran.dat',
         phase='air'),
    dict(output='airNASA9.yaml', input='data/inputs/airNASA9.inp',
         thermo='data/thermo/airDataNASA9.dat',
         bibtex='data/bibtex/airNASA9.bib',
         phase='airNASA9'),
    dict(output='h2o2.yaml', input='data/inputs/h2o2.inp',
         transport='data/transport/gri30_tran.dat',
         phase='ohmech'),
    dict(output='silane.yaml', input='data/inputs/silane.inp')
]

for mech in ck_sources:
    convertedInputFiles.add(mech['output'])
    cmd = ('$python_cmd_esc interfaces/cython/cantera/ck2yaml.py'
           ' --quiet --no-validate --input=$SOURCE --output=$TARGET')
    if 'thermo' in mech:
        cmd += ' --thermo=' + mech['thermo']
    if 'transport' in mech:
        cmd += ' --transport=' + mech['transport']
    if 'phase' in mech:
        cmd += ' --name=' + mech['phase']
    if 'bibtex' in mech:
        cmd += ' --bibtex=' + mech['bibtex']
    b = build(env.Command('build/data/{}'.format(mech['output']), mech['input'], cmd))
    env.Depends(b, 'interfaces/cython/cantera/ck2yaml.py')

# preprocess input files (cti -> yaml)
for cti in mglob(env, 'data/inputs', 'cti'):
    outName = os.path.splitext(cti.name)[0] + '.yaml'
    if outName not in convertedInputFiles:
        b = build(env.Command('build/data/{}'.format(outName), cti.path,
                              '$python_cmd_esc interfaces/cython/cantera/cti2yaml.py $SOURCE $TARGET'))
        env.Depends(b, 'interfaces/cython/cantera/cti2yaml.py')
=======
for yaml in mglob(env, "data", "yaml"):
    dest = pjoin("build", "data", yaml.name)
    build(env.Command(dest, yaml.path, Copy("$TARGET", "$SOURCE")))
>>>>>>> master

@ischoegl ischoegl force-pushed the add-references-to-yaml branch 2 times, most recently from c4e1579 to 15a3d57 Compare January 7, 2020 18:37
@ischoegl
Copy link
Member Author

ischoegl commented Jan 7, 2020

@bryanwweber and @speth ... I opted to re-introduce the ck2yaml conversion (temporarily?) after #768 was merged, so this can be reviewed. Edit: Resolved by introducing an example instead.

@bryanwweber
Copy link
Member

@ischoegl I think the best approach is to add the reference directly into the YAML files that are now committed to the repo.

Can you add an example of what the reference format will look like in the file, either as a comment here or in one of the YAML files?

@ischoegl
Copy link
Member Author

ischoegl commented Jan 7, 2020

@bryanwweber ... sure (also, agreed on simply copying information over to your versions at least for gri30.yaml, due to deprecated phases).

This is the header portion for gri30.yaml, as generated from the current version of this PR

description: |-
  GRI-Mech Version 3.0 7/30/99  CHEMKIN-II format
  See README30 file at anonymous FTP site unix.sri.com, directory gri;
  WorldWideWeb home page http://www.me.berkeley.edu/gri_mech/ or
  through http://www.gri.org , under 'Basic  Research',
  for additional information, contacts, and disclaimer

references:
  gri30: |-
    @misc{gri30,
        title = "GRI-Mech 3.0",
        author = "Gregory P. Smith and David M. Golden and Michael Frenklach and
                  Nigel W. Moriarty and Boris Eiteneer and Mikhail Goldenberg and
                  C. Thomas Bowman and Ronald K. Hanson and Soonho Song and
                  William C. {Gardiner Jr.} and Vitali V. Lissianski and
                  Zhiwei Qin",
        year = 1999,
        howpublished = \url{http://combustion.berkeley.edu/gri-mech}
    }

generator: ck2yaml
input-files: [gri30.inp, gri30_thermo.dat, gri30_tran.dat, gri30.bib]
cantera-version: 2.5.0a3
date: Tue, 07 Jan 2020 19:02:10 +0000

units: {length: cm, time: s, quantity: mol, activation-energy: cal/mol}

@ischoegl ischoegl force-pushed the add-references-to-yaml branch 5 times, most recently from 3f7ad89 to 30ef54a Compare January 9, 2020 17:16
@ischoegl
Copy link
Member Author

ischoegl commented Jan 9, 2020

@bryanwweber ... I removed the temporary renaming, and instead moved the ck2yaml conversion to a python example (ck2yaml_demo.py in a new input folder, which calls the shell script via the subprocess module).

I.e. this PR no longer clashes with your changes from #768.

PS: there are likely some additional tweaks for the example and SConstruct, including a decision on the fate of some currently unused CHEMKIN thermo/transport files. I'll leave this up for discussion.

Copy link
Member Author

@ischoegl ischoegl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding some comments.

SConstruct Outdated Show resolved Hide resolved
Comment on lines +26 to +28
# options for h2o2.yaml
opt['h2o2'] = ["--input=h2o2.inp", "--output=h2o2.yaml",
"--transport=gri30_tran.dat",
"--name=ohmech"]
Copy link
Member Author

@ischoegl ischoegl Jan 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could add the gri30.bib reference here as well (same for others below), i.e. air.yaml and argon.yaml.

interfaces/cython/cantera/examples/input/ck2yaml_demo.py Outdated Show resolved Hide resolved
@ischoegl
Copy link
Member Author

ischoegl commented Jan 11, 2020

@bryanwweber ... for clarification (or users not familiar with BibTeX), I added an auto-generated comment to the references section. Below is the proposed output for nasa.yaml.

references:
  # references use BibTeX format
  nasa7: |-
    @misc{nasa7,
        title = "NASA thermodynamic database",
        note = "7-term polynomial coefficients in 1971 format",
        year = 1993,
        howpublished = \url{https://shepherd.caltech.edu/EDL/PublicResources/sdt/cti/NASA7/nasa7.dat}
    }
  gordon1971: |-
    @techreport{gordon1971,
        title = "Computer Program for Calculation of Complex Chemical Equilibrium
                 Composition, Rocket Performance, Incident and Reflected Shocks and
                 Chapman-Jouguet Detonations",
        author = "S. Gordon and B. J. McBride",
        year = 1971,
        number = "NASA SP-273"
    }
  mcbride1993: |-
    @techreport{mcbride1993,
        title = "Coefficients for Calculating Thermodynamic and Transport Properties
                 of Individual Species",
        author = "B. J. McBride and S. Gordon and M. A. Reno",
        year = 1993,
        number = "NASA TM-4513"
    }

@bryanwweber
Copy link
Member

@ischoegl Can you move the Python example to another branch and PR? I think that will probably be a separate discussion (what to do with the various required files, how to format so people learn the most, etc.) and I'd like to focus this PR on a discussion of the reference format. If you don't think that makes sense, that's OK, but I do think they are sufficiently separate ideas and I don't want a repeat of #768 😄

@ischoegl
Copy link
Member Author

ischoegl commented Jan 11, 2020

@bryanwweber ... I'd prefer not to (at least for the time being). After #768 the previously existing hook for this PR was removed, so I had to come up with a way to create a proper illustration for what I am suggesting. With less than 250 lines, I believe this PR is still small enough, and I don't see a reason why this PR couldn't partially address Cantera/enhancements#22?

PS: I am open to alternate suggestions to the example, and can also move it to another PR after the main solution approach has been worked out (I.e. waiting for @speth’s feedback)

@speth
Copy link
Member

speth commented Jan 11, 2020

While I agree that we should add reference information to the YAML files included with Cantera, and encourage users to do the same with input files that they generate, I'm not sure that this is the right solution to the latter concern.

I don't like the idea of embedding data in the raw BibTeX format inside the YAML file, when there is such an obvious mapping to a YAML key-value data structure, and many users aren't going to care about the BibTeX format specifically. I do see it as being a reasonable input source, if we are going to support automated processing of reference information, given that many publishers provide an easy way to download a .bib file for an article.

However, I don't think that we should be writing any sort of BibTeX parser ourselves, and I'm kind of skeptical that many users are going to bother installing an extra Python package and downloading or creating the relevant .bib file before running ck2yaml. Just adding the reference information manually seems about as easy to me.

@ischoegl
Copy link
Member Author

ischoegl commented Jan 11, 2020

@speth ... thanks for the feedback.

I don't like the idea of embedding data in the raw BibTeX format inside the YAML file, when there is such an obvious mapping to a YAML key-value data structure [...]. I do see it as being a reasonable input source, if we are going to support automated processing of reference information, given that many publishers provide an easy way to download a .bib file for an article.

I believe it’s relatively straightforward to parse valid syntax (which imho is a reasonable expectation), so I’ll give it a shot. I really don’t like the manual option if there’s a programmatic way of doing this using standardized (and easily obtainable) input. Otherwise there’s no way of ascertaining that the new feature is an improvement over the status quo.

PS: To further elaborate, as long as there aren’t any @string inputs, the BibTeX format is easily parsed.

@speth
Copy link
Member

speth commented Jan 11, 2020

Sure, we could parse BibTeX entries, but I don't think that's a rabbit hole we should go down. I can see the case for having a programmatic way of adding user-supplied metadata to the generated input file, but I think it would be better to make that interface fairly generic and not expect ck2yaml to do much interpretation of the content.

I would consider adding a --metadata option which takes an additional YAML file and adds the fields in that file as top-level fields in the generated YAML file. This would be more general than just a references field. You could for instance include a description field that overrides the one automatically determined by ck2yaml. Converting reference information from whatever format the user has it in to the YAML format would be left up to them. This would at least get away from telling users to hand-edit the file that is generated by ck2yaml.

@ischoegl
Copy link
Member Author

ischoegl commented Jan 12, 2020

@speth ... alright, I added a simple parser (around 30 lines), which produces

references:
  nasa7:  # @misc
    title: NASA thermodynamic database
    note: 7-term polynomial coefficients in 1971 format
    howpublished: |-
      \url{https://shepherd.caltech.edu/EDL/PublicResources/sdt/cti/NASA7/nasa7.dat}
  gordon1971: # @techreport
    title: Computer Program for Calculation of Complex Chemical Equilibrium
      Composition, Rocket Performance, Incident and Reflected Shocks and
      Chapman-Jouguet Detonations
    author: S. Gordon and B. J. McBride
    year: 1971
    number: NASA SP-273
  mcbride1993: # @techreport
    title: Coefficients for Calculating Thermodynamic and Transport Properties
      of Individual Species
    author: B. J. McBride, S. Gordon and M. A. Reno
    year: 1993
    number: NASA TM-4513

There are some minor warts to work out, but I really don't think that this is a rabbit hole.

Regarding the metadata idea, I’m not opposed, but it imho doesn’t address #339

@ischoegl ischoegl force-pushed the add-references-to-yaml branch 2 times, most recently from 6a02e7e to 7b4ab83 Compare January 12, 2020 05:06
@bryanwweber
Copy link
Member

@ischoegl I am still strongly opposed to implementing any sort of BibTeX parser. If we add this (no matter how simple), where does it stop? Next we get a request for RIS parser, Endnote parser, Crossref parser... Or someone has a BibTeX file that can be read by bibtex but not ck2yaml and they complain and we have to fix it (whether because of extra features in bibtex, a bug in our or their implementation, etc.)... I feel that these kind of parsers are not in the Cantera wheel-house and I really don't want to maintain them.

I think the ultimate resolution to #339 is to manually commit reference information into the YAML files for the files in this repository. Implementing the ability to automatically add extra fields to the YAML output from this conversion is a separate (although related) idea. Like @speth I think a more general --metadata flag would be more generally useful and much easier to maintain.

I think a better option for automatic parsing and addition of BibTeX into YAML is either

  1. implement a separate package (pip install bibtex2yaml) that depends on a fully tested parser (e.g., https://github.com/sciunto-org/python-bibtexparser) and outputs a YAML file suitable for use in the proposed --metadata argument or directly edits an existing YAML file. This would/could be much more generic than to be able to edit Cantera input YAML files, e.g., it could also be useful for PyKED files or
  2. implement these capabilities in something like ChemCheck, which is a (currently on hiatus) GSoC project to do online conversion and checking of Chemkin input files that we started this past summer

(or both).

Regarding the example, the reason I suggested to move that to its own PR is because I don't think the example as currently constituted is that useful, but I wanted to have a little longer discussion of it without getting distracted from the references stuff in this PR. I think the example would be more useful if it was either a Juypter Notebook, so it could have prose explanations of each option (but then that's bordering on the existing API documentation), or as a separate tutorial page on the website, showing the command line incantations and how they change various aspects of the output. Not to mention that the example as a Python script requires us to keep around all the Chemkin input files that I don't think are all that useful anymore... Like I said, this is a longer discussion that's separate from the references.

@ischoegl
Copy link
Member Author

ischoegl commented Jan 12, 2020

@bryanwweber (and @speth) ... thanks for your thoughts.

I am still strongly opposed to implementing any sort of BibTeX parser. [...] I feel that these kind of parsers are not in the Cantera wheel-house and I really don't want to maintain them.

I do see and acknowledge this argument, and including an external (and optional) library bibtex2yaml is a good suggestion. Edit: I don’t think there’s an easy way to use an external dependency.

If we add this (no matter how simple), where does it stop? Next we get a request for RIS parser, Endnote parser, Crossref parser...

I don't think that this 'slippery slope' argument is fair. The advantage of using a pre-existing input format is that there will be a well-defined references structure. Fwiw, the Cantera citation is provided in BibTeX format 😉 (and I do not think there's a need to include RIS, Endnote, Crossref, etc.).

I think the ultimate resolution to #339 is to manually commit reference information into the YAML files for the files in this repository. Implementing the ability to automatically add extra fields to the YAML output from this conversion is a separate (although related) idea. Like @speth I think a more general --metadata flag would be more generally useful and much easier to maintain.

As mentioned before, if we allow users to put whatever they deem appropriate into --metadata, then #339 is just kicked down the road, with the effect that there is no net improvement over the status quo. Imho, adding a --metadata option goes beyond what this PR seeks to accomplish.

To summarize:

  1. I believe that using BibTeX input to ck2yaml generates a well-defined reference structure that directly addresses Adding citation to the chem input file in the data folder #339
  2. I do not think that this is a rabbit hole, or that there is a need to provide alternate input formats to users (i.e. the alternative would be to hand-edit YAML)
  3. For a resolution, I am still suggesting to support a --bibtex option for ck2yaml if and only if bibtex2yaml is installed. This would be similar to what I did for pandas-based HDF input/output, where an error is raised if the optional package is not installed.

PS: looks like it’s not as simple ... a bibtex2yaml package does not exist, and the sciunto-org/python-bibtexparser is currently unmaintained. biblio2yaml (jgm/pandoc-citeproc) isn’t lightweight enough and the YAML output structure is lacking. Looks like using external dependencies may be more complex than the minimal parser I wrote.

Ad examples, as mentioned, I needed a hook to test what I wrote.

@ischoegl
Copy link
Member Author

@bryanwweber ... on further thought, I’m frankly unsure about your proposal to create an external bibtex2yaml package, which imho is a true rabbit hole with questionable benefits.

If there’s a constructive way to incorporate this PR into cantera, great. Otherwise, I’ll just close it.

@ischoegl
Copy link
Member Author

ischoegl commented Jan 13, 2020

@bryanwweber and @speth: I added a unit test to complete what I am suggesting as a resolution for #339. As mentioned before, I do not think that external dependencies are workable, as it would increase instead of decrease complexity.

If this PR is acceptable as a basis for discussion, I'd fine-tune the formatting, add more tests, copy references over to the yaml input files and move the example to a separate PR (I agree that there may be better suited alternatives) before this is ready to merge. Otherwise, I'll rest my case.

PS: implementing the (currently missing) @string definition is likewise not difficult, but I'd rather not get into this given the circumstances.

@codecov
Copy link

codecov bot commented Jan 13, 2020

Codecov Report

Merging #777 into master will decrease coverage by <.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #777      +/-   ##
==========================================
- Coverage   71.44%   71.44%   -0.01%     
==========================================
  Files         372      372              
  Lines       43950    43952       +2     
==========================================
- Hits        31402    31400       -2     
- Misses      12548    12552       +4
Impacted Files Coverage Δ
include/cantera/thermo/IdealSolidSolnPhase.h 57.69% <0%> (-10.49%) ⬇️
test_problems/rankine_democxx/rankine.cpp 93.18% <0%> (-0.3%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 358195e...7e5a74e. Read the comment docs.

@ischoegl ischoegl mentioned this pull request Jan 14, 2020
4 tasks
@ischoegl
Copy link
Member Author

Closing this, as #794 provides a more light-weight alternative.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add reference field to YAML input files Adding citation to the chem input file in the data folder
3 participants