Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add script to convert CTML to YAML input format #693

Merged
merged 99 commits into from
Dec 2, 2019

Conversation

bryanwweber
Copy link
Member

@bryanwweber bryanwweber commented Aug 14, 2019

The script in the PR will convert CTML to YAML input format. The first version here works for GRI-3.0 and I'm posting it to get feedback on the approach I'm taking. Obviously, there is still a lot to be done (I'll try to generate a checklist at some point).

I've attached the current gri30.yaml file to this message as a .txt file.

gri30.txt

Some notes about the YAML format I'm noticing as I do this:

  • Should the note field of a species entry should be at the species level, not the thermo level? This would match the CTI and CTML locations of this data, but I do believe it is extracted from the NASA thermo entry, so maybe it does belong in the thermo field. (ctml1yaml.py currently puts it at the species level, while I believe ck2yaml.py puts it at the thermo level)
  • The current gri30.yaml that is generated only has one phase entry for mixture averaged transport. This is different from gri30.cti and gri30.xml that have 3 phase entries.

@bryanwweber
Copy link
Member Author

bryanwweber commented Aug 15, 2019

Questions about this script:

  • How concerned do we need to be about ill-formed species, thermo, transport, and reaction entries? In other words, how much error checking should I include?
  • Can units ever be specified for the b constant in an Arrhenius-type rate? I don't think I've ever seen it, and I don't think it makes sense to have units for the exponent of the temperature, but 🤷‍♂ people do crazy things

@bryanwweber
Copy link
Member Author

bryanwweber commented Aug 16, 2019

To-do list

  • multiple reaction sections in the same file
  • multiple species sections in the same file
  • Make a list of all the thermo classes that can be instantiated from XML that will work from YAML
  • Non-reactant reaction orders
  • Negative reaction orders

@bryanwweber
Copy link
Member Author

How concerned do we need to be about ill-formed species, thermo, transport, and reaction entries? In other words, how much error checking should I include?

I anticipate that any ill-formed choice that the current CTI/XML parser catches/fixes will end up in form of a question on the user forum.

@ischoegl Yes, but to the extent that we can avoid such questions, it would be preferable. I guess the question is really, how likely is it that a) a user has a hand-written/hand-edited CTML file (since the files converted by ctml_writer.py should be correct) and b) they haven't loaded that file with Cantera to figure out that they have an error. Hopefully that cross-section is small enough that minimal error checking will be sufficient...

@speth
Copy link
Member

speth commented Aug 16, 2019

Should the note field of a species entry should be at the species level, not the thermo level? This would match the CTI and CTML locations of this data, but I do believe it is extracted from the NASA thermo entry, so maybe it does belong in the thermo field.

Given that for files generated from CK->CTI->XML, this is the note from the thermo data entry, I think this belongs with the thermo entry and not at the species level. (In ck2yaml, we look consider a comment on the line containing the species declaration to pertain to the species, if species are listed one per line. This style is used by RMG, and possibly other sources).

The current gri30.yaml that is generated only has one phase entry for mixture averaged transport. This is different from gri30.cti and gri30.xml that have 3 phase entries.

gri30.yaml is generated directly from the Chemkin input files, whereas gri30.cti is hand-modified to add those additional declarations. My opinion is that those extra phase definitions are the least useful way of switching between transport models, so it's not something I'm particularly interested in preserving.

How concerned do we need to be about ill-formed species, thermo, transport, and reaction entries? In other words, how much error checking should I include?

I think it's ok if an invalid input results in invalid output. You could potentially import the input file and use that to catch errors, but that would require the Python module to be present, and gets a little tricky if you have input files with surface definitions.

Can units ever be specified for the b constant in an Arrhenius-type rate? I don't think I've ever seen it, and I don't think it makes sense to have units for the exponent of the temperature, but 🤷‍♂️ people do crazy things

No, an exponent can't have units. If units for b were specified in the XML file, they would be silently ignored by the current parser. I don't think you need to worry about it.

@bryanwweber
Copy link
Member Author

I think it's ok if an invalid input results in invalid output.

My concern is more in the conversion script crashing out because, e.g., the text of a node that's supposed to be a float isn't and raises a ValueError. Is that likely to happen? Because I don't want to have to worry about passing around the context of the error and trying to catch appropriate things 😁

@bryanwweber
Copy link
Member Author

Hmmm... adding the id field results in a failure for the tests that were previously passing, complaining that the id contains an integer, not a string. There are quotes around the value of the id. Any reason for AnyMap to try to guess that the field is a long int?

***********************************************************************
InputFileError thrown by AnyValue::as:
Error on line 37 of /Users/bryan/GitHub/cantera/test/work/python/gri30.yaml:
Key 'id' contains a 'long int',
not a 'string'
|  Line |
|    32 |   thermo: ideal-gas
|    33 |   transport: multi-component
|    34 | reactions:
|    35 | - efficiencies: {AR: 0.83, C2H6: 3.0, CH4: 2.0, CO: 1.75, CO2: 3.6, H2: 2.4, H2O: 15.4}
|    36 |   equation: 2 O + M <=> O2 + M
>    37 >   id: '0001'
                ^
|    38 |   rate-constant: {A: 120000000000.0, Ea: 0.0 cal/mol, b: -1.0}
|    39 |   type: three-body
|    40 | - efficiencies: {AR: 0.7, C2H6: 3.0, CH4: 2.0, CO: 1.5, CO2: 2.0, H2: 2.0, H2O: 6.0}
***********************************************************************

@codecov
Copy link

codecov bot commented Sep 7, 2019

Codecov Report

Merging #693 into master will increase coverage by 0.52%.
The diff coverage is 89.47%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #693      +/-   ##
==========================================
+ Coverage   70.82%   71.35%   +0.52%     
==========================================
  Files         372      372              
  Lines       43745    43758      +13     
==========================================
+ Hits        30983    31223     +240     
+ Misses      12762    12535     -227
Impacted Files Coverage Δ
src/thermo/DebyeHuckel.cpp 56.53% <ø> (+0.79%) ⬆️
src/thermo/IdealMolalSoln.cpp 52.24% <ø> (+7.37%) ⬆️
src/thermo/IonsFromNeutralVPSSTP.cpp 43.16% <89.47%> (+1.18%) ⬆️
src/thermo/HMWSoln.cpp 71.58% <0%> (+0.04%) ⬆️
src/kinetics/Reaction.cpp 86.14% <0%> (+0.15%) ⬆️
src/transport/GasTransport.cpp 90.79% <0%> (+0.2%) ⬆️
src/base/ctml.cpp 71.72% <0%> (+0.42%) ⬆️
src/thermo/PDSS_Water.cpp 82.22% <0%> (+0.74%) ⬆️
src/transport/HighPressureGasTransport.cpp 0.88% <0%> (+0.88%) ⬆️
... and 12 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8d18757...6508179. Read the comment docs.

@codecov
Copy link

codecov bot commented Sep 7, 2019

Codecov Report

Merging #693 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #693   +/-   ##
=======================================
  Coverage   70.63%   70.63%           
=======================================
  Files         372      372           
  Lines       43567    43567           
=======================================
  Hits        30773    30773           
  Misses      12794    12794
Impacted Files Coverage Δ
src/transport/GasTransport.cpp 90.58% <0%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 265a186...29904a1. Read the comment docs.

@bryanwweber
Copy link
Member Author

bryanwweber commented Sep 11, 2019

I'm confused about the implementation of the fixed-stoichiometry phase type and its associated species. For example,

- name: diamond
  thermo: fixed-stoichiometry
  elements: [C]
  species: [C(d)]

and the species

- name: C(d)
  composition: {C: 1}
  thermo:
    model: constant-cp
  equation-of-state:
    model: constant-volume
    density: 3.52 g/cm^3

The corresponding XML entries are

  <phase id="diamond" dim="3">
    <elementArray datasrc="elements.xml">C</elementArray>
    <speciesArray datasrc="#species_data">C(d)</speciesArray>
    <thermo model="StoichSubstance">
      <density units="g/cm3">3.52</density>
    </thermo>
    <transport model="None"/>
    <kinetics model="none"/>
  </phase>

and the species

  <species name="C(d)">
      <atomArray>C:1 </atomArray>
      <thermo>
        <const_cp Tmin="100.0" Tmax="5000.0">
           <t0 units="K">298.15</t0>
           <h0 units="J/mol">0.0</h0>
           <s0 units="J/mol/K">0.0</s0>
           <cp0 units="J/mol/K">0.0</cp0>
        </const_cp>
      </thermo>
    </species>

In the YAML, the density is included with the species definition, while in the XML it is included with the phase definition. This is inconsistent with the electron-cloud phase, where the density is included in the phase, the example below is from sofc.yaml.

- name: metal
  thermo: electron-cloud
  density: 9.0 kg/m^3
  elements: [E]
  species: [electron]

I'm wondering why this difference exists and whether it makes sense to resolve it.

@bryanwweber
Copy link
Member Author

Another question. Does the YAML format support Tmin and Tmax for const_cp species?

    <species name="Li[anode]">
      <atomArray>Li:1 C:6 </atomArray>
      <thermo>
        <const_cp Tmin="100.0" Tmax="5000.0">
           <t0 units="K">298.15</t0>
           <h0 units="kJ/mol">0.0</h0>
           <s0 units="J/mol/K">0.0</s0>
           <cp0 units="J/kmol/K">0.0</cp0>
        </const_cp>
      </thermo>
      <standardState model="constant_incompressible">
        <molarVolume units="cm3/mol">34.80484581497797</molarVolume>
      </standardState>
    </species>

@bryanwweber
Copy link
Member Author

@speth At this point, I've implemented all the equivalents to the cti2yaml tests. My next project will be to implement all the ck2yaml tests (that aren't duplicated) and then make sure I've covered all of the ThermoPhase classes. Can you do a review of the script before it gets over 1000 lines (it's at 938 right now)? 😂

I'm pretty good with how the architecture has worked out, but another opinion would be great. Especially for the activity coefficients or the constant density phase/species stuff where information from the phase node is required to build the species node.

gri30.yaml is generated directly from the Chemkin input files, whereas gri30.cti is hand-modified to add those additional declarations. My opinion is that those extra phase definitions are the least useful way of switching between transport models, so it's not something I'm particularly interested in preserving.

I believe @ischoegl deleted his comment that noted that many people probably rely on those extra phase definitions, so removing them would not be very nice, or at least, make the advice "replace .cti with .yaml and it will keep working" a bit more complicated.

@bryanwweber
Copy link
Member Author

I can't find any tests of ck2yaml. Are there any that parallel the ones for ck2cti or otherwise?

@speth
Copy link
Member

speth commented Sep 13, 2019

In the YAML, the density is included with the species definition, while in the XML it is included with the phase definition. This is inconsistent with the electron-cloud phase, where the density is included in the phase [...] I'm wondering why this difference exists and whether it makes sense to resolve it.

In general, with the YAML format I've tried to move properties of individual species out of the phase entry and into the species entries. With the electron-cloud class, the reason I didn't is because the density specified, at least in the examples I found, was the density of the unrepresented bulk material, not the density of the electrons in that space, and so I don't think it makes sense to assign that density to the electron species.

Does the YAML format support Tmin and Tmax for const_cp species?

They aren't currently read. I guess they could be, but they aren't used anywhere -- there are very few places where temperature limits are enforced in Cantera (for better or worse).

I can't find any tests of ck2yaml. Are there any that parallel the ones for ck2cti or otherwise?

I don't know how I missed this 😲 No, there are currently no explicit tests of ck2yaml. And unfortunately, there are a few too many API differences between the functions in ck2cti and ck2yaml to make a useful common base class for chemkinConverterTest viable, so it'll probably end up being a bit of a copy/paste job.

I believe @ischoegl deleted his comment that noted that many people probably rely on those extra phase definitions, so removing them would not be very nice, or at least, make the advice "replace .cti with .yaml and it will keep working" a bit more complicated.

I don't doubt that people are using these definitions, but I don't really want to keep treating the GRI 3.0 mechanism in a special way that is distinct from what is done with every other mechanism.

@bryanwweber
Copy link
Member Author

Related to #707, I'm not sure how to specify multiple options to the reactions list-of-maps, more specifically, the skip-undeclared-third-bodies option. Based on

kin.skipUndeclaredThirdBodies(
phaseNode.getBool("skip-undeclared-third-bodies", false));
it seems like skip-undeclared-third-bodies should be a phase-level option, but this feels more like something that should be under the reactions key, provided that it can be handled on a per-reaction-source level.

Here is a sample XML entry that contains this option (this is generated by ctml_writer.py converting ch4_ion.cti:

    <reactionArray datasrc="gri30.xml#reaction_data">
      <skip species="undeclared" third_bodies="undeclared"/>
    </reactionArray>
    <reactionArray datasrc="#reaction_data">
      <skip species="undeclared" third_bodies="undeclared"/>
    </reactionArray>

When converting this, should it be an error if both sources do not include skipping the undeclared third bodies? I'm leaning towards no, because deleting the option from the second array does not result in an error when loading this XML file with ct.Solution. However, I'm not sure if that will result in unexpected behavior of the YAML file compared to the XML file.

@bryanwweber
Copy link
Member Author

With the electron-cloud class, the reason I didn't is because the density specified, at least in the examples I found, was the density of the unrepresented bulk material, not the density of the electrons in that space, and so I don't think it makes sense to assign that density to the electron species.

OK, so I should handle the electron-cloud as the special case, rather than treating that as the general one. Sounds good.

They aren't currently read. I guess they could be, but they aren't used anywhere -- there are very few places where temperature limits are enforced in Cantera (for better or worse).

I have two conflicting thoughts about this. One is that it is useful to keep this data, in case someone wants to use this file for something else. The other is that it is confusing to include this data and then not use it in Cantera anywhere. 🤷‍♂

so it'll probably end up being a bit of a copy/paste job.

The ctml2yaml tests are a copy-paste of cti2yaml, so... oh well 😄

I don't doubt that people are using these definitions, but I don't really want to keep treating the GRI 3.0 mechanism in a special way that is distinct from what is done with every other mechanism.

I get that, and I still think we should keep these phase definitions in the "official" distributed version. Breaking people's code is not something we should do lightly, especially when we're trying to convince them to use a new format with unclear (immediate) benefits to them. If we do want to remove these, I think we should figure out a way to go through a deprecation cycle and warn people before their code just breaks. Are there other reasons you're concerned aside from not wanting to make an exception for GRI-3.0?

Copy link
Member

@speth speth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is still in-progress, but I figured it was worth sharing a few comments on the implementation now.

interfaces/cython/cantera/ctml2yaml.py Outdated Show resolved Hide resolved
interfaces/cython/cantera/ctml2yaml.py Outdated Show resolved Hide resolved
interfaces/cython/cantera/ctml2yaml.py Outdated Show resolved Hide resolved
interfaces/cython/cantera/ctml2yaml.py Outdated Show resolved Hide resolved
interfaces/cython/cantera/ctml2yaml.py Outdated Show resolved Hide resolved
interfaces/cython/cantera/ctml2yaml.py Outdated Show resolved Hide resolved
interfaces/cython/cantera/ctml2yaml.py Outdated Show resolved Hide resolved
interfaces/cython/cantera/ctml2yaml.py Outdated Show resolved Hide resolved
interfaces/cython/cantera/ctml2yaml.py Outdated Show resolved Hide resolved
interfaces/cython/cantera/ctml2yaml.py Outdated Show resolved Hide resolved
@ischoegl
Copy link
Member

ischoegl commented Oct 26, 2019

Just used this to convert a NASA9 xml file, it worked quite nicely! A default output name may be useful, i.e. ctml2yaml.py somefile.xml should automatically create an output file somefile.yaml

@bryanwweber
Copy link
Member Author

bryanwweber commented Oct 30, 2019

@speth How does YAML handle the stoichIsMods node for Debye Huckel thermo (I believe it was renamed weak-acid-charge in YAML)? It seems like in the XML the phase-level node is only used as an override for species level information? Is this behavior that we want to/can retain?

@speth
Copy link
Member

speth commented Oct 30, 2019

In the YAML format, I tried to put all species-specific information with the species, so the simple answer is that it doesn't. I don't think there's a compelling reason to have two ways of specifying this information in the input file. In all of the Debye Huckel examples, the phase-level node is only occasionally used, and when it is, it's just a duplicate of the species-node value.

@bryanwweber
Copy link
Member Author

Ok that makes sense. The code comments in that section of the init method say that the phase level node is only used to override the species specific information. Maybe what I'll do is move it from the phase to the species if the equivalent node is not already present in the species.

The changes now match all of the options in newReaction, including
synonyms. Butler-Volmer parameters are deprecated so the user is warned.
Simplify processing interface type reactions into one method, since the
electrochem node is optional.
Add a docstring for get_float_or_units. Use the clean_node_text
function.
As demonstrated in Cantera#683, sometimes the zero values are intentional and
meaningful. Rework the SpeciesTransport class to not use this function.
The Python Expat parser requires that the <?xml version...> tag occur as
the first characters in the file, even before any blank space, so lstrip
is used to remove any whitespace. In addition, raw & characters are
replaced with their escaped version.
If a reactionData or speciesData node has a duplicated id attribute,
combine the duplicate sections together. If duplicate reaction id
attributes or species names are found, either warn or raise an error.
Add intersphinx extension to auto-link to Python docs. Bump KaTeX
version.
A quantity is a value that can be either just a number (i.e., a float)
or a number plus a unit (i.e., a str).
Add parsing of arguments passed when the script is run from the command
line. A new function is introduced to parse the arguments and is the
default function when the file is executed as a script. Update the
documentation to add the new function.

Add a new argument to convert that takes a string containing the content
of a CTML file. This allows convert to be used from within the Python
interpreter more easily.
Node tag names are now quoted for clarity.
The Redlich-Kwong phase-thermo model doesn't require the
activityCoefficients node. If not specified, values will be found from
critProperties.xml
ctml_writer.py maintains global state between calling convert(). This
change commits the resulting XML file into the test/data directory
rather than converting during the test.
The index from the enumerate is not actually used anywhere.
There isn't a reason to convert gri30 every time the class is
instantiated. Move the conversion into the test_gri30 function.
The YAML converters are now installed as scripts from the Python
package. The zip_safe flag is needed to ensure that data files are
accessible by the package. This was added to resolve a build problem
with Python 3.8. I'm not sure why it hasn't been a problem until now.
Bump the minimum Python version to 3.5 to support features in ctml2yaml.
Use argparse to have the CLI match ctml2yaml.py
@bryanwweber
Copy link
Member Author

Thanks @speth and @ischoegl for reviewing this!

@bryanwweber bryanwweber deleted the ctml2yaml branch December 4, 2019 02:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants