Enable serialization of Cantera objects #984

Merged: 57 commits merged into main on Apr 14, 2021


@speth (Member) commented Feb 25, 2021

Changes proposed in this pull request

  • Adds getParameters(AnyMap&) methods to ThermoPhase, Kinetics, Reaction, etc. classes to collect data needed for serialization
  • The data in these fields is populated based on the current state of the objects, so it can be used not only for objects created from YAML files, but also for objects created programmatically from scratch or from the legacy CTI and XML input files.
  • For objects that originate from YAML input files, "extra" YAML fields not used by Cantera are preserved and returned
  • Adds the input_data property to the corresponding Python objects, which returns this information converted to native Python types, e.g. dicts and lists
  • Introduces the YamlWriter class for creating input files defining phases, species, and reactions. Generated output files can contain multiple phase definitions and reaction sections, and can use user-specified unit systems.

Examples

Python input_data property:

>>> import cantera as ct
>>> gas = ct.Solution('gri30.yaml')
>>> gas.input_data
{'elements': ['O', 'H', 'C', 'N', 'Ar'],
 'kinetics': 'gas',
 'name': 'gri30',
 'species': ['H2',
  'H',
  ...
  'CH2CHO',
  'CH3CHO'],
 'state': {'T': 300.0, 'Y': {'H2': 1.0}, 'density': 0.08189392763801234},
 'thermo': 'ideal-gas',
 'transport': 'mixture-averaged'}

>>> gas.species(1).input_data
{'composition': {'H': 1.0},
 'name': 'H',
 'thermo': {'data': [[2.5,
    ...
    -0.446682853],
   [2.50000001,
   ...
    -0.446682914]],
  'model': 'NASA7',
  'note': 'L7/88',
  'temperature-ranges': [200.0, 1000.0, 3500.0]},
 'transport': {'diameter': 2.05,
  'geometry': 'atom',
  'model': 'gas',
  'well-depth': 145.0}}

>>> gas.reaction(1).input_data
{'efficiencies': {'AR': 0.7,
  'C2H6': 3.0,
  'CH4': 2.0,
  'CO': 1.5,
  'CO2': 2.0,
  'H2': 2.0,
  'H2O': 6.0},
 'equation': 'H + O + M <=> OH + M',
 'rate-constant': {'A': 500000000000.0001, 'Ea': 0.0, 'b': -1.0},
 'type': 'three-body'}

YamlWriter class in C++:

#include "cantera/base/Solution.h"
#include "cantera/base/YamlWriter.h"

int main()
{
    auto original = Cantera::newSolution("gri30.yaml");
    // Add or modify species and reactions at will
    Cantera::YamlWriter writer;
    writer.addPhase(original);
    writer.setUnits({{"activation-energy", "K"}});
    writer.toYamlFile("modified-mechanism.yaml");
}

From a Python Solution or Interface:

import cantera as ct

gas = ct.Solution('ptcombust.yaml', 'gas', transport_model='mixture-averaged')
surf = ct.Interface('ptcombust.yaml', 'Pt_surf', [gas])
surf.write_yaml('output.yaml', units={'length': 'cm', 'quantity': 'mol'})

Remaining issues

  • I have not figured out how to get Sphinx to include documentation for the _SolutionBase.input_data property and _SolutionBase.write_yaml method with the Solution class. I tried adding a :members: input_data, write_yaml option in importing.rst, but it didn't seem to do anything.

Potential future improvements

  • YamlWriter could use some options to specify additional metadata fields, perhaps something akin to the --extra option of ck2yaml. This probably requires adding a converter from Python data structures to AnyMap.
  • Units of "extra" fields not interpreted by Cantera are not subject to any unit conversions. For now, if you specify the same output units as the input file, it should be fine.
  • In some cases, floating-point rounding produces awkward values when round-tripping input data, e.g. reactions with pre-exponential factors of 7.000000000000001e+11. The best workaround I have found is judicious use of the precision option.
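A short Python illustration (not Cantera code) of why limiting the output precision hides these artifacts: rounding to a fixed number of significant digits before writing drops the trailing noise. It assumes the precision option acts as a significant-digit limit, and `round_sig` is a hypothetical helper. The same kind of artifact is visible in the 'A': 500000000000.0001 value in the input_data example above.

```python
def round_sig(value: float, digits: int) -> float:
    """Round value to the given number of significant digits.

    The 'g' presentation type keeps `digits` significant digits; parsing
    the string back gives the float nearest to that shortened decimal.
    """
    return float(f"{value:.{digits}g}")


noisy = 7.000000000000001e+11      # value as it might emerge from a unit conversion
print(repr(noisy))                 # 700000000000.0001
print(repr(round_sig(noisy, 14)))  # 700000000000.0
```

The trade-off is that genuinely significant digits beyond the chosen precision are lost as well, which is why the precision has to be chosen with some care.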

Addresses Cantera/enhancements#11.

Checklist

  • There is a clear use-case for this code change
  • The commit message has a short title & references relevant issues
  • Build passes (scons build & scons test) and unit tests address code coverage
  • The pull request is ready for review

@codecov bot commented Feb 25, 2021

Codecov Report

Merging #984 (97263fe) into main (b3c69ac) will increase coverage by 1.73%.
The diff coverage is 94.13%.

❗ Current head 97263fe differs from pull request most recent head c8aea00. Consider uploading reports for the commit c8aea00 to get more accurate results

@@            Coverage Diff             @@
##             main     #984      +/-   ##
==========================================
+ Coverage   70.66%   72.39%   +1.73%     
==========================================
  Files         361      364       +3     
  Lines       44522    46433    +1911     
==========================================
+ Hits        31460    33616    +2156     
+ Misses      13062    12817     -245     
Impacted Files Coverage Δ
include/cantera/base/Solution.h 100.00% <ø> (ø)
include/cantera/base/Units.h 100.00% <ø> (ø)
include/cantera/kinetics/Kinetics.h 51.51% <ø> (ø)
include/cantera/kinetics/Reaction.h 100.00% <ø> (ø)
include/cantera/kinetics/RxnRates.h 92.13% <ø> (ø)
...ude/cantera/thermo/BinarySolutionTabulatedThermo.h 100.00% <ø> (ø)
include/cantera/thermo/ConstCpPoly.h 66.66% <ø> (ø)
include/cantera/thermo/DebyeHuckel.h 100.00% <ø> (ø)
include/cantera/thermo/HMWSoln.h 33.33% <ø> (ø)
include/cantera/thermo/IdealMolalSoln.h 100.00% <ø> (ø)
... and 111 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update b3c69ac...c8aea00.

@ischoegl (Member) left a comment

@speth ... as indicated earlier, it is great that this is finally close to becoming reality 🎉. I haven't reviewed it in great detail (I assume the unit test suite ensures that the emitted YAML can be re-imported without issues), but I do have some minor comments.

include/cantera/kinetics/Falloff.h (resolved)
@@ -72,3 +72,78 @@ class CanteraError(RuntimeError):
pass

cdef public PyObject* pyCanteraError = <PyObject*>CanteraError

cdef anyvalueToPython(string name, CxxAnyValue& v):
Member

Names in utils.pyx are not very 'pythonic' (although they aren't exposed to Python): anyvalue_to_python would be more idiomatic (same for the function below, where I would suggest anymap_to_dict, as a dict is what it returns).

return {key: value for (_, key, value) in py_items}


cdef mergeAnyMap(CxxAnyMap& primary, CxxAnyMap& extra):
@ischoegl (Member) commented Mar 30, 2021

Would it make sense to handle this within the C++ layer? Also note that anymapToPython (or anymap_to_dict) applies units, whereas mergeAnyMap does not.

Regarding my other comment about getParameters, a C++ implementation of merge would go hand in hand with having a unique_ptr<AnyMap> parameters() const method throughout, which would then let you use AnyMap::merge(const AnyMap& other) (or similar).

I assume that there may be a small performance penalty for smart pointers (and potentially a larger one if merging requires copying), but it would be easier to track where objects are initially created. While this may be subjective, it is often difficult to follow how things are assembled when the receiving function creates the object, which is subsequently added to, as in your getParameters approach.

@ischoegl (Member) commented Apr 8, 2021

While my initial comment is mostly moot, am I right that this function is mainly used to merge the output from getParameters with the input data? I think it would be nice to handle this merge in C++ by passing an optional flag to getParameters where it applies. This may eliminate the need for this specific Cython function.

Member Author

The initial reason for this came from the desire to control the order of the output keys. I want the user-defined keys in input to generally come last, and the ordering mechanism mostly relies on preserving insertion order. If the base Reaction class were responsible for adding in the user-defined keys, then they would come before any keys added by child classes.

I actually ended up finding a hybrid solution that both eliminates the need to merge the fields defining the object with the extra input data at the point of use, and avoids requiring the user-accessible function to take an AnyMap as its input. It does this by making the virtual getParameters method protected and introducing a non-virtual method with the signature AnyMap parameters(bool withInput=true) on the base classes only. This way, the parameters method can initialize the AnyMap, use the virtual getParameters method to populate all of the relevant fields, and then add the extra keys last before returning it.
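The ordering argument can be illustrated with a hypothetical Python analogue, since Python dicts also preserve insertion order. Here `parameters(with_input)` plays the role of the non-virtual base-class method and `_get_parameters` the role of the protected virtual getParameters. The class names, the `input` attribute, and the `note` key are assumptions for illustration, not Cantera's actual code.

```python
class Reaction:
    """Hypothetical base class illustrating the parameters() pattern."""

    def __init__(self, equation, input_data=None):
        self.equation = equation
        # fields from the original input file, possibly including user extras
        self.input = dict(input_data or {})

    def parameters(self, with_input=True):
        """Entry point defined only on the base class (not overridden)."""
        params = {}
        self._get_parameters(params)  # base and derived classes add fields
        if with_input:
            for key, value in self.input.items():
                # keys already generated from the object's state are kept;
                # remaining user-defined keys are appended, so they come last
                params.setdefault(key, value)
        return params

    def _get_parameters(self, params):
        """Overridable hook; derived classes call the base version first."""
        params["equation"] = self.equation


class ThreeBodyReaction(Reaction):
    def __init__(self, equation, efficiencies, input_data=None):
        super().__init__(equation, input_data)
        self.efficiencies = dict(efficiencies)

    def _get_parameters(self, params):
        super()._get_parameters(params)
        params["type"] = "three-body"
        params["efficiencies"] = dict(self.efficiencies)


rxn = ThreeBodyReaction("H + O + M <=> OH + M", {"H2O": 6.0},
                        input_data={"equation": "H + O + M <=> OH + M",
                                    "note": "user-defined field"})
print(list(rxn.parameters()))  # ['equation', 'type', 'efficiencies', 'note']
```

If the base class instead appended the user-defined keys inside its own `_get_parameters`, they would land before `type` and `efficiencies`, which is exactly the ordering problem described in the previous paragraph.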

src/kinetics/Reaction.cpp (resolved)
src/kinetics/Reaction.cpp (resolved)
@ischoegl (Member) left a comment

@speth ... I rescinded some of my initial comments as I believe your implementation is more efficient (despite being harder to track).

Beyond that, I think adding an optional flag to getParameters to include the input data where it applies (and handling the merge within C++) may be more consistent and easier to maintain long-term. I'm also not sure about getSpeciesParameters, as it seems somewhat superfluous. Spot-check benchmark tests showing that a re-imported Solution preserves results may add another level of security.

PS: I believe it would make sense to finalize this before #995 (and rebase the latter, as this would allow me to port getParameters immediately, rather than requiring another go-around).

PPS: Adding this capability to the extract_submechanism.py example would be neat.

self.assertEqual(len(generated['Pt_surf-reactions']), surf.n_reactions)
self.assertEqual(len(generated['species']), surf.n_total_species)


@ischoegl (Member) commented Apr 8, 2021

It may make sense to expand the tests somewhat, e.g. generate YAML and run benchmark test problems for both the original and the generated YAML input (along the lines of test_convert for CTI/XML->YAML). Including larger mechanisms (Reitz?) may also be interesting.

Individual objects are well covered by C++ tests, so this is mainly testing the YamlWriter's front end, correct?

Member Author

Yes, these tests were mainly focused on the YamlWriter Python interface, but I did add a couple of additional checks here to show that the whole Solution is restored in a consistent state. I think the consistency of all the different phase and reaction types is reasonably well covered by the ThermoYamlRoundTrip test suite (in test/thermo/thermoToYaml.cpp) and by the checks done in the ReactionToYaml test suite (in test/kinetics/kineticsFromYaml.cpp) by the compareReactions function.
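A round-trip check of the kind discussed in this thread could compare the input_data-style structures of the original and regenerated objects while tolerating small floating-point differences (such as 5e11 vs. 500000000000.0001). A minimal Python sketch; `assert_nested_close` is a hypothetical helper, not part of Cantera's test suite.

```python
import math


def assert_nested_close(expected, actual, rel_tol=1e-12, path="root"):
    """Recursively compare nested dicts/lists, allowing float round-off."""
    if isinstance(expected, dict):
        assert isinstance(actual, dict), f"{path}: expected a mapping"
        assert expected.keys() == actual.keys(), f"{path}: keys differ"
        for key in expected:
            assert_nested_close(expected[key], actual[key], rel_tol, f"{path}.{key}")
    elif isinstance(expected, (list, tuple)):
        assert isinstance(actual, (list, tuple)), f"{path}: expected a sequence"
        assert len(expected) == len(actual), f"{path}: lengths differ"
        for i, (e, a) in enumerate(zip(expected, actual)):
            assert_nested_close(e, a, rel_tol, f"{path}[{i}]")
    elif isinstance(expected, (int, float)) and isinstance(actual, (int, float)):
        assert math.isclose(expected, actual, rel_tol=rel_tol), \
            f"{path}: {expected} != {actual}"
    else:
        assert expected == actual, f"{path}: {expected!r} != {actual!r}"


original = {"type": "three-body", "rate-constant": {"A": 5e11, "b": -1.0, "Ea": 0.0}}
regenerated = {"type": "three-body",
               "rate-constant": {"A": 500000000000.0001, "b": -1.0, "Ea": 0.0}}
assert_nested_close(original, regenerated)  # passes despite the rounding artifact
```

In a real test, this would be applied to something like each reaction's input_data before and after writing and re-reading a YAML file; exact equality would fail on artifacts like the one above.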

{"quantity", "mol"},
{"length", "cm"}
});
// Should fail because pre-exponential factors from XML can't be converted
Member

I'm not sure I follow this comment: while XML may have limitations on import, shouldn't this become moot once the data are read and available to the C++ objects? At that point everything should be standardized, i.e. everything is known implicitly.

Member Author

This is definitely a thorny issue. Because reactions are instantiated from XML without having a Kinetics object available, there is no way to determine the dimensions of the rate constant. All you know is that the value itself has already been converted to Cantera's mks+kmol system. I guess you could set the value of Reaction.rate_units when calling Kinetics.addReaction, which would resolve this for what will presumably be the most common use case, serializing a complete phase definition. But a bare Reaction object created from XML does not contain the information needed to serialize it to a non-default unit system.

@ischoegl (Member) commented Apr 9, 2021

From what I understand, the reaction order provides most of the missing information, so the main issues are non-standard units as well as volumetric vs. surface geometry. Am I seeing this correctly? Regardless, I believe export should work if the associated Kinetics objects are defined (as you mention, this can be done implicitly and is likely the most common use case), which would leave raising an exception only for reactions that aren't fully set up.

@speth (Member Author) commented Apr 10, 2021

Correct, the units are determined by the dimensionality of the various phases, plus the units of the standard concentration for each phase. And while kmol/m^3 or kmol/m^2 are probably the most common, there are several phases where the standard concentration is dimensionless (see the various implementations of standardConcentrationUnits).
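As a worked example of this bookkeeping, the Python sketch below derives the dimensions of a rate constant as the units of the reaction rate divided by the product of each reactant's standard-concentration units raised to its reaction order. It applies this to the three-body reaction H + O + M <=> OH + M from the input_data example, assuming kmol/m^3 standard concentrations for all gas-phase reactants and counting the third body M as a reactant. The function names are hypothetical; this is not Cantera's calculateRateCoeffUnits.

```python
# Dimensions are stored as {base unit: exponent} dictionaries.
BULK_CONC = {"quantity": 1, "length": -3}     # e.g. kmol/m^3
SURFACE_CONC = {"quantity": 1, "length": -2}  # e.g. kmol/m^2
DIMENSIONLESS = {}                            # unitless standard concentration


def rate_coeff_units(rate_units, reactants):
    """Exponents of k = rate / prod(C_i ** order_i).

    rate_units: exponents of the reaction rate, e.g. kmol/m^3/s
    reactants: list of (order, standard-concentration exponents)
    """
    result = dict(rate_units)
    for order, conc in reactants:
        for dim, exp in conc.items():
            result[dim] = result.get(dim, 0) - order * exp
    return {dim: exp for dim, exp in result.items() if exp != 0}


def conversion_factor(exponents, scale):
    """Factor converting a value from the units in `scale` to mks+kmol.

    scale maps each base unit to its size in mks+kmol units,
    e.g. {"length": 0.01} for cm and {"quantity": 0.001} for mol.
    """
    factor = 1.0
    for dim, exp in exponents.items():
        factor *= scale.get(dim, 1.0) ** exp
    return factor


bulk_rate = {"quantity": 1, "length": -3, "time": -1}
# H + O + M <=> OH + M: first order in H, O and the third body M
units = rate_coeff_units(bulk_rate, [(1, BULK_CONC)] * 3)
print(units)  # {'quantity': -2, 'length': 6, 'time': -1}, i.e. m^6/kmol^2/s

cgs = {"length": 0.01, "quantity": 0.001, "time": 1.0}
print(5.0e17 * conversion_factor(units, cgs))  # approximately 5e11
```

A cgs value of 5.0e17 cm^6/mol^2/s thus corresponds to roughly the 'A' of 5e11 shown earlier. For a surface reaction with two first-order surface reactants, `rate_coeff_units({"quantity": 1, "length": -2, "time": -1}, [(1, SURFACE_CONC)] * 2)` gives m^2/kmol/s, and with `DIMENSIONLESS` standard concentrations the rate constant simply carries the units of the rate.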

@ischoegl (Member) commented Apr 10, 2021

Thanks for the clarification; I believe I’m on the same page now - units do pop up in #995.

Regarding the XML conversion, I still believe that implicitly setting units would make sense.

Member Author

I realized while thinking about this that CTI/XML isn't the only case where you can create a reaction without setting its rate units: this is also true if you just create a Reaction object from scratch. Reaction.rate_units is now set as part of Kinetics.addReaction, if it hasn't been set before then. I also ended up refactoring the rateCoeffUnits function and making it a member function of the Reaction class (as calculateRateCoeffUnits).
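The fallback described here, where a reaction created without a Kinetics object gets its rate units filled in when it is added to one, can be sketched as follows. Everything below is hypothetical Python: the class names echo the discussion, but the attributes, the bulk-phase-only unit rule, and the error type are assumptions. It is meant only to show why serializing to a non-default unit system fails before `add_reaction` and succeeds afterwards.

```python
class Reaction:
    """Hypothetical reaction whose rate units may be unknown."""

    def __init__(self, equation, A, order):
        self.equation = equation
        self.A = A              # pre-exponential factor in m, kmol, s
        self.order = order      # total reaction order (bulk phase assumed)
        self.rate_units = None  # unknown until a Kinetics object provides context

    def parameters(self, length=1.0, quantity=1.0):
        """Return data with A in output units whose sizes are `length` m and
        `quantity` kmol (e.g. length=0.01 for cm, quantity=0.001 for mol)."""
        if (length, quantity) == (1.0, 1.0):
            return {"equation": self.equation, "A": self.A}  # no conversion needed
        if self.rate_units is None:
            raise ValueError(f"rate units unknown for '{self.equation}'")
        factor = (length ** self.rate_units["length"]
                  * quantity ** self.rate_units["quantity"])
        return {"equation": self.equation, "A": self.A / factor}


class Kinetics:
    """Hypothetical container that fills in missing rate units."""

    def __init__(self):
        self.reactions = []

    def add_reaction(self, rxn):
        if rxn.rate_units is None:
            # bulk phase: k has units (kmol/m^3)^(1 - order) / s
            rxn.rate_units = {"quantity": 1 - rxn.order,
                              "length": 3 * (rxn.order - 1)}
        self.reactions.append(rxn)


rxn = Reaction("H + O + M <=> OH + M", 5e11, order=3)
try:
    rxn.parameters(length=0.01, quantity=0.001)  # cm and mol
except ValueError as err:
    print("before add_reaction:", err)
Kinetics().add_reaction(rxn)
print(rxn.parameters(length=0.01, quantity=0.001)["A"])  # approximately 5e17
```

Writing in the default mks+kmol units never needs the rate units, which matches the observation that only conversion to a non-default unit system is blocked for a bare reaction.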

@ischoegl (Member) commented Apr 12, 2021

That’s what had been my suspicion all along. If you look at #995, I have new constructors for ReactionRate from AnyMap where this will become relevant. Obviously things need to be consolidated as I didn’t have rate_units at my disposal before.

"Multiple species with different definitions are not "
"supported:\n>>>>>>\n{}\n======\n{}\n<<<<<<\n",
speciesDef.toYamlString(),
speciesDefs[speciesDefIndex[name]].toYamlString());
Member

As YamlWriter is central to serialization, it may make sense to add a unit test for special cases (here and elsewhere as marked by CodeCov).
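The check that produces this error message can be understood with a small sketch: when several phases are written to one file, a species shared between them is emitted once if its definitions are identical, and conflicting definitions are rejected. The Python below is a hypothetical illustration that mirrors the speciesDefs/speciesDefIndex names in the fragment, with plain dicts standing in for each species' AnyMap; it is not the YamlWriter implementation. A unit test for this special case could assert that conflicting inputs raise.

```python
def collect_species(phases):
    """Merge species definitions from several phases into a single list.

    phases: iterable of lists of species definitions (dicts with a "name").
    """
    species_defs = []       # unique definitions, in first-seen order
    species_def_index = {}  # species name -> position in species_defs
    for phase_species in phases:
        for definition in phase_species:
            name = definition["name"]
            if name not in species_def_index:
                species_def_index[name] = len(species_defs)
                species_defs.append(definition)
            elif species_defs[species_def_index[name]] != definition:
                raise ValueError(
                    "Multiple species with different definitions are not "
                    f"supported: {name!r}")
    return species_defs


gas = [{"name": "H2", "composition": {"H": 2}}, {"name": "O2", "composition": {"O": 2}}]
surf = [{"name": "H2", "composition": {"H": 2}}]  # same definition: written once
print([s["name"] for s in collect_species([gas, surf])])  # ['H2', 'O2']
```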

out << valueStr;
width += name.size() + valueStr.size() + 4;
} else {
// Put items of an unknown (compound) type on a line alone
Member

As AnyMap is central to serialization, it may make sense to add a unit test for special cases (here and elsewhere as marked by CodeCov).
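The special cases in this fragment can be sketched in Python to make the branches easier to unit-test. Scalar items are packed onto a line with a running width count, and compound items are placed on a line alone. This is a simplified, hypothetical model of the logic. The `wrap_flow_items` name, the 60-character limit, and reading the extra 4 characters as the ': ' and ', ' separators are assumptions, not AnyMap's actual emitter.

```python
def wrap_flow_items(items, max_width=60):
    """Pack 'name: value' pairs into lines no wider than about max_width.

    Scalars share lines; any compound value (list, dict, ...) gets a line alone.
    """
    lines, current, width = [], [], 0
    for name, value in items:
        if isinstance(value, (int, float, str)):
            value_str = str(value)
            item_width = len(name) + len(value_str) + 4  # ': ' plus ', '
            if current and width + item_width > max_width:
                lines.append(", ".join(current))  # start a new line
                current, width = [], 0
            current.append(f"{name}: {value_str}")
            width += item_width
        else:
            if current:  # flush pending scalars first
                lines.append(", ".join(current))
                current, width = [], 0
            lines.append(f"{name}: {value!r}")  # compound item on a line alone
    if current:
        lines.append(", ".join(current))
    return lines


print(wrap_flow_items([("A", 5e11), ("b", -1.0), ("Ea", 0.0)]))
# ['A: 500000000000.0, b: -1.0, Ea: 0.0']
```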

interfaces/cython/cantera/yamlwriter.pyx (resolved)
include/cantera/thermo/ThermoPhase.h (resolved)

@ischoegl mentioned this pull request Apr 10, 2021
Eliminates the 'mergeAnyMap' function, and introduces a 'parameters' method
for classes to return an AnyMap which can optionally contain the user-provided
input data, rather than needing to create the AnyMap in advance and add the
user-created fields separately.
@speth (Member Author) commented Apr 14, 2021

I think I'm happy with the code coverage provided by the tests at this point.

@speth requested review from a team and removed request for bryanwweber April 14, 2021 00:45
@ischoegl (Member)

@speth ... thanks for going into more detail with test coverage. As mentioned above, this looks good to go.

@bryanwweber (Member)

@ischoegl Feel free to hit the merge button 😄

@ischoegl merged commit 81cffde into Cantera:main Apr 14, 2021
@speth (Member Author) commented Apr 14, 2021

🎉
