Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Small fixes and improvements / sparse coefficient matrices #1088

Merged
merged 28 commits into from
Sep 17, 2021

Conversation

ischoegl
Copy link
Member

@ischoegl ischoegl commented Aug 22, 2021

Changes proposed in this pull request

This PR contains several minor updates that fix small issues accumulated over several PR's (or address inconsistencies that were left over from the 2.5 code base).

  • Make Plog interface consistent: existing code mixed std::vector<std::pair> and std::multimap (deprecate std::multimap versions)
  • Make Chebyshev interface consistent: existing code mixed Array2D and vector_fp (deprecate vector_fp versions)
  • Update internal Chebyshev routines to consistently use column major arrays
  • Fix warnings in compilation and test suite
  • Clarify docstring for SCons legacy_rate_constants
  • Allow for custom C++ NotImplementedError messages (allows for more granular user feedback)
  • Renamed eigen_dense.h to eigen_defs.h (prepares for use of sparse matrices)
  • Implement sparse stoichiometric coefficient matrices (StoichManagerN) via Eigen::SparseMatrix
  • Expose sparse stoichiometric coefficient matrices to Python API via scipy.sparse

Most of these changes are straight-forward; some are meant to preempt questions that may arise in the context of #1051 and #1081. Specifically, the sparse interface is a preamble to sparse Jacobian output for the Python API in #1081.

If applicable, provide an example illustrating new features this pull request is introducing

Almost all updates are internal. The main API change allows for sparse matrix output (implemented for stoichiometric coefficients):

In [1]: import cantera as ct
   ...: gas = ct.Solution("test/data/kineticsfromscratch.yaml")

In [2]: gas.reactant_stoich_coefficients
Out[2]: 
array([[0., 0., 0., 0., 0., 0.],
       [1., 2., 0., 0., 0., 0.],
       [1., 0., 0., 1., 0., 0.],
       [0., 0., 0., 0., 0., 1.],
       [0., 0., 2., 0., 0., 0.],
       [0., 0., 0., 1., 0., 1.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 1., 0.]])

In [3]: ct.use_sparse() # enable sparse output via scipy.sparse

In [4]: gas.reactant_stoich_coefficients
Out[4]: 
<9x6 sparse matrix of type '<class 'numpy.float64'>'
	with 9 stored elements in COOrdinate format>

In [5]: print(gas.reactant_stoich_coefficients)
  (1, 0)	1.0
  (2, 0)	1.0
  (1, 1)	2.0
  (4, 2)	2.0
  (2, 3)	1.0
  (5, 3)	1.0
  (8, 4)	1.0
  (3, 5)	1.0
  (5, 5)	1.0

In [6]: gas.reactant_stoich_coefficients.toarray()
Out[6]: 
array([[0., 0., 0., 0., 0., 0.],
       [1., 2., 0., 0., 0., 0.],
       [1., 0., 0., 1., 0., 0.],
       [0., 0., 0., 0., 0., 1.],
       [0., 0., 2., 0., 0., 0.],
       [0., 0., 0., 1., 0., 1.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 1., 0.]])

Checklist

  • The pull request includes a clear description of this code change
  • Commit messages have short titles and reference relevant issues
  • Build passes (scons build & scons test) and unit tests address code coverage
  • Style & formatting of contributed code follows contributing guidelines
  • The pull request is ready for review

Other thoughts

I also considered using Eigen::MatrixXd for the external Chebyshev interface (2-D matrix of coefficients), but ended up staying closer to the previous implementation.

For simplicity, the Eigen::SparseMatrix interface to Python is implemented using a C preprocessor; the alternative would be to expose some Eigen classes in _cantera.pxd.

@codecov
Copy link

codecov bot commented Aug 22, 2021

Codecov Report

Merging #1088 (842b149) into main (cf085e4) will increase coverage by 0.03%.
The diff coverage is 91.42%.

❗ Current head 842b149 differs from pull request most recent head c4e4a7b. Consider uploading reports for the commit c4e4a7b to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1088      +/-   ##
==========================================
+ Coverage   73.46%   73.49%   +0.03%     
==========================================
  Files         364      364              
  Lines       47667    47809     +142     
==========================================
+ Hits        35018    35138     +120     
- Misses      12649    12671      +22     
Impacted Files Coverage Δ
include/cantera/base/Units.h 100.00% <ø> (ø)
include/cantera/base/YamlWriter.h 100.00% <ø> (ø)
include/cantera/base/ctexceptions.h 73.68% <ø> (ø)
include/cantera/kinetics/GasKinetics.h 100.00% <ø> (ø)
include/cantera/kinetics/InterfaceKinetics.h 100.00% <ø> (ø)
include/cantera/kinetics/KineticsFactory.h 100.00% <ø> (ø)
include/cantera/kinetics/ReactionRate.h 61.33% <ø> (ø)
include/cantera/thermo/ThermoFactory.h 100.00% <ø> (ø)
src/kinetics/ReactionRate.cpp 94.25% <ø> (ø)
test/kinetics/rates.cpp 100.00% <ø> (ø)
... and 25 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cf085e4...c4e4a7b. Read the comment docs.

@ischoegl ischoegl force-pushed the small-fixes branch 3 times, most recently from 2c0cc44 to 1b11a99 Compare August 24, 2021 00:34
@ischoegl ischoegl changed the title Small fixes and improvements Small fixes and improvements / sparse coefficient matrices Aug 24, 2021
@ischoegl ischoegl requested a review from a team August 24, 2021 01:29
@ischoegl
Copy link
Member Author

ischoegl commented Aug 24, 2021

@speth ... as #1081 was starting to exceed 2k lines, I decided to push a portion that includes relevant changes to the API first. In addition, there is some streamlining of code that arose in the context of #1051.

'scons help' reports setting values in terms of 'yes' and 'no' rather than
boolean flags that are otherwise used internally. This commit ensures that
documentation for the newly added flag 'legacy-rate-constants' is
consistent with this convention; accordingly, the setting is either enabled
or disabled.
Compilation with GNU g++ currently raises the following warning.

warning: 'nSolnValues' may be used uninitialized in this function [-Wmaybe-uninitialized]

The warning is resolved by providing an (invalid) initial value; the
value is overwritten by a subsequent (pre-existing) if-else tree.
Resolve GNU g++ warning:

warning: unused variable 'p' [-Wunused-variable]
Copy link
Member

@speth speth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the addition of the option to use sparse matrices for the stoichiometric coefficients, but I think there are a few tweaks needed to get it to make full use of the sparsity.

The one change here that I don't really understand is to the data structure used to initialize Plog objects.

include/cantera/cython/wrappers.h Outdated Show resolved Hide resolved
src/kinetics/RxnRates.cpp Outdated Show resolved Hide resolved
src/kinetics/RxnRates.cpp Outdated Show resolved Hide resolved
src/kinetics/RxnRates.cpp Outdated Show resolved Hide resolved
include/cantera/numerics/eigen_defs.h Outdated Show resolved Hide resolved
interfaces/cython/cantera/kinetics.pyx Outdated Show resolved Hide resolved
interfaces/cython/cantera/kinetics.pyx Outdated Show resolved Hide resolved
@ischoegl
Copy link
Member Author

ischoegl commented Sep 3, 2021

@speth ... thank you for the comments. I'll respond to some of your points right away, and should be able to get back to this before long.

@ischoegl ischoegl force-pushed the small-fixes branch 2 times, most recently from f6d7434 to a969367 Compare September 5, 2021 02:15
* Use consistent interface based on Array2D
* Use column-major consistently in internal calculations
* Deprecate Chebyshev::coeffs and Chebyshev::setup
Several internal methods defined for C1, C2, C3, and C_AnyN are unused;
as there are no simple accessor methods defined in StoichManagerN,
deprecation appears not to be required.
The 'Kinetics::finalizeSetup' method executes code after all reactions
have been added. The new flag 'finalize' is added to 'Kinetics::addReaction'
which determines whether 'finalizeSetup' is executed. While the default
is 'true', imports within 'KineticsManager' use 'addReaction' with the
flag set to 'false'.
as Kinetics::m_irrevProductManager is only used in conjunction with
Kinetics::m_revProductManager, a new Kinetics::m_productManager provides
for simpler code.
Deprecate Tmin/Tmax, Pmin/Pmax as well as constructors to use
std::pair<double,double> input/output
Deprecate Kinetics.*_stoich_coeffs() method in favor of property:
Behavior will change after Cantera 2.6; for new behavior, new properties
Kinetics.*_stoich_coefficients are implemented
This commit unifies parameter handling of Plog rate setters and
getters to use std::multimap<...>.
@ischoegl
Copy link
Member Author

ischoegl commented Sep 5, 2021

@speth ... thanks again for the feedback (and I'm happy you liked the sparse coefficient interface, despite the naming squabbles).

I believe I took care of everything you mentioned thus far. The interface to Eigen::SparseMatrix no longer uses excessive memory and the Plog/multimap issue is taken care of. Regarding the Python coefficient getter, I opted for a transitional name, so the old names won't change long-term.

One new significant change is that I streamlined Chebyshev to consistently use range pairs, e.g.

In [1]: import cantera as ct
   ...: gas = ct.Solution("h2o2.yaml")

In [2]:  rxn = ct.ChebyshevReaction(
   ...:             equation="HO2 <=> OH + O",
   ...:             rate={"temperature-range": [290.0, 3000.0],
   ...:                   "pressure-range": [1e3, 1e8],
   ...:                   "data": [[8.2883, -1.1397, -0.12059, 0.016034],
   ...:                            [1.9764, 1.0037, 7.2865e-03, -0.030432],
   ...:                            [0.3177, 0.26889, 0.094806, -7.6385e-03]]},
   ...:             kinetics=gas)

In [3]: rxn.rate.temperature_range # previously rxn.Tmin / rxn.Tmax
Out[3]: (290.0, 3000.0)

In [4]: rxn.rate.pressure_range # previously rxn.Pmin / rxn.Pmax
Out[4]: (1000.0, 100000000.0)

with corresponding changes in the C++ underpinnings (setters/getters/constructors).

Edit: Writing this, I realized that rxn.rate.coeffs really should be rxn.rate.data to be consistent with YAML, which is now fixed:

In [5]: rxn.rate.data  # previously rxn.coeffs
Out[5]: 
array([[ 8.2883e+00, -1.1397e+00, -1.2059e-01,  1.6034e-02],
       [ 1.9764e+00,  1.0037e+00,  7.2865e-03, -3.0432e-02],
       [ 3.1770e-01,  2.6889e-01,  9.4806e-02, -7.6385e-03]])

Finally, the transitional names just end with 3, i.e.

In [6]: ct.use_sparse() # enable sparse output via scipy.sparse
   ...: gas.reactant_stoich_coeffs3
   ...: 
Out[6]: 
<10x29 sparse matrix of type '<class 'numpy.float64'>'
	with 50 stored elements in COOrdinate format>

@ischoegl ischoegl requested a review from speth September 5, 2021 16:32
@ischoegl
Copy link
Member Author

ischoegl commented Sep 13, 2021

🎉 ... finally found something that will help exposing sparse matrices without copying ... it does look like scipy.sparse.csc_matrix can be initialized from the data structure used by compressed sparse matrices inEigen

csc_matrix((data, indices, indptr), [shape=(M, N)])

is the standard CSC representation where the row indices for column i are stored in indices[indptr[i]:indptr[i+1]] and their corresponding values are stored in data[indptr[i]:indptr[i+1]]. If the shape parameter is not supplied, the matrix dimensions are inferred from the index arrays.

@ischoegl
Copy link
Member Author

Switching to CSC (from COO) format in scipy.sparse leverages the internal Eigen storage order, and has some impact on performance:

In [2]: gas = ct.Solution("gri30.yaml")
   ...: dense = gas.reactant_stoich_coeffs3

In [3]: %timeit gas.reactant_stoich_coeffs3
10000 loops, best of 3: 64.3 us per loop

In [4]: ct.use_sparse()

In [5]: sparse = gas.reactant_stoich_coeffs3
   ...: sparse
Out[5]: 
<53x325 sparse matrix of type '<class 'numpy.float64'>'
	with 626 stored elements in Compressed Sparse Column format>

In [6]: (sparse == dense).all()
Out[6]: True

In [7]: %timeit gas.reactant_stoich_coeffs3
The slowest run took 4.78 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 48 us per loop

In [8]: %timeit gas.reactant_stoich_coeffs3_prev # previous implementation outputting COO matrix
The slowest run took 9.96 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 61.6 us per loop

These tests show a 20% speed improvement with CSC (not a major saving); CSC is, however, better suited for matrix-vector products, so the switch makes sense overall.

This change leverages the default internal storage format in Eigen (CSC)
Copy link
Member

@speth speth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates, @ischoegl. I'm glad to see the reasonably efficient interface for exposing Eigen's sparse matrices to Python. I think that bodes well for some of the other things that I expect are on the horizon like Jacobians.

The change from Plog::setup to Plog::setRates makes sense, although I think the new function name implies that you could call it again to replace the existing rates and the current implementation doesn't support that as it leaves the existing contents of rates_ and pressures_ in tact.

I don't really understand the motivation for packing Chebyshev rate Tmin/Tmax and Pmin/Pmax into std::pair. All the extra use of std::make_pair and foo.first/foo.second that's now required doesn't really seem like an improvement over the old interface. In Python, on the other hand, implicit tuple packing/unpacking is clean and idiomatic, so I think that's a reasonable API change.

include/cantera/kinetics/StoichManager.h Show resolved Hide resolved
include/cantera/cython/wrappers.h Outdated Show resolved Hide resolved
include/cantera/kinetics/StoichManager.h Outdated Show resolved Hide resolved
include/cantera/kinetics/Kinetics.h Outdated Show resolved Hide resolved
include/cantera/kinetics/RxnRates.h Show resolved Hide resolved
include/cantera/kinetics/RxnRates.h Outdated Show resolved Hide resolved
include/cantera/kinetics/RxnRates.h Show resolved Hide resolved
include/cantera/kinetics/StoichManager.h Outdated Show resolved Hide resolved
interfaces/cython/cantera/_cantera.pxd Outdated Show resolved Hide resolved
include/cantera/kinetics/Kinetics.h Outdated Show resolved Hide resolved
@ischoegl
Copy link
Member Author

ischoegl commented Sep 16, 2021

@speth ... thank you for your comments!

The change from Plog::setup to Plog::setRates makes sense, although I think the new function name implies that you could call it again to replace the existing rates and the current implementation doesn't support that as it leaves the existing contents of rates_ and pressures_ in tact.

Good catch! Some of the renaming here pre-empts #1051 (which is currently on hold), so it will be possible to change that eventually even in memory.

I don't really understand the motivation for packing Chebyshev rate Tmin/Tmax and Pmin/Pmax into std::pair. All the extra use of std::make_pair and foo.first/foo.second that's now required doesn't really seem like an improvement over the old interface. In Python, on the other hand, implicit tuple packing/unpacking is clean and idiomatic, so I think that's a reasonable API change.

I partially reverted those changes to C++. I guess I wanted to make things as similar as possible for YAML / Python and C++, but for the latter I agree that the resulting API change wasn't great. The updated Python API remains of course.

Regarding Kinetics::finalizeSetup() I see your point, but there is indeed a need to call it at a different time than Kinetics::init() ... let me know if you can think of a better name, but initialize would be imho too confusing.

@ischoegl
Copy link
Member Author

ischoegl commented Sep 17, 2021

@speth ... after settling on Kinetics::resizeReactions (and StoichManagerN::resizeCoeffs), I believe everything is taken care of and this should be ready to be merged. Thanks for the review and the suggestions!

Update nomenclature for previously introduced finalizeSetup method;
new names are Kinetics::resizeReactions and StoichManagerN::resizeCoeffs
Copy link
Member

@speth speth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @ischoegl! This looks good to me.

@speth speth merged commit c4d6ecc into Cantera:main Sep 17, 2021
@ischoegl ischoegl deleted the small-fixes branch September 17, 2021 16:17
@ischoegl
Copy link
Member Author

@speth ... thanks again for the review. Now that this is merged, I will post a new PR wrapping up GasKinetics (much of this is done, but I'll have to clean up some of the commit history).

None of my other pending PR's are close to being merged, as most need to be rebased at this point.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants