Implement C++ SolutionArray::save for CSV output #1508

Merged
merged 19 commits into from
Jun 22, 2023

@ischoegl ischoegl commented Jun 20, 2023

Changes proposed in this pull request

  • Move CSV file output to C++ core
  • Escape component names or values containing commas for CSV output (per a comment on #1372, "read_csv does not work for species name with a comma")
  • CSV files can be read in Python using read_csv, which uses pandas by default and parses escaped entries correctly; the fallback np.genfromtxt does not handle escaped CSV entries.
  • Deprecate Python-specific SolutionArray.write_csv and Sim1D.write_csv in favor of C++ backed save methods.
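
The escaping rule in the list above can be sketched in a few lines of Python. This is only an illustration, not the C++ implementation from this PR: escape_csv_field and write_row are hypothetical helpers, and doubling embedded quotes is an assumption borrowed from common CSV practice rather than something this PR states.

```python
import csv
import io


def escape_csv_field(value):
    """Hypothetical helper: quote a field only if it contains a comma or quote."""
    text = str(value)
    if "," in text or '"' in text:
        # Assumption: embedded quotes are doubled, then the field is wrapped
        # in double quotes (the usual CSV convention).
        return '"' + text.replace('"', '""') + '"'
    return text


def write_row(fields):
    """Join escaped fields into a single CSV line."""
    return ",".join(escape_csv_field(f) for f in fields)


header = write_row(["T", "D", "foo", "spam,eggs"])
row = write_row([300.0, 0.16, 0, "ab"])

# A quote-aware reader recovers the original column names ...
parsed = next(csv.reader(io.StringIO(header)))
# ... whereas a naive split on commas breaks the escaped name apart.
naive = header.split(",")
```

A reader that honors double quotes returns four column names here, while the naive split returns five, which is the kind of failure expected from readers that do not support escaped entries.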

SolutionArray::restore for CSV should be addressed by a separate PR (not necessarily before Cantera 3.0).

If applicable, fill in the issue number this pull request is fixing

Addresses Cantera/enhancements#163
Closes #1372 (as long as pandas is installed)

If applicable, provide an example illustrating new features this pull request is introducing

In [1]: import cantera as ct
   ...: import numpy as np
   ...: import pandas as pd

In [2]: gas = ct.Solution("h2o2.yaml")
   ...: extra = {"foo": range(4), "bar": range(4), "spam,eggs": "ab"}
   ...: states = ct.SolutionArray(gas, 4, extra=extra)
   ...: states.TPX = np.linspace(300, 1000, 4), 2e5, "H2:0.5, O2:0.4"
   ...: states.equilibrate("HP")
   ...: states
Out[2]:
          T         D          H2           H          O  ...          HO2         H2O2   AR   N2  foo  bar  spam,eggs
0   3070.79  0.149421  0.00550722  0.00167792  0.0372227  ...  0.000216387  1.18600e-05    0    0    0    0   ab
1   3110.37  0.145635  0.00614661  0.00200081  0.0424898  ...  0.000227334  1.21419e-05    0    0    1    1   ab
2   3148.44  0.141954  0.00679929  0.00235917  0.0481375  ...  0.000237674  1.23682e-05    0    0    2    2   ab
3   3185.20  0.138365  0.00746197  0.00275457  0.0541804  ...  0.000247395  1.25405e-05    0    0    3    3   ab

[4 rows x 15 components; state='TDY']

In [3]: states.save("test.csv", overwrite=True, basis="mole")
   ...:
/path/to/cantera/composite.py:1281: UserWarning: SolutionArray::writeEntry: One or more CSV column names or values contain commas.
Values have been escaped with double quotes, which may not be supported by all CSV readers.
  self._cxx_save(fname, name, key, description, overwrite, compression, basis)

In [4]: b = ct.SolutionArray(gas)
   ...: b.read_csv("test.csv")
   ...: b
   ...:
Out[4]:
          T         D          H2           H          O  ...          HO2         H2O2   AR   N2  foo  bar  spam,eggs
0   3070.79  0.149421  0.00550722  0.00167792  0.0372227  ...  0.000216387  1.18600e-05    0    0    0    0   ab
1   3110.37  0.145635  0.00614661  0.00200081  0.0424898  ...  0.000227334  1.21419e-05    0    0    1    1   ab
2   3148.44  0.141954  0.00679929  0.00235917  0.0481375  ...  0.000237674  1.23682e-05    0    0    2    2   ab
3   3185.20  0.138365  0.00746197  0.00275457  0.0541804  ...  0.000247395  1.25405e-05    0    0    3    3   ab

[4 rows x 15 components; state='TDY']

In [5]: !cat test.csv
T,D,X_H2,X_H,X_O,X_O2,X_OH,X_H2O,X_HO2,X_H2O2,X_AR,X_N2,foo,bar,"spam,eggs"
3070.78619,0.149421341,0.0521084718,0.0317525141,0.044379351,0.188954453,0.119923355,0.562750149,0.000125056118,6.65107699e-06,0,0,0,0,ab
3110.37252,0.145635333,0.0574154238,0.0373791208,0.0500120575,0.185766067,0.127049988,0.542240915,0.000129704536,6.72218875e-06,0,0,1,1,ab
3148.44466,0.141953983,0.0626643924,0.0434857768,0.0559034845,0.182554667,0.133782815,0.521468314,0.000133794021,6.75611595e-06,0,0,2,2,ab
3185.1969,0.138364824,0.0678155603,0.0500678826,0.0620462739,0.179314008,0.140105912,0.500506279,0.000137329823,6.75498517e-06,0,0,3,3,ab
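
As a rough check of the output above, the header and first data row can be parsed with Python's standard csv module, which honors double-quoted fields. This is a sketch that does not use Cantera or pandas; the two strings are copied from the test.csv listing, and the variable names are illustrative.

```python
import csv
import io

# Header and first data row, copied from the test.csv listing above.
header_line = (
    "T,D,X_H2,X_H,X_O,X_O2,X_OH,X_H2O,X_HO2,X_H2O2,X_AR,X_N2,"
    'foo,bar,"spam,eggs"'
)
row_line = (
    "3070.78619,0.149421341,0.0521084718,0.0317525141,0.044379351,"
    "0.188954453,0.119923355,0.562750149,0.000125056118,6.65107699e-06,"
    "0,0,0,0,ab"
)

# csv.reader treats "spam,eggs" as one field because it is double-quoted.
names, values = list(csv.reader(io.StringIO(header_line + "\n" + row_line + "\n")))
record = dict(zip(names, values))

# With basis="mole", the X_* columns hold mole fractions, which should sum to ~1.
mole_fraction_sum = sum(float(v) for k, v in record.items() if k.startswith("X_"))

# Splitting on every comma would yield one column too many.
naive_column_count = len(header_line.split(","))
```

The quote-aware reader yields 15 columns, matching the "15 components" reported for the SolutionArray, whereas splitting on every comma yields 16.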

Checklist

  • The pull request includes a clear description of this code change
  • Commit messages have short titles and reference relevant issues
  • Build passes (scons build & scons test) and unit tests address code coverage
  • Style & formatting of contributed code follows contributing guidelines
  • The pull request is ready for review

codecov bot commented Jun 20, 2023

Codecov Report

Merging #1508 (4800cd4) into main (ca7251d) will increase coverage by 0.03%.
The diff coverage is 76.85%.

@@            Coverage Diff             @@
##             main    #1508      +/-   ##
==========================================
+ Coverage   70.42%   70.46%   +0.03%     
==========================================
  Files         375      375              
  Lines       58285    58401     +116     
  Branches    20820    20897      +77     
==========================================
+ Hits        41050    41152     +102     
- Misses      14224    14229       +5     
- Partials     3011     3020       +9     
Impacted Files                               Coverage Δ
include/cantera/base/SolutionArray.h         94.44% <ø> (ø)
include/cantera/oneD/Sim1D.h                 66.66% <ø> (ø)
src/base/SolutionArray.cpp                   78.78% <73.93%> (+0.23%) ⬆️
src/oneD/Sim1D.cpp                           70.74% <77.27%> (-0.02%) ⬇️
interfaces/cython/cantera/composite.py       84.04% <88.88%> (+0.65%) ⬆️
interfaces/cython/cantera/_onedim.pyx        81.95% <100.00%> (ø)
interfaces/cython/cantera/onedim.py          83.57% <100.00%> (+0.02%) ⬆️
interfaces/cython/cantera/solutionbase.pyx   89.50% <100.00%> (+0.06%) ⬆️

... and 1 file with indirect coverage changes


@ischoegl ischoegl force-pushed the write-csv branch 2 times, most recently from ed7c663 to 435ddd3 Compare June 20, 2023 16:54
@ischoegl ischoegl marked this pull request as ready for review June 20, 2023 17:15
@ischoegl ischoegl requested a review from a team June 20, 2023 17:53
@ischoegl ischoegl marked this pull request as draft June 20, 2023 20:28
@ischoegl ischoegl removed the request for review from a team June 20, 2023 20:28
@ischoegl ischoegl mentioned this pull request Jun 21, 2023
5 tasks
@ischoegl ischoegl marked this pull request as ready for review June 21, 2023 00:32
@ischoegl ischoegl requested a review from a team June 21, 2023 00:32
@ischoegl
Member Author

I briefly tried to implement the CSV parsing portion (see the comment on Cantera/enhancements#163), but decided that it goes beyond the scope of this PR, so this is ready for review.

Member

@bryanwweber bryanwweber left a comment

Thanks Ingmar! A big task, done well. Just a few small suggestions

include/cantera/oneD/Sim1D.h (outdated, resolved)
interfaces/cython/cantera/_onedim.pyx (resolved)
src/base/SolutionArray.cpp (outdated, resolved)
src/base/SolutionArray.cpp (outdated, resolved)
@ischoegl ischoegl force-pushed the write-csv branch 2 times, most recently from ee5663a to 62290dd Compare June 21, 2023 16:16
Resolve discrepancies of nomenclature in C++/Python
Argument names of SolutionArray IO in C++ and Python were not
consistent, which is resolved in this commit.
@ischoegl
Member Author

ischoegl commented Jun 21, 2023

Thanks for the suggestions, @bryanwweber! I rebased and addressed the suggestions. I also took this occasion to improve some of the docstrings that had been somewhat neglected.

In the process, I noticed some discrepancies in nomenclature between the C++ and Python methods: the former used the argument id to identify a storage location (inherited from Sim1D code), whereas the latter started from the (now defunct) write_hdf nomenclature, which avoided the Python built-in id and used name instead. Similarly, container subgroups were called sub in C++ and key in Python. I consolidated the nomenclature to use name and sub consistently. Since all affected methods are either new in Cantera 3.0 or already part of a deprecation process, no further precautions are necessary.

bryanwweber previously approved these changes Jun 21, 2023
Member

@bryanwweber bryanwweber left a comment

Thanks @ischoegl !

@ischoegl
Member Author

Thanks, @bryanwweber ... sorry for dismissing your review. Looking over the diff, I concluded that I may as well take care of documentation updates for the associated Sim1D save/restore methods. I believe it's done now though.

bryanwweber previously approved these changes Jun 21, 2023
Member

@bryanwweber bryanwweber left a comment

No problem 😁

@ischoegl
Member Author

@bryanwweber ... again 😖. I really should use VSCode to review changes rather than pushing right away ...

@bryanwweber
Member

The failing samples are fixed elsewhere?

@ischoegl
Member Author

> The failing samples are fixed elsewhere?

No. These were new - and are now fixed 😁

Member

@bryanwweber bryanwweber left a comment

Thanks!

@ischoegl
Member Author

Thanks @bryanwweber! Will merge later tonight, unless there are more comments.

Member

@speth speth left a comment

Thanks, @ischoegl. This all looks good to me.

@ischoegl ischoegl merged commit e458764 into Cantera:main Jun 22, 2023
@ischoegl ischoegl deleted the write-csv branch June 22, 2023 03:07
Development

Successfully merging this pull request may close these issues.

read_csv does not work for species name with a comma
3 participants