
Fix dimensionality #146

Merged · 6 commits · Mar 29, 2022

Conversation

iangrooms
Member

This PR addresses the problem of dimensionality described in #136. A quick summary: If a user tries any of the REGULAR Laplacians and uses a value of dx_min other than 1, they will get wrong answers. This PR will allow users to put in dimensional values of dx_min and filter_scale (in any system of units) and still get the right answers.

I am attempting to be minimally invasive. I've added a boolean class attribute is_nondimensional to all of the Laplacians. If the Laplacian is nondimensional, then at each step of the filter (in filter.py) it re-dimensionalizes by dividing by dx_min ** 2. The filter_func's don't have access to dx_min so I added dx_min_sq = dx_min ** 2 to the filter spec.
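The mechanism described above can be sketched as follows. The class and function names here are illustrative stand-ins, not the actual gcm-filters API:

```python
import numpy as np

# Hypothetical sketch of the fix: a nondimensional Laplacian plus a
# filter step that re-dimensionalizes by dividing by dx_min ** 2.
class RegularLaplacian:
    is_nondimensional = True  # assumes unit-square grid cells

    def __call__(self, field):
        # 5-point nondimensional Laplacian with periodic boundaries
        return (
            np.roll(field, 1, axis=0) + np.roll(field, -1, axis=0)
            + np.roll(field, 1, axis=1) + np.roll(field, -1, axis=1)
            - 4 * field
        )

def filter_step(laplacian, field, dx_min):
    """One Laplacian evaluation inside the filter loop."""
    tendency = laplacian(field)
    if laplacian.is_nondimensional:
        tendency /= dx_min ** 2  # re-dimensionalize
    return tendency
```

With this sketch, a constant field yields a zero tendency regardless of `dx_min`, and doubling `dx_min` scales the tendency by one quarter.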

I initially started with @NoraLoose's suggestion (from #136) of defining new Laplacian classes, but as I worked on it it seemed like it would require bigger changes. For example, _create_filter_func accepts a BaseScalarLaplacian and I didn't want to have to define four new versions of _create_filter_func (scalar vs vector; dimensional vs non).

I also added a Note in the Basic Filtering section of the docs, saying that any grid_vars should have units that are consistent with filter_scale and dx_min, but that the specific system of units doesn't matter.

Adds a boolean `is_nondimensional` to the Laplacian classes. When True, the filter scales by `dx_min ** 2` to restore dimensionality. Had to add `dx_min_sq` (equal to `dx_min ** 2`) to the `filter_spec` to make it work.
@iangrooms iangrooms added the bug Something isn't working label Feb 15, 2022
@iangrooms iangrooms linked an issue Feb 15, 2022 that may be closed by this pull request
```python
ufield_bar += (
    utemp_l * 2 * np.real(s_b) / np.abs(s_b) ** 2
    + utemp_b * 1 / np.abs(s_b) ** 2
)
```
Member
Lines 271-274 should not be part of the if-statement.

Member Author

Good catch! That would have been bad.

Member

Wondering why the tests did not catch this? 😕

Contributor

@rabernat rabernat left a comment


It's great to see this fix, Ian! I can see that this resolves the original bug.

I'd like to brainstorm a bit whether there is a simpler way to do this, which doesn't require us to carry around this is_nondimensional attribute with all of the Laplacians.

Looking at the code I am struck by the fact that the is_nondimensional condition seems to be closely tied to the value of dx_min. Would it be possible to simply define a nondimensional Laplacian as a Laplacian for which dx_min==1? Are there scenarios that would break this assumption?

A corollary of this is the suggestion that perhaps we want to attach dx_min to the Laplacian / grid, rather than to the Filter itself. After all, dx_min is a property of the grid itself, not the filter.

Thanks for considering these comments. I'm just starting to really dig into this, but I know you have already thought this through deeply.

@NoraLoose
Member

NoraLoose commented Feb 16, 2022

Great work, Ian!

If we stick to the is_nondimensional attribute, we should add some tests to this PR. In an ideal scenario, we want to test every newly added line to the code. Based on this, I can think of 2 tests:

  1. A test that checks that the is_nondimensional attribute is there for each Laplacian and has the expected value (e.g., to make sure a future PR does not accidentally change is_nondimensional from True to False for a Laplacian). Such a test could be added to test_kernels.py and could follow this structure:

     ```python
     @pytest.mark.parametrize("Laplacian, expected", [...])
     def test_nondimensional(Laplacian, expected):
         laplacian = Laplacian()
         assert laplacian.is_nondimensional == expected
     ```

  2. A test for the added lines in filter.py. For example, we could test whether the filter gives the same result if we put in values for dx_min, area, etc. in cm vs. km.
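The second test could be sketched roughly like this. The toy filter below is a hypothetical stand-in for the real one; only the units-invariance logic matters:

```python
import numpy as np

# Hypothetical sketch of the units-invariance test: the same physical
# setup expressed in km and in cm should give the same filtered field.
def nondim_laplacian(field):
    # toy nondimensional 5-point Laplacian, periodic boundaries
    return (np.roll(field, 1, axis=0) + np.roll(field, -1, axis=0)
            + np.roll(field, 1, axis=1) + np.roll(field, -1, axis=1)
            - 4 * field)

def apply_filter(field, filter_scale, dx_min, n_steps=3):
    # Each step re-dimensionalizes the Laplacian by dx_min**2, so the
    # result depends only on the ratio filter_scale / dx_min.
    field_bar = field.copy()
    for _ in range(n_steps):
        tendency = nondim_laplacian(field_bar) / dx_min ** 2
        field_bar = field_bar + (filter_scale ** 2 / 24) * tendency
    return field_bar

rng = np.random.default_rng(0)
data = rng.random((8, 8))
result_km = apply_filter(data, filter_scale=4.0, dx_min=1.0)      # km
result_cm = apply_filter(data, filter_scale=4.0e5, dx_min=1.0e5)  # cm
np.testing.assert_allclose(result_km, result_cm)
```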

Let me know if you would like help with these tests. But obviously, before implementing these test cases in test_*, we should agree on how we want to implement the code changes in kernels.py and filter.py.

@iangrooms
Member Author

Replying to @rabernat:

Would it be possible to simply define a nondimensional Laplacian as a Laplacian for which dx_min==1?

The nondimensional Laplacians don't just have minimum grid spacing 1; they have every cell being a unit square. Identifying regular Laplacians with dx_min==1 could break if, for example, there was a variable grid whose minimum was coincidentally 1 km (or 1 m or 1 cm, etc). The user would set dx_min=1 and then the code would mistakenly think it was using a regular/nondimensional Laplacian.

perhaps we want to attach dx_min to the Laplacian / grid, rather than to the Filter itself

The Filter does need to know about dx_min. The Filter object obtains a polynomial approximation that is accurate over a range of wavenumbers, with the highest wavenumber in the range set by dx_min.
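To illustrate the point numerically (with assumed numbers and an assumed Gaussian-like target, not the actual gcm-filters internals): the polynomial must approximate the target response over a range of Laplacian eigenvalues whose upper end grows as dx_min shrinks.

```python
import numpy as np

# Illustrative sketch only: the (pi / dx_min)**2 eigenvalue bound and the
# Gaussian-like target response below are assumptions for illustration.
dx_min = 2.0
filter_scale = 8.0
s_max = (np.pi / dx_min) ** 2      # largest Laplacian eigenvalue (assumed)
s = np.linspace(0.0, s_max, 200)   # eigenvalue range the polynomial must cover
target = np.exp(-s * (filter_scale / (2 * np.pi)) ** 2)  # assumed target

# Fit a low-degree Chebyshev polynomial over [0, s_max]; halving dx_min
# quadruples s_max, so the fit must stay accurate over a wider range.
coeffs = np.polynomial.chebyshev.chebfit(s, target, deg=8)
max_err = np.max(np.abs(np.polynomial.chebyshev.chebval(s, coeffs) - target))
```

The design consequence is that dx_min cannot be dropped from the Filter: it fixes the interval on which the polynomial approximation must be accurate.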

perhaps we want to attach dx_min to the Laplacian / grid, rather than to the Filter itself

As noted above, the filter needs dx_min. But maybe your suggestion points to a different approach. My proposed fix has the same effect as changing the grid size in the regular Laplacians from 1 to dx_min, but without having to change the actual Laplacian kernel code. Instead of having the Laplacian kernel know about the grid size (which would be natural), I hacked _create_filter_func to fix the output of the nondimensional Laplacians. Not sure this is the best way, but it does seem to work.

A more natural alternative would be to add dx_min (or maybe just dx) to the grid_vars for the REGULAR Laplacians, and then change the code of the Laplacian kernels so that they divide by dx ** 2 right before returning the result. I didn't choose this path because it would break all of the code that currently uses the REGULAR Laplacians, in addition to requiring a fairly major overhaul of kernels.py. If there's a strong argument for doing it this way even if it breaks backwards compatibility, then I'm happy to code it up. Curious to hear what you think.
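That alternative could look roughly like this (hypothetical names, not the actual kernels.py code):

```python
import numpy as np
from dataclasses import dataclass

# Hypothetical sketch of the alternative: the kernel itself receives dx
# (e.g. via grid_vars) and divides by dx**2 before returning, so the
# filter loop never needs an is_nondimensional check.
@dataclass
class RegularLaplacianWithDx:
    dx: float  # uniform grid spacing, supplied like a grid_var

    def __call__(self, field):
        lap = (np.roll(field, 1, axis=0) + np.roll(field, -1, axis=0)
               + np.roll(field, 1, axis=1) + np.roll(field, -1, axis=1)
               - 4 * field)
        return lap / self.dx ** 2  # dimensionalize inside the kernel
```

The trade-off mentioned above is visible here: every existing caller would now have to supply dx, which is the backwards-compatibility break.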

@iangrooms
Member Author

I added some tests as suggested by @NoraLoose. In test_kernels.py I add a test that just checks if the is_nondimensional attribute has the correct value. In test_filters.py I run the REGULAR filter on test data with dx_min=1, filter_scale=4 and again with dx_min=2 and filter_scale=8, and make sure the results are the same.

@codecov-commenter

codecov-commenter commented Feb 25, 2022

Codecov Report

Merging #146 (85f38a8) into master (766f624) will decrease coverage by 0.50%.
The diff coverage is 88.00%.

@@            Coverage Diff             @@
##           master     #146      +/-   ##
==========================================
- Coverage   98.49%   97.98%   -0.51%     
==========================================
  Files           9        9              
  Lines         997     1044      +47     
==========================================
+ Hits          982     1023      +41     
- Misses         15       21       +6     
| Flag | Coverage | Δ |
|---|---|---|
| unittests | 97.98% <88.00%> | (-0.51%) ⬇️ |

Flags with carried forward coverage won't be shown. Click here to find out more.

| Impacted Files | Coverage | Δ |
|---|---|---|
| gcm_filters/filter.py | 96.46% <70.00%> | (-2.58%) ⬇️ |
| gcm_filters/kernels.py | 98.61% <100.00%> | (+0.04%) ⬆️ |
| tests/test_filter.py | 99.56% <100.00%> | (+0.01%) ⬆️ |
| tests/test_kernels.py | 100.00% <100.00%> | (ø) |

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@iangrooms
Member Author

The reason code coverage decreased is that I implemented the fix for non-dimensional vector Laplacians, but we don't have any non-dimensional vector Laplacians, so those lines don't get touched by the tests.

@rabernat
Contributor

rabernat commented Mar 3, 2022

Hi Ian! Thanks for continuing to work on this PR. I have been a bit overwhelmed with getting ready for / participating in OSM for the past two weeks. Next week I intend to spend some time reviewing this PR in detail. (I am still confused about some parts of this and need to take some time to really understand what is happening.) Thanks for your patience.

Contributor

@rabernat rabernat left a comment


This is great Ian. Sorry for taking so long to review this. I really appreciate your patience.

A minor nit: I would prefer dimensional = True rather than is_nondimensional = False, since I find double negatives confusing. If you agree, you can change the name and invert the parameter. But this is annoying and not that important, so feel free to ignore that suggestion.

The only real suggestion I have is a minor performance optimization.

```diff
@@ -191,13 +194,18 @@ def filter_func(field, *args):
     if filter_spec.is_laplacian[i]:
         s_l = np.real(filter_spec.s[i])
         tendency = laplacian(field_bar)  # Compute Laplacian
+        if Laplacian.is_nondimensional:
+            tendency /= filter_spec.dx_min_sq  # dimensionalize
```
Contributor

It would be great to be able to skip this step if dx_min_sq==1, which will probably often be the case.

Perhaps we could create a helper function like this

```python
def _maybe_dimensionalize(data, dx_min_sq):
    if dx_min_sq == 1:
        return data
    else:
        return data / dx_min_sq
```

and use it wherever the normalization occurs

Member Author

This is great Ian. Sorry for taking so long to review this. I really appreciate your patience.

No problem!

A minor nit: I would prefer dimensional = True rather than is_nondimensional = False, since I find double negatives confusing. If you agree, you can change the name and invert the parameter. But this is annoying and not that important, so feel free to ignore that suggestion.

Good idea. Avoiding the double negative will not fail (!) to make it clearer.

The only real suggestion I have is a minor performance optimization.

I don't understand how adding a function would improve performance here, probably because I have a wrong idea for how to use the new function.

But I do agree that the if statement is going to slow the whole code down unnecessarily. How about wrapping the whole iteration block in an if statement? If dimensional, then we iterate as-is; if not dimensional, then we divide by dx_min_sq at every iteration. Right now the if statement hits every time we use the Laplacian. If we instead put the test outside the loop, then the if statement only hits once.
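The hoisting idea could be sketched like this (hypothetical names, not the actual filter_func):

```python
import numpy as np

# Illustrative sketch of moving the dimensionality check out of the
# filter loop: the branch is evaluated once when the step function is
# built, and the returned callable contains no per-iteration test.
def laplacian(field):
    # toy nondimensional 5-point Laplacian, periodic boundaries
    return (np.roll(field, 1, axis=0) + np.roll(field, -1, axis=0)
            + np.roll(field, 1, axis=1) + np.roll(field, -1, axis=1)
            - 4 * field)

def make_step(laplacian_func, dx_min_sq, is_nondimensional):
    if is_nondimensional and dx_min_sq != 1:
        return lambda field: laplacian_func(field) / dx_min_sq
    return laplacian_func  # no branch and no division at all

step = make_step(laplacian, dx_min_sq=4.0, is_nondimensional=True)
field = np.arange(16, dtype=float).reshape(4, 4)
out = step(field)  # divides by dx_min_sq without re-testing per iteration
```

When `dx_min_sq == 1`, `make_step` hands back the Laplacian unchanged, so the common nondimensional case pays nothing.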

Contributor

Rather than saying

```python
tendency /= filter_spec.dx_min_sq
```

you would say

```python
tendency = _maybe_dimensionalize(tendency, filter_spec.dx_min_sq)
```

You could accomplish the same thing with yet another nested if block, but I just find that syntax a little easier. Calling the function and then immediately returning the same array back is cheap. The expensive part is dividing a big array by 1 unnecessarily; that's what we want to avoid.

Alternatively, you could change all of the

```python
if Laplacian.is_nondimensional:
```

calls to

```python
if Laplacian.is_nondimensional and filter_spec.dx_min_sq != 1:
```

to accomplish the same thing.

Member Author

Oh, I see. I thought it was the if statement that was the problem.

I think your fix is cleaner than mine, so let's go with that. I can work on both the double negative and the `_maybe_dimensionalize` helper, and push something later today.

Contributor

Awesome. When you're done feel free to self-merge so this is not blocking on me any longer.

Since black is evidently broken at the moment, I had to also pin version 8.0.4.
Once it gets fixed this can be reverted.
@iangrooms
Member Author

Had some problems related to this psf/black#2964. My changes to the pre-commit configuration can be rolled back once this is fixed.

@iangrooms iangrooms merged commit e4177da into ocean-eddy-cpt:master Mar 29, 2022
@iangrooms iangrooms deleted the fix_dimensionality branch March 29, 2022 15:42
Successfully merging this pull request may close these issues.

[BUG] Inconsistent dimensionality of discrete Laplacians