Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementation of data-processing-level attributes #1001

Merged
merged 13 commits into from
Mar 24, 2023

Conversation

emiliom
Copy link
Collaborator

@emiliom emiliom commented Mar 22, 2023

Draft, rough, partial implementation of data-processing-level attribute insertion, addressing #980. I've got a "representative" core working now, that handles:

  • Assign L1A on the Top-level group of the EchoData object returned by open_raw, but only if Platform lat-lon data are present.
  • Assign L1A after running echodata.update_platform on an echodata object that previously did not have lat-lon data and hence had not been assigned processing level attributes.
  • Add L2A on consolidate.add_location.
  • Add L3A or L3B on commongrid.compute_MVBS. It looks for an existing processing_level attribute in the input Sv dataset. If it's not present, nothing is added. Otherwise, L3A or L3B are assigned based on the input Sv being L2A or L2B, respectively.

Two attributes are inserted: processing_level and processing_level_url. The processing_level value currently looks like this: "Level 1A". For EchoData object (Level 1A), the attributes are inserted into the Top-level group dataset.

The functionality is implemented as a decorator function, add_processing_level, with two one optional arguments in addition to the required processing level code. There is also a complementary function insert_input_processing_level that must be called in the decorated function to enable sublevel "propagation" -- eg, L2A > L3A, L2B > L3B. I placed all the new functionality in the existing module utils/prov.py; alternatively, there are good arguments for creating a new module, say, utils/processinglevels.py

See https://github.com/uw-echospace/data-processing-levels/blob/main/data_processing_levels.md for the draft description of data processing levels (and discussions about it here), and #817 (comment) for a listing of data processing functions that will ultimately receive this feature.

The code is full of comments for myself at this point, though mostly outside actual functions. I'll set it as draft. As is, it should give you a good idea of how it works and what it covers.

No tests yet, though I've run it locally, manually, on a couple of specific cases. Tomorrow.

@emiliom emiliom added enhancement This makes echopype better product levels labels Mar 22, 2023
@emiliom emiliom added this to the 0.7.0 milestone Mar 22, 2023
@emiliom emiliom requested a review from leewujung March 22, 2023 00:52
@emiliom emiliom self-assigned this Mar 22, 2023
@emiliom emiliom marked this pull request as draft March 22, 2023 01:02
@codecov-commenter
Copy link

codecov-commenter commented Mar 22, 2023

Codecov Report

Merging #1001 (8f1bcab) into dev (203d650) will decrease coverage by 30.93%.
The diff coverage is 54.79%.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

@@             Coverage Diff             @@
##              dev    #1001       +/-   ##
===========================================
- Coverage   79.98%   49.05%   -30.93%     
===========================================
  Files          66       47       -19     
  Lines        5870     4813     -1057     
===========================================
- Hits         4695     2361     -2334     
- Misses       1175     2452     +1277     
Flag Coverage Δ
unittests 49.05% <54.79%> (-30.93%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
echopype/consolidate/api.py 73.33% <0.00%> (-21.19%) ⬇️
echopype/convert/api.py 74.26% <0.00%> (-14.55%) ⬇️
echopype/echodata/echodata.py 0.00% <0.00%> (-78.85%) ⬇️
echopype/mask/api.py 69.29% <33.33%> (-13.11%) ⬇️
echopype/commongrid/api.py 72.91% <40.00%> (-22.54%) ⬇️
echopype/utils/prov.py 58.94% <60.71%> (-33.56%) ⬇️
echopype/clean/api.py 81.25% <100.00%> (+2.67%) ⬆️

... and 55 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@emiliom
Copy link
Collaborator Author

emiliom commented Mar 22, 2023

I got the decorator working on the echodata.update_platform class method, too!

@leewujung
Copy link
Member

Thanks @emiliom, I did a quick scan through the code -- it's exciting to see these functionalities implemented!

My only comment at this moment is having to call insert_input_processing_level separately within the decorated function. Since you have access to and manipulate the dataset within the decorated function already in the decorator, it seems that you could have this as an optional argument that is a flag for whether or not to insert the input processing level attribute? The function does not have to change, but it would then be called under the decorator and not the decorated function.

@emiliom
Copy link
Collaborator Author

emiliom commented Mar 22, 2023

Thanks @emiliom, I did a quick scan through the code -- it's exciting to see these functionalities implemented!

Thanks! Yeah, isn't it? I was thrilled to also be able to reuse the same decorator function with class methods.

My only comment at this moment is having to call insert_input_processing_level separately within the decorated function. Since you have access to and manipulate the dataset within the decorated function already in the decorator, it seems that you could have this as an optional argument that is a flag for whether or not to insert the input processing level attribute? The function does not have to change, but it would then be called under the decorator and not the decorated function.

Definitely, having to explicitly call that function within the decorated function is not ideal (though not awful, either). The challenge with folding this into the decorator function is having access to the "input" xr.Dataset. The decorator function does have access to all inputs, but it doesn't know which function argument is which. I suppose that if there's a common pattern for the position of the input dataset (or the use of a keyword argument), the decorator function could be made to work. For example, in compute_MVBS the input xr.Dataset is the first argument, so args[0] could be used. I'd have to examine all target functions to see if that's always the case; if it is, that'll be wonderful! I'll take a look, but I suspect things won't be so simple.

Either way, these implementation decisions are not user-facing, so in principle we could go with one scheme for 0.7.0 and change it later if a better option is identified.

…ed level or sublevel propagation forms like L*A or L2*; other cleanups to the function
@emiliom
Copy link
Collaborator Author

emiliom commented Mar 23, 2023

In the latest commits:

  • Added the decorator to clean.remove_noise, mask.apply_mask and commongrid.compute_MVBS_index_binning.
  • Introduced a decorator processing_level_code "wildcard" pattern like "LA", "L2", etc (and removed the decorator argument inherit_sublevel, as it's no longer necessary).
    • Sublevel wildcard (eg, L2*): The decorated function's "input" dataset's sublevel (A or B) will be propagated to the function's output dataset. Used with L2 and L3. A function that takes L2A and L2B and produces corresponding L3 products will result in L2A -> L3A and L2B -> L3B.
    • Level wildcard (eg, L*B): The decorated function's "input" dataset's level (2 or 3) will be propagated to the function's output dataset. Used with L2 and L3. A function that takes L2A and L3A and produces corresponding B-sublevel products will result in L2A -> L2B and L3A -> L3B.

I've cleaned up the code in prov.py. I think this PR is ready for a final review and possible merge. I'm removing the draft label.

Regarding the folding of the insert_input_processing_level function into a call from the decorator function: I tried the strategy I mentioned in my previous comments. Using arg[0] does work, but there's a twist. When the corresponding first argument in the decorated functions can be called with an argument name, arg[0] is None and we'd have to rely on a kwarg whose name varies across functions. Let's leave this discussion for 0.7.1, when we can take a fresh look. I can open a new issue about it.

@emiliom emiliom marked this pull request as ready for review March 23, 2023 07:15
@emiliom
Copy link
Collaborator Author

emiliom commented Mar 23, 2023

I just remembered that I haven't added any tests 😞. I'll start working on that.

@emiliom
Copy link
Collaborator Author

emiliom commented Mar 24, 2023

Added tests (in a new, single module) that includes two test files, a hake survey EK60 raw file and an AZFP raw file. The tests check for the correction addition of processing level attributes (or they expected absence) on multiple processing steps along a sequence:

  • open_raw
  • ed.update_platform, if needed
  • compute_Sv
  • consolidate.add_location
  • clean.remove_noise
  • mask.apply_mask on a frequency-difference mask based on: a) Sv with add_location; b) Sv with add_location and remove_noise
  • compute_MVBS

@leewujung This PR is ready for review! The only final change I expect to make, other than those based on your input, is on the final url for processing_level_url.

Copy link
Member

@leewujung leewujung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@emiliom : Thanks for the PR -- My comments are small and IMO this is ready to go!

BTW, I like the test scheme with _presence_test and _absense_test :)

Are you thinking the addition of processing_level_url will be another PR or in this one?

@emiliom
Copy link
Collaborator Author

emiliom commented Mar 24, 2023

Are you thinking the addition of processing_level_url will be another PR or in this one?

I think that'll be the cleanest. That way we can get this PR into dev ASAP.

In the meantime, I'll go through your comments now.

Co-authored-by: Wu-Jung Lee <leewujung@gmail.com>
@emiliom
Copy link
Collaborator Author

emiliom commented Mar 24, 2023

Alright, the sv_ds to Sv_ds change is in. I'll merge the PR once the CI tests are complete.

Then I'll submit another PR updating processing_level_url, within the hour.

@emiliom
Copy link
Collaborator Author

emiliom commented Mar 24, 2023

BTW, I have no idea what the windows build test failure is. I've never looked at that GH action.

@leewujung
Copy link
Member

Yeah, no idea why that particular windows test failed. The other windows test passed...!

@emiliom emiliom merged commit e9a5d86 into OSOceanAcoustics:dev Mar 24, 2023
@emiliom emiliom deleted the processing_level_attrs branch March 24, 2023 21:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement This makes echopype better
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

3 participants