[3pt] One huc processes its own branches #806

Merged — 30 commits, Feb 13, 2023

Commits
- 1237362 One huc processes its own branches (RobHanna-NOAA, Jan 30, 2023)
- 273a988 Update CHANGELOG.md (RobHanna-NOAA, Jan 30, 2023)
- 559a959 quick text update (RobHanna-NOAA, Jan 30, 2023)
- dd5696a Merge branch 'dev-huc-n-branches' of https://github.com/NOAA-OWP/inun… (RobHanna-NOAA, Jan 30, 2023)
- 4228081 Update CHANGELOG.md (RobHanna-NOAA, Jan 30, 2023)
- 05ec5bb Move and renamed process_unit_wb (RobHanna-NOAA, Jan 31, 2023)
- e42f2a4 Merge branch 'dev-huc-n-branches' of https://github.com/NOAA-OWP/inun… (RobHanna-NOAA, Jan 31, 2023)
- 29c2584 update changelog (RobHanna-NOAA, Jan 31, 2023)
- 1493259 Various bugs and updates (RobHanna-NOAA, Feb 1, 2023)
- cab7268 Update README.md (RobHanna-NOAA, Feb 1, 2023)
- 7c78ba2 renamed (removed) old file names (RobHanna-NOAA, Feb 1, 2023)
- 2f2812d Merge branch 'dev-huc-n-branches' of https://github.com/NOAA-OWP/inun… (RobHanna-NOAA, Feb 1, 2023)
- 6ca24d4 updates based on changes to the deny list file names (RobHanna-NOAA, Feb 1, 2023)
- 58a0a68 Updates for the deny list files name changes (RobHanna-NOAA, Feb 1, 2023)
- 075f474 Update README.md (RobHanna-NOAA, Feb 1, 2023)
- 0a6b7f4 Update bash_variables.env (RobHanna-NOAA, Feb 1, 2023)
- 88fa5d2 Update README.md (RobHanna-NOAA, Feb 1, 2023)
- ca9a393 Update CHANGELOG.md (RobHanna-NOAA, Feb 1, 2023)
- 9958538 Misc Fixes (RobHanna-NOAA, Feb 2, 2023)
- 9bd3657 improved error message for branch list aggregate (RobHanna-NOAA, Feb 2, 2023)
- b3b5d51 Update CHANGELOG.md (RobHanna-NOAA, Feb 3, 2023)
- 9ab4587 change logging and error trapping (RobHanna-NOAA, Feb 7, 2023)
- c1df364 Merge branch 'dev-huc-n-branches' of https://github.com/NOAA-OWP/inun… (RobHanna-NOAA, Feb 7, 2023)
- a8142d7 Update CHANGELOG.md (RobHanna-NOAA, Feb 7, 2023)
- c12ec82 fixed small bug in searching for branch log errors (RobHanna-NOAA, Feb 8, 2023)
- e4b0669 Merge branch 'dev-huc-n-branches' of https://github.com/NOAA-OWP/inun… (RobHanna-NOAA, Feb 8, 2023)
- 8a431c1 fixed output print for incorrect path/file name (RobHanna-NOAA, Feb 8, 2023)
- 44917da Quick text updates (RobHanna-NOAA, Feb 9, 2023)
- 64e943c was stripping leading zeros (RobHanna-NOAA, Feb 10, 2023)
- 6e52f66 Update CHANGELOG.md (RobHanna-NOAA, Feb 13, 2023)
28 changes: 16 additions & 12 deletions README.md
@@ -36,9 +36,10 @@ aws s3 ls s3://noaa-nws-owp-fim/ --request-payer requester

Download a directory of outputs for a HUC8:
```
aws s3 cp --recursive s3://noaa-nws-owp-fim/hand_fim/fim_3_0_34_1/outputs/fr/12090301 12090301 --request-payer requester
aws s3 cp --recursive s3://noaa-nws-owp-fim/hand_fim/outputs/fim_4_0_18_02/12090301 /your_local_folder_name/12090301 --request-payer requester
```
**Note**: There may be newer editions than fim_3_0_34_1, and it is recommended to adjust the command above for the latest version.
By adjusting the path, you can also download entire directories, such as the fim_4_0_18_0 folder.
**Note**: There may be newer editions than fim_4_0_18_0; it is recommended to adjust the command above for the latest version.
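For example, a hedged sketch of downloading an entire release folder rather than a single HUC (the release name and local destination are illustrative, and the exact layout under `outputs/` may differ by release):
```
aws s3 cp --recursive s3://noaa-nws-owp-fim/hand_fim/outputs/fim_4_0_18_0 /your_local_folder_name/fim_4_0_18_0 --request-payer requester
```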


## Running the Code
@@ -60,32 +61,34 @@ Input data can be found on the ESIP S3 Bucket (see "Accessing Data through ESIP
This software is configurable via parameters found in the `config` directory. Copy the files before editing and remove the "template" pattern from the filenames (a short example follows the list below).
Make sure to set the config folder's group to 'fim' recursively using the chown command. Each development version will include a calibrated parameter set of Manning's n values.
- `params_template.env`
- `mannings_default.json`
- must change filepath in `params_template.env` in `manning_n` variable name
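As a minimal sketch of that setup (the copied filename `params.env` and the `/foss_fim` location are assumptions for illustration):
```
cd /foss_fim/config                  # assumed repo location inside the container
cp params_template.env params.env    # copy before editing; drop the "template" pattern
chown -R :fim .                      # set the config folder group to 'fim' recursively
```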

This system has an optional tool called the `calibration database tool`. In order to use it, you will need to either install the calibration database service or disable it in the `params_template.env` file. See [calibration tool README](https://github.com/NOAA-OWP/inundation-mapping/blob/dev/tools/calibration-db/README.md) for more details.

### Produce HAND Hydrofabric
```
gms_pipeline.sh -u <huc8> -n <name_your_run>
fim_pipeline.sh -u <huc8> -n <name_your_run>
```
- There are a wide number of options and defaulted values, for details run ```gms_pipeline.sh -h```
- There are a number of options and default values; for details, run ```fim_pipeline.sh -h```.
- Mandatory arguments:
- `-u` can be a single huc, a series passed in quotes space delimited, or a line-delimited file
i. To run entire domain of available data use one of the ```/data/inputs/included_huc[4,6,8].lst``` files or a huc list file of your choice.
- `-u` can be a single HUC, a space-delimited series of HUCs in quotes, or a line-delimited (.lst) file (usage examples follow this list). To run the entire domain of available data, use one of the ```/data/inputs/included_huc8.lst``` files or a HUC list file of your choice. Depending on the performance of your server, especially the number of CPU cores, running the full domain can take multiple days.
- `-n` is the name of your run (alphanumeric only)
- Outputs can be found under ```/data/outputs/<name_your_run>```
- Outputs can be found under ```/data/outputs/<name_your_run>```.
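For illustration, a few hedged invocation sketches covering the `-u` variants above (run names and HUC numbers are examples only):
```
fim_pipeline.sh -u 12090301 -n test_single_huc                       # a single HUC
fim_pipeline.sh -u "12090301 12090302" -n test_two_hucs              # a space-delimited series in quotes
fim_pipeline.sh -u /data/inputs/included_huc8.lst -n full_domain     # a line-delimited .lst file
```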

Processing of HUC's in FIM4 (GMS) comes in two pieces: gms_run_unit and gms_run_branch. `gms_pipeline.sh` above takes care of both steps however, you can run each part seperately for faster development if you like.
Processing of HUCs in FIM4 comes in three pieces. You can run `fim_pipeline.sh`, which automatically runs all three major sections, or you can run each section independently if you like. The three sections are:
- `fim_pre_processing.sh` : This section must be run first, as it creates the basic output folder for the run. It also creates a number of key files and folders for the next two sections.
- `fim_process_unit_wb.sh` : This script processes exactly one HUC8 plus all of its related branches. While it can only process one at a time, you can run the script multiple times, each with a different HUC (or overwriting a HUC). When you run `fim_pipeline.sh` and more than one HUC number has been supplied, either by command line arguments or via a HUC list, it automatically iterates, calling `fim_process_unit_wb.sh` for each HUC provided. Using `fim_process_unit_wb.sh` directly allows for a run / rerun of a single HUC, or running other HUCs at different times / days or even in different Docker containers.
- `fim_post_processing.sh` : This section takes all of the HUCs that have been processed, aggregates key information from each HUC directory, and looks for errors across all HUC folders. It also processes the group in sub-steps such as USGS gage processing, rating curve adjustments, and more. Naturally, running or re-running this script can only be done after running `fim_pre_processing.sh` and at least one run of `fim_process_unit_wb.sh`.

If you choose to do the two step hydrofabric creation, then run `gms_run_unit.sh`, then `gms_run_branch.sh`. See each of those files for details on arguments.
Running `fim_pipeline.sh` is quicker than running all three steps independently; a sketch of the manual flow follows.
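As a hedged sketch of the manual three-step flow (argument names and order here are assumptions based on the descriptions above; check each script's help output for the exact interface):
```
fim_pre_processing.sh -u "12090301 12090302" -n test_run   # create the output folder and runtime files
fim_process_unit_wb.sh test_run 12090301                   # process one HUC plus its branches
fim_process_unit_wb.sh test_run 12090302                   # repeat per HUC, possibly at other times or in other containers
fim_post_processing.sh -n test_run                         # aggregate results and scan for errors
```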

### Testing in Other HUCs
To test in HUCs other than the provided HUCs, the following processes can be followed to acquire and preprocess additional NHDPlus rasters and vectors. After these steps are run, the "Produce HAND Hydrofabric" step can be run for the new HUCs.

```
/foss_fim/src/acquire_and_preprocess_inputs.py -u <huc4s_to_process>
```
Note: This tool is deprecated; updates will be coming soon.

- `-u` can be a single HUC4, a series of HUC4s (e.g. 1209 1210), or a path to a line-delimited file of HUC4s (see the example after this list).
- Please run `/foss_fim/src/acquire_and_preprocess_inputs.py --help` for more information.
- See the United States Geological Survey (USGS) National Hydrography Dataset Plus High Resolution (NHDPlusHR) [site](https://www.usgs.gov/core-science-systems/ngp/national-hydrography/nhdplus-high-resolution) for more information.
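For example, a hedged invocation using the HUC4s listed above (this assumes `-u` accepts space-separated values, and recall the tool is deprecated):
```
/foss_fim/src/acquire_and_preprocess_inputs.py -u 1209 1210
```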
@@ -94,10 +97,11 @@ To test in HUCs other than the provided HUCs, the following processes can be followed
```
/foss_fim/src/preprocess_rasters.py
```
Note: This tool is deprecated; updates will be coming soon.

----
### Evaluating Inundation Map Performance
After `gms_pipeline.sh` completes, you can evaluate the model's skill. The evaluation benchmark datasets are available through ESIP in the `test_cases` directory.
After `fim_pipeline.sh` completes (or the equivalent combination of the three major steps described above), you can evaluate the model's skill. The evaluation benchmark datasets are available through ESIP in the `test_cases` directory.

To evaluate model skill, run the following:
```
…
```
File renamed without changes.
61 changes: 0 additions & 61 deletions config/deny_gms_branches_dev.lst

This file was deleted.

File renamed without changes.
5 changes: 0 additions & 5 deletions config/params_template.env
@@ -51,9 +51,4 @@ export CALB_DB_KEYS_FILE="/data/config/calb_db_keys.env"
#### computational parameters ####
export ncores_gw=1 # mpi number of cores for gagewatershed
export ncores_fd=1 # mpi number of cores for flow directions
export default_max_jobs=1 # default number of max concurrent jobs to run
export memfree=0G # min free memory required to start a new job or keep youngest job alive

#### logging parameters ####
export startDiv="\n##########################################################################\n"
export stopDiv="\n##########################################################################"
71 changes: 71 additions & 0 deletions docs/CHANGELOG.md
@@ -1,6 +1,77 @@
All notable changes to this project will be documented in this file.
We follow the [Semantic Versioning 2.0.0](http://semver.org/) format.


## v4.1.0.0 - 2023-01-30 - [PR#806](https://github.com/NOAA-OWP/inundation-mapping/pull/806)

As we move to Amazon Web Services (AWS), we need to change our processing system. Currently, `gms_pipeline.sh` uses bash "parallel" as an iterator, which first processes all HUCs, but not their branches. One of `gms_pipeline.sh`'s next steps is branch processing, which is again iterated via "parallel". In other words, units are processed as one step and branches are processed as a second, independent step.

**Note:** While we are taking steps to move to AWS, we will continue to maintain the ability to do all processing on a single server using a single Docker container, as we have for a long time. Moving to AWS simply means taking portions of FIM code and adding them to AWS tools for the performance of large-scale production runs.

Our new processing system, starting with this PR, allows each HUC to process its own branches.

A further requirement was to split the overall processing flow into independent steps, with each step able to run on its own without relying on "export" variables from other files. Note: There are still a few exceptions. The basic flow now becomes:
- `fim_pre_processing.sh`,
- one or more calls to `fim_process_unit_wb.sh` (calling this file for each single HUC to be processed).
- followed by a call to `fim_post_processing.sh`.


Note: This is a very large, complex PR with a lot of critical details. Please read the details at [PR 806](https://github.com/NOAA-OWP/inundation-mapping/pull/806).

### CRITICAL NOTE
The new `fim_pipeline.sh`, and by proxy `fim_pre_processing.sh`, has two new key input args: one named **-jh** (job HUCs) and one named **-jb** (job branches). You can assign the number of cores/CPUs used for processing HUCs versus the number used for processing branches. The -jh arg is only used by `fim_pipeline.sh` when it is processing more than one HUC or a list of HUCs, as it is the iterator for HUCs. The -jb flag says how many cores/CPUs can be used when processing branches (note: the average HUC has 26 branches).

BUT... you have to be careful not to overload your system. **You need to multiply the -jh and -jb values together, but only when using the `fim_pipeline.sh` script.** Why? _If you have 16 CPUs available on your machine and you assign -jh as 10 and -jb as 26, you are actually asking for 260 cores (10 x 26), but your machine only has 16 cores._ If you are not using `fim_pipeline.sh` but running the three processing steps independently, then the -jh value need not be anything but 1, as each HUC can only be processed one at a time (i.e., no iterator).
<br/>
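For illustration, a hedged sketch of picking safe values on a 16-core machine (the HUC list path and run name are hypothetical):
```
# 2 HUCs in parallel x 8 branch jobs each = 16 concurrent jobs, matching 16 cores
fim_pipeline.sh -u /data/inputs/my_huc_list.lst -n test_run -jh 2 -jb 8
```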

### Additions

- `fim_pipeline.sh` : The wrapper for the three new major "FIM" processing steps. This script allows processing in one command, same as the current `gms_pipeline.sh` tool.
- `fim_pre_processing.sh`: This file handles all argument input from the user, validates those inputs, and sets up or cleans up folders. It also includes a new system of taking most input parameters and some key environment variables and writing them out to a file called `runtime_args.env`. Later processing steps need minimal input arguments, as they can read most values they need from this new `runtime_args.env`. This allows the three major steps to work independently of each other. Someone can now run `fim_pre_processing.sh`, then run `fim_process_unit_wb.sh` with one HUC, as many times as they like, each run adding just its own HUC folder to the output runtime folder.
- `fim_post_processing.sh`: Scans all HUC folders inside the runtime folder to handle a number of processing steps, which include (to name a few):
- aggregating errors
- aggregating to create a single list (gms_inputs.csv) for all valid HUCs and their branch ids
- USGS gage aggregation
- adjustments to SRVs
- and more
- `fim_process_unit_wb.sh`: Accepts only the input args of runName and HUC number. It then sets up global variables, folders, etc. to process just the one HUC. The logic for processing the HUC is in `run_unit_wb.sh` but is managed by this `fim_process_unit_wb.sh` file, including all error trapping.
- `src`
- `aggregate_branch_lists.py`: When each HUC is being processed, it creates its own .csv file with its branch ids. In post-processing we need one master csv list, and this file aggregates them. Note: A similar file already exists in the `src/gms` folder, but that version operates a bit differently and will be deprecated soon.
- `generate_branch_list.py`: This creates the single .lst file for a HUC, defining each branch id. With this list, `run_unit_wb.sh` can do a parallelized iteration over each of its branches for processing. Note: This is also similar to the current `src/gms` file of the same name, and the gms folder version will also be deprecated soon.
- `generate_branch_list_csv.py`: As each branch, including branch zero, finishes processing, if it was successful it is added to a .csv list in the HUC directory. At the end, this becomes a list of all successful branches. The file is aggregated with all similar .csv files in post-processing for future processing.
- `run_unit_wb.sh`: The actual HUC processing logic. Note: This is fundamentally the same as the HUC processing logic that currently exists in `src/gms/run_by_unit.sh`, which will be removed in the very near future. However, at the end of this file, it creates and manages a parallelized iterator for processing each of its branches (see the sketch after this list).
- `process_branch.sh`: Same concept as `process_unit_wb.sh`, but this one is for processing a single branch. This file manages the true branch processing file, `src/gms/run_by_branch.sh`. It is a wrapper for `src/gms/run_by_branch.sh` that catches all errors and copies error files as applicable. This allows the parent processing files to continue despite branch errors. Both the new fim processing system and the older gms processing system currently share the branch processing file `src/gms/run_by_branch.sh`. When the gms processing file is removed, this file will likely not change, only move one directory up so that it is no longer in the `gms` sub-folder.
- `unit_tests`
- `aggregate_branch_lists_unittests.py` and `aggregate_branch_lists_params.json` (based on the newer `src` directory edition of `aggregate_branch_lists.py`).
- `generate_branch_list_unittest.py` and `generate_branch_list_params.json` (based on the newer `src` directory edition of `generate_branch_list.py`).
- `generate_branch_list_csv_unittest.py` and `generate_branch_list_csv_params.json`
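As a conceptual sketch only (not the repository's exact code), the branch iteration at the end of `run_unit_wb.sh` looks roughly like this; the variable and file names are assumptions:
```
# iterate the HUC's branch list with GNU parallel; process_branch.sh traps
# per-branch errors so the parent run continues despite branch failures
parallel -j "$branchJobLimit" --joblog "$hucOutputDir/branch_joblog.log" \
    process_branch.sh "$runName" "$hucNumber" :::: "$hucOutputDir/branch_ids.lst"
```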

### Changes

- `config`
- `params_template.env`: Removed the `default_max_jobs` value and moved the `startDiv` and `stopDiv` to the `bash_variables.env` file.
- `deny_gms_unit.lst` : Renamed from `deny_gms_unit_prod.lst`
- `deny_gms_branches.lst` : Renamed from `deny_gms_branches_prod.lst`

- `gms_pipeline.sh`, `gms_run_branch.sh`, `gms_run_unit.sh`, and `gms_post_processing.sh`: Changed to hardcode `default_max_jobs` to 1 (we don't want this to be changed at all). They also received minor adjustments for the `deny` list file names.

- `src`
- `bash_functions.env`: Fix error with calculating durations.
- `bash_variables.env`: Adds the two export lines (stopDiv and startDiv) from `params_template.env`.
- `clip_vectors_to_wbd.py`: Cleaned up some print statements for better output traceability.
- `check_huc_inputs.py`: Added logic to ensure the file is an .lst file. Other file formats were not being handled correctly.
- `gms`
- `delineate_hydros_and_produce_HAND.sh`: Removed all uses of the `stopDiv` variable to reduce log and screen output.
- `run_by_branch.sh`: Removed an unnecessary test for overriding outputs.

### Removed

- `config`
- `deny_gms_branches_dev.lst`

<br/><br/>


## v4.0.19.5 - 2023-01-24 - [PR#801](https://github.com/NOAA-OWP/inundation-mapping/pull/801)

When running tools/test_case_by_hydroid.py, it throws a "local variable 'stats' referenced before assignment" error.