[3pt] One huc processes its own branches #806

Merged — 30 commits, Feb 13, 2023

Commits
- 1237362 One huc processes its own branches (RobHanna-NOAA, Jan 30, 2023)
- 273a988 Update CHANGELOG.md (RobHanna-NOAA, Jan 30, 2023)
- 559a959 quick text update (RobHanna-NOAA, Jan 30, 2023)
- dd5696a Merge branch 'dev-huc-n-branches' of https://github.com/NOAA-OWP/inun… (RobHanna-NOAA, Jan 30, 2023)
- 4228081 Update CHANGELOG.md (RobHanna-NOAA, Jan 30, 2023)
- 05ec5bb Move and renamed process_unit_wb (RobHanna-NOAA, Jan 31, 2023)
- e42f2a4 Merge branch 'dev-huc-n-branches' of https://github.com/NOAA-OWP/inun… (RobHanna-NOAA, Jan 31, 2023)
- 29c2584 update changelog (RobHanna-NOAA, Jan 31, 2023)
- 1493259 Various bugs and updates (RobHanna-NOAA, Feb 1, 2023)
- cab7268 Update README.md (RobHanna-NOAA, Feb 1, 2023)
- 7c78ba2 renamed (removed) old file names (RobHanna-NOAA, Feb 1, 2023)
- 2f2812d Merge branch 'dev-huc-n-branches' of https://github.com/NOAA-OWP/inun… (RobHanna-NOAA, Feb 1, 2023)
- 6ca24d4 updates based on changes to the deny list file names (RobHanna-NOAA, Feb 1, 2023)
- 58a0a68 Updates for the deny list files name changes (RobHanna-NOAA, Feb 1, 2023)
- 075f474 Update README.md (RobHanna-NOAA, Feb 1, 2023)
- 0a6b7f4 Update bash_variables.env (RobHanna-NOAA, Feb 1, 2023)
- 88fa5d2 Update README.md (RobHanna-NOAA, Feb 1, 2023)
- ca9a393 Update CHANGELOG.md (RobHanna-NOAA, Feb 1, 2023)
- 9958538 Misc Fixes (RobHanna-NOAA, Feb 2, 2023)
- 9bd3657 improved error message for branch list aggregate (RobHanna-NOAA, Feb 2, 2023)
- b3b5d51 Update CHANGELOG.md (RobHanna-NOAA, Feb 3, 2023)
- 9ab4587 change logging and error trapping (RobHanna-NOAA, Feb 7, 2023)
- c1df364 Merge branch 'dev-huc-n-branches' of https://github.com/NOAA-OWP/inun… (RobHanna-NOAA, Feb 7, 2023)
- a8142d7 Update CHANGELOG.md (RobHanna-NOAA, Feb 7, 2023)
- c12ec82 fixed small bug in searching for branch log errors (RobHanna-NOAA, Feb 8, 2023)
- e4b0669 Merge branch 'dev-huc-n-branches' of https://github.com/NOAA-OWP/inun… (RobHanna-NOAA, Feb 8, 2023)
- 8a431c1 fixed output print for incorrect path/file name (RobHanna-NOAA, Feb 8, 2023)
- 44917da Quick text updates (RobHanna-NOAA, Feb 9, 2023)
- 64e943c was stripping leading zeros (RobHanna-NOAA, Feb 10, 2023)
- 6e52f66 Update CHANGELOG.md (RobHanna-NOAA, Feb 13, 2023)
28 changes: 16 additions & 12 deletions README.md
@@ -36,9 +36,10 @@ aws s3 ls s3://noaa-nws-owp-fim/ --request-payer requester

Download a directory of outputs for a HUC8:
```
aws s3 cp --recursive s3://noaa-nws-owp-fim/hand_fim/fim_3_0_34_1/outputs/fr/12090301 12090301 --request-payer requester
aws s3 cp --recursive s3://noaa-nws-owp-fim/hand_fim/outputs/fim_4_0_18_02/12090301 /your_local_folder_name/12090301 --request-payer requester
```
**Note**: There may be newer editions than fim_3_0_34_1, and it is recommended to adjust the command above for the latest version.
By adjusting the path, you can also download entire directories, such as the fim_4_0_18_0 folder.
**Note**: There may be newer editions than fim_4_0_18_0; it is recommended to adjust the command above for the latest version.
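For example, a hedged sketch of downloading an entire release folder rather than a single HUC (the release name and local destination are illustrative, and the exact layout under `outputs/` may differ by release):
```
aws s3 cp --recursive s3://noaa-nws-owp-fim/hand_fim/outputs/fim_4_0_18_0 /your_local_folder_name/fim_4_0_18_0 --request-payer requester
```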


## Running the Code
@@ -60,32 +61,34 @@ Input data can be found on the ESIP S3 Bucket (see "Accessing Data through ESIP
This software is configurable via parameters found in the `config` directory. Copy the files before editing and remove the "template" pattern from the filenames (a short example follows the list below).
Make sure to set the config folder's group to 'fim' recursively using the chown command. Each development version will include a calibrated parameter set of Manning's n values.
- `params_template.env`
- `mannings_default.json`
- must change filepath in `params_template.env` in `manning_n` variable name
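As a minimal sketch of that setup (the copied filename `params.env` and the `/foss_fim` location are assumptions for illustration):
```
cd /foss_fim/config                  # assumed repo location inside the container
cp params_template.env params.env    # copy before editing; drop the "template" pattern
chown -R :fim .                      # set the config folder group to 'fim' recursively
```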

This system has an optional tool called the `calibration database tool`. In order to use it, you will need to either install the calibration database service or disable it in the `params_template.env` file. See [calibration tool README](https://github.com/NOAA-OWP/inundation-mapping/blob/dev/tools/calibration-db/README.md) for more details.

### Produce HAND Hydrofabric
```
gms_pipeline.sh -u <huc8> -n <name_your_run>
fim_pipeline.sh -u <huc8> -n <name_your_run>
```
- There are a wide number of options and defaulted values, for details run ```gms_pipeline.sh -h```
- There are a number of options and default values; for details, run ```fim_pipeline.sh -h```.
- Mandatory arguments:
- `-u` can be a single huc, a series passed in quotes space delimited, or a line-delimited file
i. To run entire domain of available data use one of the ```/data/inputs/included_huc[4,6,8].lst``` files or a huc list file of your choice.
- `-u` can be a single HUC, a space-delimited series of HUCs in quotes, or a line-delimited (.lst) file (usage examples follow this list). To run the entire domain of available data, use one of the ```/data/inputs/included_huc8.lst``` files or a HUC list file of your choice. Depending on the performance of your server, especially the number of CPU cores, running the full domain can take multiple days.
- `-n` is the name of your run (alphanumeric only)
- Outputs can be found under ```/data/outputs/<name_your_run>```
- Outputs can be found under ```/data/outputs/<name_your_run>```.
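For illustration, a few hedged invocation sketches covering the `-u` variants above (run names and HUC numbers are examples only):
```
fim_pipeline.sh -u 12090301 -n test_single_huc                       # a single HUC
fim_pipeline.sh -u "12090301 12090302" -n test_two_hucs              # a space-delimited series in quotes
fim_pipeline.sh -u /data/inputs/included_huc8.lst -n full_domain     # a line-delimited .lst file
```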

Processing of HUC's in FIM4 (GMS) comes in two pieces: gms_run_unit and gms_run_branch. `gms_pipeline.sh` above takes care of both steps however, you can run each part seperately for faster development if you like.
Processing of HUCs in FIM4 comes in three pieces. You can run `fim_pipeline.sh`, which automatically runs all three major sections, or you can run each section independently if you like. The three sections are:
- `fim_pre_processing.sh` : This section must be run first, as it creates the basic output folder for the run. It also creates a number of key files and folders for the next two sections.
- `fim_process_unit_wb.sh` : This script processes exactly one HUC8 plus all of its related branches. While it can only process one at a time, you can run the script multiple times, each with a different HUC (or overwriting a HUC). When you run `fim_pipeline.sh` and more than one HUC number has been supplied, either by command line arguments or via a HUC list, it automatically iterates, calling `fim_process_unit_wb.sh` for each HUC provided. Using `fim_process_unit_wb.sh` directly allows for a run / rerun of a single HUC, or running other HUCs at different times / days or even in different Docker containers.
- `fim_post_processing.sh` : This section takes all of the HUCs that have been processed, aggregates key information from each HUC directory, and looks for errors across all HUC folders. It also processes the group in sub-steps such as USGS gage processing, rating curve adjustments, and more. Naturally, running or re-running this script can only be done after running `fim_pre_processing.sh` and at least one run of `fim_process_unit_wb.sh`.

If you choose to do the two step hydrofabric creation, then run `gms_run_unit.sh`, then `gms_run_branch.sh`. See each of those files for details on arguments.
Running `fim_pipeline.sh` is quicker than running all three steps independently; a sketch of the manual flow follows.
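As a hedged sketch of the manual three-step flow (argument names and order here are assumptions based on the descriptions above; check each script's help output for the exact interface):
```
fim_pre_processing.sh -u "12090301 12090302" -n test_run   # create the output folder and runtime files
fim_process_unit_wb.sh test_run 12090301                   # process one HUC plus its branches
fim_process_unit_wb.sh test_run 12090302                   # repeat per HUC, possibly at other times or in other containers
fim_post_processing.sh -n test_run                         # aggregate results and scan for errors
```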

### Testing in Other HUCs
To test in HUCs other than the provided HUCs, the following processes can be followed to acquire and preprocess additional NHDPlus rasters and vectors. After these steps are run, the "Produce HAND Hydrofabric" step can be run for the new HUCs.

```
/foss_fim/src/acquire_and_preprocess_inputs.py -u <huc4s_to_process>
```
Note: This tool is deprecated; updates will be coming soon.

- `-u` can be a single HUC4, a series of HUC4s (e.g. 1209 1210), or a path to a line-delimited file of HUC4s (see the example after this list).
- Please run `/foss_fim/src/acquire_and_preprocess_inputs.py --help` for more information.
- See the United States Geological Survey (USGS) National Hydrography Dataset Plus High Resolution (NHDPlusHR) [site](https://www.usgs.gov/core-science-systems/ngp/national-hydrography/nhdplus-high-resolution) for more information.
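For example, a hedged invocation using the HUC4s listed above (this assumes `-u` accepts space-separated values, and recall the tool is deprecated):
```
/foss_fim/src/acquire_and_preprocess_inputs.py -u 1209 1210
```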
@@ -94,10 +97,11 @@ To test in HUCs other than the provided HUCs, the following processes can be followed
```
/foss_fim/src/preprocess_rasters.py
```
Note: This tool is deprecated; updates will be coming soon.

----
### Evaluating Inundation Map Performance
After `gms_pipeline.sh` completes, you can evaluate the model's skill. The evaluation benchmark datasets are available through ESIP in the `test_cases` directory.
After `fim_pipeline.sh` completes (or the equivalent combination of the three major steps described above), you can evaluate the model's skill. The evaluation benchmark datasets are available through ESIP in the `test_cases` directory.

To evaluate model skill, run the following:
```
…
```
File renamed without changes.
61 changes: 0 additions & 61 deletions config/deny_gms_branches_dev.lst

This file was deleted.

File renamed without changes.
5 changes: 0 additions & 5 deletions config/params_template.env
@@ -51,9 +51,4 @@ export CALB_DB_KEYS_FILE="/data/config/calb_db_keys.env"
#### computational parameters ####
export ncores_gw=1 # mpi number of cores for gagewatershed
export ncores_fd=1 # mpi number of cores for flow directions
export default_max_jobs=1 # default number of max concurrent jobs to run
export memfree=0G # min free memory required to start a new job or keep youngest job alive

#### logging parameters ####
export startDiv="\n##########################################################################\n"
export stopDiv="\n##########################################################################"
71 changes: 71 additions & 0 deletions docs/CHANGELOG.md
@@ -1,6 +1,77 @@
All notable changes to this project will be documented in this file.
We follow the [Semantic Versioning 2.0.0](http://semver.org/) format.


## v4.1.0.0 - 2023-01-30 - [PR#806](https://github.com/NOAA-OWP/inundation-mapping/pull/806)

As we move to Amazon Web Services (AWS), we need to change our processing system. Currently, `gms_pipeline.sh` uses bash "parallel" as an iterator, which first processes all HUCs, but not their branches. One of `gms_pipeline.sh`'s next steps is branch processing, which is again iterated via "parallel". In other words, units are processed as one step and branches are processed as a second, independent step.

**Note:** While we are taking steps to move to AWS, we will continue to maintain the ability to do all processing on a single server using a single Docker container, as we have for a long time. Moving to AWS simply means taking portions of FIM code and adding them to AWS tools for the performance of large-scale production runs.

Our new processing system, starting with this PR, allows each HUC to process its own branches.

A further requirement was to split the overall processing flow into independent steps, with each step able to run on its own without relying on "export" variables from other files. Note: There are still a few exceptions. The basic flow now becomes:
- `fim_pre_processing.sh`,
- one or more calls to `fim_process_unit_wb.sh` (calling this file for each single HUC to be processed).
- followed by a call to `fim_post_processing.sh`.


Note: This is a very large, complex PR with a lot of critical details. Please read the details at [PR 806](https://github.com/NOAA-OWP/inundation-mapping/pull/806).

### CRITICAL NOTE
The new `fim_pipeline.sh`, and by proxy `fim_pre_processing.sh`, has two new key input args: one named **-jh** (job HUCs) and one named **-jb** (job branches). You can assign the number of cores/CPUs used for processing HUCs versus the number used for processing branches. The -jh arg is only used by `fim_pipeline.sh` when it is processing more than one HUC or a list of HUCs, as it is the iterator for HUCs. The -jb flag says how many cores/CPUs can be used when processing branches (note: the average HUC has 26 branches).

BUT... you have to be careful not to overload your system. **You need to multiply the -jh and -jb values together, but only when using the `fim_pipeline.sh` script.** Why? _If you have 16 CPUs available on your machine and you assign -jh as 10 and -jb as 26, you are actually asking for 260 cores (10 x 26), but your machine only has 16 cores._ If you are not using `fim_pipeline.sh` but running the three processing steps independently, then the -jh value need not be anything but 1, as each HUC can only be processed one at a time (i.e., no iterator).
<br/>
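For illustration, a hedged sketch of picking safe values on a 16-core machine (the HUC list path and run name are hypothetical):
```
# 2 HUCs in parallel x 8 branch jobs each = 16 concurrent jobs, matching 16 cores
fim_pipeline.sh -u /data/inputs/my_huc_list.lst -n test_run -jh 2 -jb 8
```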

### Additions

- `fim_pipeline.sh` : The wrapper for the three new major "FIM" processing steps. This script allows processing in one command, same as the current `gms_pipeline.sh` tool.
- `fim_pre_processing.sh`: This file handles all argument input from the user, validates those inputs, and sets up or cleans up folders. It also includes a new system of taking most input parameters and some key environment variables and writing them out to a file called `runtime_args.env`. Later processing steps need minimal input arguments, as they can read most values they need from this new `runtime_args.env`. This allows the three major steps to work independently of each other. Someone can now run `fim_pre_processing.sh`, then run `fim_process_unit_wb.sh` with one HUC, as many times as they like, each run adding just its own HUC folder to the output runtime folder.
- `fim_post_processing.sh`: Scans all HUC folders inside the runtime folder to handle a number of processing steps, which include (to name a few):
- aggregating errors
- aggregating to create a single list (gms_inputs.csv) for all valid HUCs and their branch ids
- USGS gage aggregation
- adjustments to SRVs
- and more
- `fim_process_unit_wb.sh`: Accepts only the input args of runName and HUC number. It then sets up global variables, folders, etc. to process just the one HUC. The logic for processing the HUC is in `run_unit_wb.sh` but is managed by this `fim_process_unit_wb.sh` file, including all error trapping.
- `src`
- `aggregate_branch_lists.py`: When each HUC is being processed, it creates its own .csv file with its branch ids. In post-processing we need one master csv list, and this file aggregates them. Note: A similar file already exists in the `src/gms` folder, but that version operates a bit differently and will be deprecated soon.
- `generate_branch_list.py`: This creates the single .lst file for a HUC, defining each branch id. With this list, `run_unit_wb.sh` can do a parallelized iteration over each of its branches for processing. Note: This is also similar to the current `src/gms` file of the same name, and the gms folder version will also be deprecated soon.
- `generate_branch_list_csv.py`: As each branch, including branch zero, finishes processing, if it was successful it is added to a .csv list in the HUC directory. At the end, this becomes a list of all successful branches. The file is aggregated with all similar .csv files in post-processing for future processing.
- `run_unit_wb.sh`: The actual HUC processing logic. Note: This is fundamentally the same as the HUC processing logic that currently exists in `src/gms/run_by_unit.sh`, which will be removed in the very near future. However, at the end of this file, it creates and manages a parallelized iterator for processing each of its branches (see the sketch after this list).
- `process_branch.sh`: Same concept as `process_unit_wb.sh`, but this one is for processing a single branch. This file manages the true branch processing file, `src/gms/run_by_branch.sh`. It is a wrapper for `src/gms/run_by_branch.sh` that catches all errors and copies error files as applicable. This allows the parent processing files to continue despite branch errors. Both the new fim processing system and the older gms processing system currently share the branch processing file `src/gms/run_by_branch.sh`. When the gms processing file is removed, this file will likely not change, only move one directory up so that it is no longer in the `gms` sub-folder.
- `unit_tests`
- `aggregate_branch_lists_unittests.py` and `aggregate_branch_lists_params.json` (based on the newer `src` directory edition of `aggregate_branch_lists.py`).
- `generate_branch_list_unittest.py` and `generate_branch_list_params.json` (based on the newer `src` directory edition of `generate_branch_list.py`).
- `generate_branch_list_csv_unittest.py` and `generate_branch_list_csv_params.json`
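As a conceptual sketch only (not the repository's exact code), the branch iteration at the end of `run_unit_wb.sh` looks roughly like this; the variable and file names are assumptions:
```
# iterate the HUC's branch list with GNU parallel; process_branch.sh traps
# per-branch errors so the parent run continues despite branch failures
parallel -j "$branchJobLimit" --joblog "$hucOutputDir/branch_joblog.log" \
    process_branch.sh "$runName" "$hucNumber" :::: "$hucOutputDir/branch_ids.lst"
```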

### Changes

- `config`
- `params_template.env`: Removed the `default_max_jobs` value and moved the `startDiv` and `stopDiv` to the `bash_variables.env` file.
- `deny_gms_unit.lst` : Renamed from `deny_gms_unit_prod.lst`
- `deny_gms_branches.lst` : Renamed from `deny_gms_branches_prod.lst`

- `gms_pipeline.sh`, `gms_run_branch.sh`, `gms_run_unit.sh`, and `gms_post_processing.sh`: Changed to hardcode `default_max_jobs` to 1 (we don't want this to be changed at all). They also received minor adjustments for the `deny` list file names.

- `src`
- `bash_functions.env`: Fix error with calculating durations.
- `bash_variables.env`: Adds the two export lines (stopDiv and startDiv) from `params_template.env`.
- `clip_vectors_to_wbd.py`: Cleaned up some print statements for better output traceability.
- `check_huc_inputs.py`: Added logic to ensure the file is an .lst file. Other file formats were not being handled correctly.
- `gms`
- `delineate_hydros_and_produce_HAND.sh`: Removed all uses of the `stopDiv` variable to reduce log and screen output.
- `run_by_branch.sh`: Removed an unnecessary test for overriding outputs.

### Removed

- `config`
- `deny_gms_branches_dev.lst`

<br/><br/>


## v4.0.19.5 - 2023-01-24 - [PR#801](https://github.com/NOAA-OWP/inundation-mapping/pull/801)

When running tools/test_case_by_hydroid.py, it throws a "local variable 'stats' referenced before assignment" error.