REAM-lab · staadecker · Dec 30, 2021 · Dec 30, 2021 · Dec 30, 2021 · Jan 24, 2022
diff --git a/docs/Numerical Issues.md b/docs/Numerical Issues.md
@@ -223,7 +223,7 @@ Other than scaling, some techniques to resolve numerical issues are:
 
 - Avoiding unnecessarily large penalty terms
 
-- Changing the solver's method
+- Pick a slower but more robust solver's method
 
 - Loosening tolerances (at the risk of getting less accurate, or inaccurate results)
 

diff --git a/docs/Performance.md b/docs/Performance.md
@@ -1,18 +1,97 @@
 # Performance
 
-Memory use and solve time are two important factors that we try to keep to a minimum in our models. There are multiple
-things one can do to improve performance.
+## Introduction
 
-## Solving methods
+This document gives suggestions on how to improve performance and explains important concepts on the topic.
+Note that when talking about performance, I am talking about both memory use and solve time.
 
-By far the biggest factor that impacts performance is the method used by Gurobi. The fastest method is barrier solve
-without crossover (use `--recommended-fast`)
-however this method often returns a suboptimal solution. The next fastest is barrier solve followed by crossover and
-simplex (use `--recommended`) which almost always works. In some cases barrier solve encounters numerical issues (
+## Tips for improving performance
+
+### Pick an appropriate solving method
+
+By far the biggest factor that impacts performance is the solving method used by Gurobi. 
+
+The fastest method is barrier solve without crossover (use `--recommended-fast`). 
+Note that this method returns a solution that is optimal within a tolerance rather than *the* absolute optimal solution 
+(in my opinion this is not an issue, since the tolerance is small enough for all practical purposes).
+
+Sometimes, the barrier solve without crossover can't find the optimal solution. 
+The next fastest is barrier solve followed by crossover and simplex (use `--recommended`) 
+which almost always works. In some cases, barrier solve encounters numerical issues (
 see [`Numerical Issues.md`](./Numerical%20Issues.md))
 in which case the slower Simplex method must be used (`--solver-options-string method=1`).
 
-## Solver interface
+### Avoid unnecessary data
+
+The second largest way to improve performance is to reduce the size of the model.
+Surprisingly, this is often possible without comprising on the quality of results.
+Here are some ways to reduce model size without changing results.
+
+1. Don't include projects that get retired before the first modelled period. I.e. pre-filter
+  your input data to remove projects that get retired by the model rather than letting
+  the model retire them.
+
+2. If your model has a zero-emissions constraint, filter-out projects that emit CO2 ahead of time
+  since these projects can't contribute to the grid in any case.
+
+3. Consider excluding or reducing the set of projects that never get used. For example, if you
+  are modelling concentrated solar power (CSP) but find it is never being used across all your
+  scenarios (e.g. it's too expensive compared to PV), then consider modelling only one CSP per load
+  zone.
+
+Note that although Gurobi has a pre-solve function that automatically excludes unnecessary
+variables or constraints from the model, there is still a big overhead in Pyomo/SWITCH to load
+all the unnecessary data into Gurobi. I've also found Gurobi to behave differently when removing
+unnecessary projects, despite pre-solve. Hence, it's important to pre-filter out unnecessary projects.
+
+There are also ways to reduce the model size that would affect results. This decision is always
+a tradeoff between resolution/accuracy and performance.
+
+- Model fewer periods (e.g. only 2050 rather than 2030, 2040, 2050)
+
+- Model at lower timepoint resolutions (e.g. every 4 hours instead of every hour of the year).
+
+- Don't model all 365 days of the year.
+
+### Avoid numerical issues
+
+Numerical issues can slow down your solving or prevent you from using faster solving algorithms.
+Read [`Numerical Issues.md`](/docs/Numerical%20Issues.md) to find out whether
+you are having numerical issues and how to solve them.
+
+### Formulate the model carefully
+
+The way the model is formulated sometimes has an impact on performance. Here are some rules of thumb.
+
+- For constraints, it is faster to use `<=` or `>=` rather than `==` when possible. If your constraint
+  is an equality, try to think about whether it is already being pushed against one of the bounds
+  by the objective function.
+
+
+### Finding your own performance improvements!
+
+You can find other performance improvements by using these tools that help you understand
+where memory and time is being spent. These tools are also helpful in measuring whether a 
+change has resulted in a noticeable performance improvement.
+
+- [Memory profiler](https://pypi.org/project/memory-profiler/) for generating plots of the memory
+  use over time. Use `mprof run --interval 60 --multiprocess switch solve ...` and once solving is done
+  run `mprof plot -o profile.png` to make the plot.
+
+- [Fil Profiler](https://pypi.org/project/filprofiler/) is an amazing tool for seeing which parts of the code are
+  using up memory during peak memory usage.
+
+- Using `switch_model.utilities.StepTimer` to measure how long certain code blocks take to run. See examples
+  throughout the code.
+
+## Important concepts
+
+In this section I discuss concepts that are relevant to performance. Although
+we have not found them to lead to any direct performance improvements, it's
+important to understand them as they could provide future avenues to performance
+improvement.
+
+### Solver interfaces
 
 Solver interfaces are how Pyomo communicates with Gurobi (or any solver).
 
@@ -36,7 +115,7 @@ while Gurobi is solving and Pyomo is idle, the operating system can automaticall
 to [virtual memory](https://serverfault.com/questions/48486/what-is-swap-memory)
 which will free up more memory for Gurobi.
 
-## Warm starting
+### Warm starting
 
 Warm starting is the act of using a solution from a previous similar model to start the solver closer to your expected
 solution. Theoretically this can help performance however in practice there are several limitations. For this section, *
@@ -46,7 +125,7 @@ Current solution* refers to the solution you are trying to find while using the
 - To warm start a model use `switch solve --warm-start <path_to_previous_solution>`.
 
 - Warm starting only works if the previous solution does not break any constraints of the current solution. This usually
-  only happens if a) the model has the exact same set of variables b)
+  only happens if a) the model has the exact same set of variables, AND b)
   the previous solution was "harder" (e.g. it had more constraints to satisfy).
 
 - Warm starting always uses the slower Simplex method. This means unless you expect the previous solution and current
@@ -61,22 +140,4 @@ Current solution* refers to the solution you are trying to find while using the
 - `--save-warm-start` and `--warm-start` both use an extension of the `gurobi_direct` solver interface which is
   generally slower than the `gurobi` solver interface (see section above).
 
-## Model formulation
 
-The way the model is formulated often has an impact on performance. Here are some rules of thumb.
-
-- For constraints, it is faster to use `<=` and `>=` rather than `==` when possible. If your constraint
-should be an equality, try to think about whether it is already being pushed against one of the bounds
-  by the objective function.
-
-## Tools for improving performance
-
-- [Memory profiler](https://pypi.org/project/memory-profiler/) for generating plots of the memory
-use over time. Use `mprof run --interval 60 --multiprocess switch solve ...` and once solving is done
-  run `mprof plot -o profile.png` to make the plot.
-
-- [Fil Profiler](https://pypi.org/project/filprofiler/) is an amazing tool for seeing which parts of the code are
-using up memory during peak memory usage.
-
-- Using `switch_model.utilities.StepTimer` to measure how long certain code blocks take to run. See examples
-throughout the code.
diff --git a/switch_model/generators/core/build.py b/switch_model/generators/core/build.py
@@ -44,6 +44,20 @@
 dependencies = 'switch_model.timescales', 'switch_model.balancing.load_zones',\
     'switch_model.financials', 'switch_model.energy_sources.properties.properties'
 
+def period_cutoff_year(period_start, period_length):
+    """
+    Returns the year that is used to determine if a plant "makes it" into that period.
+    I.e. will a plant operate in that period based on when it comes online.
+    """
+    # Previously the code was just return period_start
+    # However using the midpoint of the period as the "cutoff" seems more correct so
+    # we've made the switch.
+    return period_start + 0.5 * period_length
+
+def is_plant_retired(year_comes_online, period_start, period_length, plant_lifetime):
+    """Returns True if the plant is retired at the given period."""
+    return year_comes_online + plant_lifetime <= period_cutoff_year(period_start, period_length)
+
 def define_components(mod):
     """
 
@@ -402,11 +416,9 @@ def gen_build_can_operate_in_period(m, g, build_year, period):
             online = m.period_start[build_year]
         else:
             online = build_year
-        retirement = online + m.gen_max_age[g]
-        # Previously the code read return online <= m.period_start[period] < retirement
-        # However using the midpoint of the period as the "cutoff" seems more correct so
-        # we've made the switch.
-        return online <= m.period_start[period] + 0.5 * m.period_length_years[period] < retirement
+        period_start = m.period_start[period]
+        period_length = m.period_length_years[period]
+        return online <= period_cutoff_year(period_start, period_length) and not is_plant_retired(online, period_start, period_length, m.gen_max_age[g])
 
     # This verifies that a predetermined build year doesn't conflict with a period since if that's the case
     # gen_build_can_operate_in_period will mistaken the prebuild for an investment build

diff --git a/switch_model/tools/drop.py b/switch_model/tools/drop.py
@@ -175,7 +175,7 @@ def get_valid_ids(primary_file, args):
     path = os.path.join(args.inputs_dir, filename)
 
     if not os.path.exists(path):
-        print("\n Warning: {} was not found.".format(filename))
+        warnings.warn("\n {} was not found.".format(filename))
         return None
 
     valid_ids = pandas.read_csv(path, dtype=str)[primary_key]
@@ -188,6 +188,7 @@ def drop_from_file(filename, foreign_key, valid_ids, args):
     if not os.path.exists(path):
         return 0
 
+    # dtype=str is necessary to ensure we don't interpret the types and accidently manipulate the data
     df = pandas.read_csv(path, dtype=str)
     count = len(df)
     if foreign_key not in df.columns:
@@ -202,7 +203,7 @@ def drop_from_file(filename, foreign_key, valid_ids, args):
             print("Removed {} rows {}.".format(rows_removed, filename))
         if rows_removed == count:
             if not args.silent:
-                print("WARNING: {} is now empty.".format(filename))
+                warnings.warn(" {} is now empty.".format(filename))
 
     return rows_removed
 

diff --git a/switch_model/tools/graph/main.py b/switch_model/tools/graph/main.py
@@ -604,7 +604,7 @@ def bar_label(self, filename=None):
         """
         ax = self.get_axes(filename=filename)
         for container in ax.containers:
-            ax.bar_label(container, fmt="%d", fontsize="x-small")
+            ax.bar_label(container, fmt="%.1f", fontsize="x-small")
 
     def get_figure(self, *args, **kwargs):
         # Create the figure

diff --git a/switch_model/tools/templates/config.yaml b/switch_model/tools/templates/config.yaml
@@ -54,6 +54,8 @@ post_process_steps: # The following post process steps will be run in order
 #  - derate_hydro
 #  - only_california
 #  - reserve_technologies
+#  - remove_emitting_plants_in_zero_emissions_model # Removes emitting plants from a zero emissions model
+  - remove_retired_plants # Removes plants that are retired before the first period.
 post_process_config:
 #   add_storage was used by Martin when studying LDES you likely don't need to uncomment these parameters
 #   add_storage:

diff --git a/switch_model/wecc/get_inputs/get_inputs.py b/switch_model/wecc/get_inputs/get_inputs.py
@@ -28,7 +28,7 @@ def write_csv_from_query(cursor, fname: str, headers: List[str], query: str):
     cursor.execute(query)
     data = cursor.fetchall()
     write_csv(data, fname, headers, log=False)
-    print(len(data))
+    print(f"(num rows: {len(data)})")
     if not data:
         warnings.warn(f"File {fname} is empty.")
 
@@ -275,7 +275,8 @@ def query_db(config, skip_cf):
         """
         SELECT
             name, reserves_area as balancing_area
-        FROM load_zone;""",
+        FROM load_zone
+        WHERE name != '_ALL_ZONES';""",
     )
 
     # Paty: in this version of switch this tables is named zone_coincident_peak_demand.csv
@@ -508,6 +509,7 @@ def query_db(config, skip_cf):
                 join generation_plant as t using(generation_plant_id)
                 JOIN temp_generation_plant_ids USING(generation_plant_id)
                 WHERE generation_plant_existing_and_planned_scenario_id={params.generation_plant_existing_and_planned_scenario_id}
+                ORDER BY 1, 2
                 ;
                 """,
     )
@@ -645,7 +647,7 @@ def query_db(config, skip_cf):
             JOIN switch.generation_plant USING(generation_plant_id)
             JOIN temp_generation_plant_ids USING(generation_plant_id)
         WHERE hydro_simple_scenario_id={params.hydro_simple_scenario_id}
-        ORDER BY 1;
+        ORDER BY 1, 2;
         """,
     )
 

diff --git a/switch_model/wecc/get_inputs/post_process_steps/add_storage.py b/switch_model/wecc/get_inputs/post_process_steps/add_storage.py
@@ -154,4 +154,4 @@ def post_process(config):
 
 
 if __name__ == "__main__":
-    main({})
+    post_process({})
diff --git a/switch_model/wecc/get_inputs/post_process_steps/only_california.py b/switch_model/wecc/get_inputs/post_process_steps/only_california.py
@@ -6,7 +6,7 @@
 
 @post_process_step(msg="Dropping all the zones outside of California")
 def post_process(_):
-    df = pd.read_csv("load_zones.csv", index_col=False)
+    df = pd.read_csv("load_zones.csv", index_col=False, dtype=str)
     df = df[df["LOAD_ZONE"].str.startswith("CA_")]
     df.to_csv("load_zones.csv", index=False)
     drop(["--silent", "--no-confirm", "--run", "--inputs-dir", "."])
diff --git a/...odel/wecc/get_inputs/post_process_steps/remove_emitting_plants_in_zero_emissions_model.py b/...odel/wecc/get_inputs/post_process_steps/remove_emitting_plants_in_zero_emissions_model.py
@@ -0,0 +1,28 @@
+import pandas as pd
+
+from switch_model.wecc.get_inputs.register_post_process import post_process_step
+from switch_model.tools.drop import main as drop
+
+@post_process_step(msg="Removing emitting plants if there's a zero-emissions constraint")
+def post_process(_):
+    # Check that there's a no emissions constraint on all the years.
+    carbon_policies = pd.read_csv("carbon_policies.csv", index_col=False, na_values=".")
+    if not (carbon_policies["carbon_cap_tco2_per_yr"] == 0).all():
+        return
+
+    # If so, find the list of emitting fuels.
+    fuels = pd.read_csv("fuels.csv", index_col=False, na_values=".")
+    fuels = fuels[fuels["co2_intensity"] != 0]
+
+    # Now remove the projects that use that fuel.
+    gen_proj = pd.read_csv("generation_projects_info.csv", index_col=False, na_values=".", dtype=str)
+    l1 = len(gen_proj)
+    gen_proj = gen_proj[~gen_proj["gen_energy_source"].isin(fuels["fuel"])]
+    l2 = len(gen_proj)
+    # We've made sure to not cast the types of any inputs by using dtype=str
+    gen_proj.to_csv("generation_projects_info.csv", index=False, na_rep=".")
+
+    # Now remove references to that project
+    drop(["--silent", "--no-confirm", "--run", "--inputs-dir", "."])
+
+    return f"Removed {l1 - l2} projects"
diff --git a/switch_model/wecc/get_inputs/post_process_steps/remove_retired_plants.py b/switch_model/wecc/get_inputs/post_process_steps/remove_retired_plants.py
@@ -0,0 +1,64 @@
+import pandas as pd
+
+from switch_model.generators.core.build import is_plant_retired
+from switch_model.wecc.get_inputs.register_post_process import post_process_step
+from switch_model.tools.drop import main as drop
+
+
+@post_process_step(msg="Removing plants that are retired before the first period")
+def post_process(_):
+    # Get the first period
+    # Note we get the first period based on the order in periods.csv which is not ideal
+    periods = pd.read_csv("periods.csv", index_col=False)
+    first_period = periods.iloc[0]
+    period_start = first_period.period_start
+    # Note the period_length could be (end - start + 1) or just (end - start).
+    # This depends on how the period was specified.
+    # To be conservative we use the version without the +1 so that the cutoff for the time
+    # to make it into the period is sooner. This means less plants are retired by the cutoffs
+    # and more plants make it into the period.
+    # And like that we don't accidentally remove a non-retired plant.
+    period_length = first_period.period_end - first_period.period_start
+
+    # Get the last build year for each plant.
+    build_yrs = pd.read_csv(
+        "gen_build_costs.csv",
+        index_col=False,
+        na_values=".",
+        dtype={"GENERATION_PROJECT": str},
+    )
+    build_yrs = build_yrs.groupby("GENERATION_PROJECT", as_index=False).build_year.max()
+
+    # Find if each project is retired
+    def is_retired(row):
+        return is_plant_retired(float(row.build_year), period_start, period_length, float(row.gen_max_age))
+    # We need to set dtype=str to ensure we don't start casting values to float types
+    # unexpectedly. This means that the is_retired() function above needs to cast the wanted
+    # values to floats.
+    proj = pd.read_csv(
+        "generation_projects_info.csv",
+        index_col=False,
+        na_values=".",
+        dtype=str,
+    )
+    columns = proj.columns
+    proj = proj.merge(build_yrs, on="GENERATION_PROJECT")
+    proj["is_retired"] = proj.apply(is_retired, axis=1)
+
+    # Filter out retired projects
+    l1 = len(proj)
+    proj = proj[~proj["is_retired"]]
+    l2 = len(proj)
+    # print(f"Dropped {l1 - l2} projects.")
+
+    # Write csv
+    proj[columns].to_csv("generation_projects_info.csv", index=False, na_rep=".")
+
+    # Now remove references to those projects
+    drop(["--silent", "--no-confirm", "--run", "--inputs-dir", "."])
+
+    return f"Removed {l1 - l2} projects"
+
+
+
+
diff --git a/switch_model/wecc/get_inputs/register_post_process.py b/switch_model/wecc/get_inputs/register_post_process.py
@@ -21,8 +21,9 @@ def wrapper(*args, **kwargs):
             if message is None:
                 message = f"Running {func.__name__}"
             print(f"\t{message}...")
-            func(*args, **kwargs)
-
+            result = func(*args, **kwargs)
+            if result is not None:
+                print(f"\t\t{result}")
         return wrapper
 
     return decorator
Original file line number	Diff line number	Diff line change
Expand Up		@@ -154,4 +154,4 @@ def post_process(config):


		if __name__ == "__main__":
		main({})
		post_process({})