diff --git a/docs/Numerical Issues.md b/docs/Numerical Issues.md
index 6844366d7..5d4fbc659 100644
--- a/docs/Numerical Issues.md
+++ b/docs/Numerical Issues.md
@@ -223,7 +223,7 @@ Other than scaling, some techniques to resolve numerical issues are:
 
 - Avoiding unnecessarily large penalty terms
 
-- Changing the solver's method
+- Picking a slower but more robust solving method
 
 - Loosening tolerances (at the risk of getting less accurate, or inaccurate results)
 
diff --git a/docs/Performance.md b/docs/Performance.md
index 738e68e43..1a716f594 100644
--- a/docs/Performance.md
+++ b/docs/Performance.md
@@ -1,18 +1,97 @@
 # Performance
 
-Memory use and solve time are two important factors that we try to keep to a minimum in our models. There are multiple
-things one can do to improve performance.
+## Introduction
 
-## Solving methods
+This document gives suggestions on how to improve performance and explains important concepts on the topic.
+Note that when talking about performance, I mean both memory use and solve time.
 
-By far the biggest factor that impacts performance is the method used by Gurobi. The fastest method is barrier solve
-without crossover (use `--recommended-fast`)
-however this method often returns a suboptimal solution. The next fastest is barrier solve followed by crossover and
-simplex (use `--recommended`) which almost always works. In some cases barrier solve encounters numerical issues (
+## Tips for improving performance
+
+### Pick an appropriate solving method
+
+By far the biggest factor that impacts performance is the solving method used by Gurobi.
+
+The fastest method is barrier solve without crossover (use `--recommended-fast`).
+Note that this method returns a solution that is optimal within a tolerance rather than *the* absolute optimal solution
+(in my opinion this is not an issue, since the tolerance is small enough for all practical purposes).
+
+Sometimes barrier solve without crossover can't find the optimal solution.
+The next fastest is barrier solve followed by crossover and simplex (use `--recommended`),
+which almost always works. In some cases, barrier solve encounters numerical issues (
 see [`Numerical Issues.md`](./Numerical%20Issues.md)) in which case the slower Simplex method must be used
 (`--solver-options-string method=1`).
 
-## Solver interface
+### Avoid unnecessary data
+
+The second largest way to improve performance is to reduce the size of the model.
+Surprisingly, this is often possible without compromising the quality of the results.
+Here are some ways to reduce model size without changing results.
+
+1. Don't include projects that get retired before the first modelled period. That is, pre-filter
+   your input data to remove these projects yourself rather than letting the model retire them.
+
+2. If your model has a zero-emissions constraint, filter out projects that emit CO2 ahead of time,
+   since these projects can't contribute to the grid in any case.
+
+3. Consider excluding or reducing the set of projects that never get used. For example, if you
+   are modelling concentrated solar power (CSP) but find that it is never used in any of your
+   scenarios (e.g. it's too expensive compared to PV), then consider modelling only one CSP project
+   per load zone.
+
+Note that although Gurobi has a pre-solve function that automatically excludes unnecessary
+variables or constraints from the model, there is still significant overhead in Pyomo/SWITCH to load
+all the unnecessary data into Gurobi. I've also found Gurobi to behave differently when removing
+unnecessary projects, despite pre-solve. Hence, it's important to pre-filter out unnecessary projects.
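+
+As a rough sketch, this kind of pre-filtering can be done with a few lines of pandas. The file and
+column names below follow the standard SWITCH inputs, but treat this as an illustration rather than a
+drop-in script; the `remove_retired_plants` post-process step in
+`switch_model/wecc/get_inputs/post_process_steps/` does the same thing more carefully (it uses the
+period midpoint as the cutoff and avoids changing column types).
+
+```python
+import pandas as pd
+
+# Start year of the first modelled period.
+first_period_start = pd.read_csv("periods.csv")["period_start"].min()
+
+# Latest build year for each project.
+last_build = (
+    pd.read_csv("gen_build_costs.csv", na_values=".")
+    .groupby("GENERATION_PROJECT", as_index=False)["build_year"].max()
+)
+
+# Keep only projects whose lifetime reaches the first period.
+projects = pd.read_csv("generation_projects_info.csv", na_values=".")
+projects = projects.merge(last_build, on="GENERATION_PROJECT")
+alive = projects["build_year"] + projects["gen_max_age"] > first_period_start
+projects[alive].drop(columns="build_year").to_csv(
+    "generation_projects_info.csv", index=False, na_rep="."
+)
+```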
+
+There are also ways to reduce the model size that would affect results. This decision is always
+a tradeoff between resolution/accuracy and performance.
+
+- Model fewer periods (e.g. only 2050 rather than 2030, 2040, and 2050).
+
+- Model at a lower timepoint resolution (e.g. every 4 hours instead of every hour of the year).
+
+- Don't model all 365 days of the year.
+
+### Avoid numerical issues
+
+Numerical issues can slow down solving or prevent you from using the faster solving methods.
+Read [`Numerical Issues.md`](./Numerical%20Issues.md) to find out whether
+you are having numerical issues and how to solve them.
+
+### Formulate the model carefully
+
+The way the model is formulated sometimes has an impact on performance. Here are some rules of thumb.
+
+- For constraints, it is faster to use `<=` or `>=` rather than `==` when possible. If your constraint
+  is an equality, think about whether the objective function is already pushing the expression against
+  one of the bounds, in which case an inequality is enough (see the sketch below).
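+
+For example, here is a minimal Pyomo sketch of this pattern (the component names are made up for
+illustration and are not actual SWITCH components): unserved energy is penalized in the objective,
+so a `>=` bound gives the same solution as `==` while being easier on the solver.
+
+```python
+import pyomo.environ as pyo
+
+m = pyo.ConcreteModel()
+m.demand = pyo.Param(initialize=100.0)
+m.dispatch = pyo.Var(domain=pyo.NonNegativeReals, bounds=(0, 80))
+m.unserved = pyo.Var(domain=pyo.NonNegativeReals)
+
+# The objective already pushes unserved energy down onto this bound,
+# so ">=" is sufficient and avoids the extra work of an equality.
+m.Unserved_Energy_Def = pyo.Constraint(expr=m.unserved >= m.demand - m.dispatch)
+
+m.Cost = pyo.Objective(expr=1000 * m.unserved, sense=pyo.minimize)
+```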
+
+### Finding your own performance improvements
+
+You can find other performance improvements by using these tools that help you understand
+where memory and time are being spent. These tools are also helpful for measuring whether a
+change has resulted in a noticeable performance improvement.
+
+- [Memory profiler](https://pypi.org/project/memory-profiler/) for generating plots of the memory
+  use over time. Use `mprof run --interval 60 --multiprocess switch solve ...` and once solving is done
+  run `mprof plot -o profile.png` to make the plot.
+
+- [Fil Profiler](https://pypi.org/project/filprofiler/) is an amazing tool for seeing which parts of the code are
+  using up memory during peak memory usage.
+
+- `switch_model.utilities.StepTimer` for measuring how long certain code blocks take to run. See examples
+  throughout the code.
+
+## Important concepts
+
+In this section I discuss concepts that are relevant to performance. Although
+we have not found them to lead to any direct performance improvements, it's
+important to understand them as they could provide future avenues for
+improvement.
+
+### Solver interfaces
 
 Solver interfaces are how Pyomo communicates with Gurobi (or any solver).
 
@@ -36,7 +115,7 @@ while Gurobi is solving and Pyomo is idle, the operating system can automatically
 to [virtual memory](https://serverfault.com/questions/48486/what-is-swap-memory) which will free up
 more memory for Gurobi.
 
-## Warm starting
+### Warm starting
 
 Warm starting is the act of using a solution from a previous similar model to start the solver closer to your
 expected solution. Theoretically this can help performance however in practice there are several limitations. For this section, *
@@ -46,7 +125,7 @@ Current solution* refers to the solution you are trying to find while using the
 
 - To warm start a model use `switch solve --warm-start `.
 
 - Warm starting only works if the previous solution does not break any constraints of the current solution. This usually
-  only happens if a) the model has the exact same set of variables b)
+  only happens if a) the model has the exact same set of variables, AND b)
   the previous solution was "harder" (e.g. it had more constraints to satisfy).
 
 - Warm starting always uses the slower Simplex method. This means unless you expect the previous solution and current
@@ -61,22 +140,4 @@ Current solution* refers to the solution you are trying to find while using the
 
 - `--save-warm-start` and `--warm-start` both use an extension of the `gurobi_direct` solver interface which is
   generally slower than the `gurobi` solver interface (see section above).
-
-## Model formulation
-
-The way the model is formulated often has an impact on performance. Here are some rules of thumb.
-
-- For constraints, it is faster to use `<=` and `>=` rather than `==` when possible. If your constraint
-should be an equality, try to think about whether it is already being pushed against one of the bounds
- by the objective function.
-
-## Tools for improving performance
-
-- [Memory profiler](https://pypi.org/project/memory-profiler/) for generating plots of the memory
-use over time. Use `mprof run --interval 60 --multiprocess switch solve ...` and once solving is done
- run `mprof plot -o profile.png` to make the plot.
-
-- [Fil Profiler](https://pypi.org/project/filprofiler/) is an amazing tool for seeing which parts of the code are
-using up memory during peak memory usage.
-
-- Using `switch_model.utilities.StepTimer` to measure how long certain code blocks take to run. See examples
-throughout the code.
\ No newline at end of file
diff --git a/switch_model/generators/core/build.py b/switch_model/generators/core/build.py
index 182a00bd6..3de2d47b7 100644
--- a/switch_model/generators/core/build.py
+++ b/switch_model/generators/core/build.py
@@ -44,6 +44,20 @@ dependencies = 'switch_model.timescales', 'switch_model.balancing.load_zones',\
 	'switch_model.financials', 'switch_model.energy_sources.properties.properties'
 
 
+def period_cutoff_year(period_start, period_length):
+    """
+    Returns the cutoff year used to determine whether a plant "makes it" into a period,
+    i.e. whether a plant will operate in that period based on when it comes online.
+    """
+    # Previously the cutoff was simply period_start. Using the midpoint of the period
+    # as the cutoff seems more correct, so we've made the switch.
+    return period_start + 0.5 * period_length
+
+def is_plant_retired(year_comes_online, period_start, period_length, plant_lifetime):
+    """Returns True if the plant is already retired by the given period."""
+    return year_comes_online + plant_lifetime <= period_cutoff_year(period_start, period_length)
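+
+# A quick worked example with illustrative numbers: for a period starting in 2025 with a
+# 10-year length, period_cutoff_year() returns 2030, so a plant that came online in 1995
+# with a 30-year lifetime satisfies 1995 + 30 <= 2030 and is_plant_retired() returns True.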
+
 
 def define_components(mod):
     """
@@ -402,11 +416,9 @@ def gen_build_can_operate_in_period(m, g, build_year, period):
         online = m.period_start[build_year]
     else:
         online = build_year
-    retirement = online + m.gen_max_age[g]
-    # Previously the code read return online <= m.period_start[period] < retirement
-    # However using the midpoint of the period as the "cutoff" seems more correct so
-    # we've made the switch.
-    return online <= m.period_start[period] + 0.5 * m.period_length_years[period] < retirement
+    period_start = m.period_start[period]
+    period_length = m.period_length_years[period]
+    return online <= period_cutoff_year(period_start, period_length) and not is_plant_retired(online, period_start, period_length, m.gen_max_age[g])
 
 # This verifies that a predetermined build year doesn't conflict with a period since if that's the case
 # gen_build_can_operate_in_period will mistaken the prebuild for an investment build
diff --git a/switch_model/tools/drop.py b/switch_model/tools/drop.py
index 2c5ca1e39..fde7e25b9 100644
--- a/switch_model/tools/drop.py
+++ b/switch_model/tools/drop.py
@@ -175,7 +175,7 @@ def get_valid_ids(primary_file, args):
 
     path = os.path.join(args.inputs_dir, filename)
     if not os.path.exists(path):
-        print("\n Warning: {} was not found.".format(filename))
+        warnings.warn("{} was not found.".format(filename))
         return None
 
     valid_ids = pandas.read_csv(path, dtype=str)[primary_key]
@@ -188,6 +188,7 @@ def drop_from_file(filename, foreign_key, valid_ids, args):
     if not os.path.exists(path):
         return 0
 
+    # dtype=str is necessary to ensure we don't infer the types and accidentally manipulate the data
     df = pandas.read_csv(path, dtype=str)
     count = len(df)
     if foreign_key not in df.columns:
@@ -202,7 +203,7 @@ def drop_from_file(filename, foreign_key, valid_ids, args):
             print("Removed {} rows {}.".format(rows_removed, filename))
         if rows_removed == count:
             if not args.silent:
-                print("WARNING: {} is now empty.".format(filename))
+                warnings.warn("{} is now empty.".format(filename))
     return rows_removed
diff --git a/switch_model/tools/graph/main.py b/switch_model/tools/graph/main.py
index de5bf618a..ac4a4b66a 100644
--- a/switch_model/tools/graph/main.py
+++ b/switch_model/tools/graph/main.py
@@ -604,7 +604,7 @@ def bar_label(self, filename=None):
         """
         ax = self.get_axes(filename=filename)
         for container in ax.containers:
-            ax.bar_label(container, fmt="%d", fontsize="x-small")
+            ax.bar_label(container, fmt="%.1f", fontsize="x-small")
 
     def get_figure(self, *args, **kwargs):
         # Create the figure
diff --git a/switch_model/tools/templates/config.yaml b/switch_model/tools/templates/config.yaml
index fea534248..8baf3bebd 100644
--- a/switch_model/tools/templates/config.yaml
+++ b/switch_model/tools/templates/config.yaml
@@ -54,6 +54,8 @@ post_process_steps: # The following post process steps will be run in order
 #  - derate_hydro
 #  - only_california
 #  - reserve_technologies
+#  - remove_emitting_plants_in_zero_emissions_model # Removes emitting plants from a zero-emissions model
+  - remove_retired_plants # Removes plants that are retired before the first period.
 post_process_config:
 #  add_storage was used by Martin when studying LDES you likely don't need to uncomment these parameters
 #  add_storage:
diff --git a/switch_model/wecc/get_inputs/get_inputs.py b/switch_model/wecc/get_inputs/get_inputs.py
index 49c586256..235c6b9b6 100755
--- a/switch_model/wecc/get_inputs/get_inputs.py
+++ b/switch_model/wecc/get_inputs/get_inputs.py
@@ -28,7 +28,7 @@ def write_csv_from_query(cursor, fname: str, headers: List[str], query: str):
     cursor.execute(query)
     data = cursor.fetchall()
     write_csv(data, fname, headers, log=False)
-    print(len(data))
+    print(f"(num rows: {len(data)})")
     if not data:
         warnings.warn(f"File {fname} is empty.")
 
@@ -275,7 +275,8 @@ def query_db(config, skip_cf):
        """
        SELECT
            name, reserves_area as balancing_area
-        FROM load_zone;""",
+        FROM load_zone
+        WHERE name != '_ALL_ZONES';""",
    )
 
    # Paty: in this version of switch this tables is named zone_coincident_peak_demand.csv
@@ -508,6 +509,7 @@ def query_db(config, skip_cf):
        join generation_plant as t using(generation_plant_id)
        JOIN temp_generation_plant_ids USING(generation_plant_id)
        WHERE generation_plant_existing_and_planned_scenario_id={params.generation_plant_existing_and_planned_scenario_id}
+       ORDER BY 1, 2
        ;
        """,
    )
@@ -645,7 +647,7 @@ def query_db(config, skip_cf):
        JOIN switch.generation_plant USING(generation_plant_id)
        JOIN temp_generation_plant_ids USING(generation_plant_id)
        WHERE hydro_simple_scenario_id={params.hydro_simple_scenario_id}
-       ORDER BY 1;
+       ORDER BY 1, 2;
        """,
    )
diff --git a/switch_model/wecc/get_inputs/post_process_steps/add_storage.py b/switch_model/wecc/get_inputs/post_process_steps/add_storage.py
index 428058867..45f95261f 100644
--- a/switch_model/wecc/get_inputs/post_process_steps/add_storage.py
+++ b/switch_model/wecc/get_inputs/post_process_steps/add_storage.py
@@ -154,4 +154,4 @@ def post_process(config):
 
 
 if __name__ == "__main__":
-    main({})
+    post_process({})
diff --git a/switch_model/wecc/get_inputs/post_process_steps/only_california.py b/switch_model/wecc/get_inputs/post_process_steps/only_california.py
index b0d4c2eb3..5476dc26f 100644
--- a/switch_model/wecc/get_inputs/post_process_steps/only_california.py
+++ b/switch_model/wecc/get_inputs/post_process_steps/only_california.py
@@ -6,7 +6,7 @@
 
 @post_process_step(msg="Dropping all the zones outside of California")
 def post_process(_):
-    df = pd.read_csv("load_zones.csv", index_col=False)
+    df = pd.read_csv("load_zones.csv", index_col=False, dtype=str)
     df = df[df["LOAD_ZONE"].str.startswith("CA_")]
     df.to_csv("load_zones.csv", index=False)
     drop(["--silent", "--no-confirm", "--run", "--inputs-dir", "."])
diff --git a/switch_model/wecc/get_inputs/post_process_steps/remove_emitting_plants_in_zero_emissions_model.py b/switch_model/wecc/get_inputs/post_process_steps/remove_emitting_plants_in_zero_emissions_model.py
new file mode 100644
index 000000000..e2c5f93ea
--- /dev/null
+++ b/switch_model/wecc/get_inputs/post_process_steps/remove_emitting_plants_in_zero_emissions_model.py
@@ -0,0 +1,28 @@
+import pandas as pd
+
+from switch_model.wecc.get_inputs.register_post_process import post_process_step
+from switch_model.tools.drop import main as drop
+
+@post_process_step(msg="Removing emitting plants if there's a zero-emissions constraint")
+def post_process(_):
+    # Check that there's a zero-emissions constraint in every year.
+    carbon_policies = pd.read_csv("carbon_policies.csv", index_col=False, na_values=".")
+    if not (carbon_policies["carbon_cap_tco2_per_yr"] == 0).all():
+        return
+
+    # If so, find the list of emitting fuels.
+    fuels = pd.read_csv("fuels.csv", index_col=False, na_values=".")
+    fuels = fuels[fuels["co2_intensity"] != 0]
+
+    # Now remove the projects that use those fuels.
+    gen_proj = pd.read_csv("generation_projects_info.csv", index_col=False, na_values=".", dtype=str)
+    l1 = len(gen_proj)
+    gen_proj = gen_proj[~gen_proj["gen_energy_source"].isin(fuels["fuel"])]
+    l2 = len(gen_proj)
+    # Reading with dtype=str ensures we don't cast the types of any inputs when writing back out.
+    gen_proj.to_csv("generation_projects_info.csv", index=False, na_rep=".")
+
+    # Now remove references to those projects from the other input files.
+    drop(["--silent", "--no-confirm", "--run", "--inputs-dir", "."])
+
+    return f"Removed {l1 - l2} projects"
\ No newline at end of file
diff --git a/switch_model/wecc/get_inputs/post_process_steps/remove_retired_plants.py b/switch_model/wecc/get_inputs/post_process_steps/remove_retired_plants.py
new file mode 100644
index 000000000..cf693946b
--- /dev/null
+++ b/switch_model/wecc/get_inputs/post_process_steps/remove_retired_plants.py
@@ -0,0 +1,64 @@
+import pandas as pd
+
+from switch_model.generators.core.build import is_plant_retired
+from switch_model.wecc.get_inputs.register_post_process import post_process_step
+from switch_model.tools.drop import main as drop
+
+
+@post_process_step(msg="Removing plants that are retired before the first period")
+def post_process(_):
+    # Get the first period.
+    # Note: we take the first period based on the row order in periods.csv, which is not ideal.
+    periods = pd.read_csv("periods.csv", index_col=False)
+    first_period = periods.iloc[0]
+    period_start = first_period.period_start
+    # Note the period_length could be (end - start + 1) or just (end - start),
+    # depending on how the period was specified.
+    # To be conservative we use the version without the +1, which gives an earlier cutoff year.
+    # This means fewer plants are counted as retired and more plants make it into the period,
+    # so we don't accidentally remove a plant that is not actually retired.
+    period_length = first_period.period_end - first_period.period_start
+
+    # Get the last build year for each plant.
+    build_yrs = pd.read_csv(
+        "gen_build_costs.csv",
+        index_col=False,
+        na_values=".",
+        dtype={"GENERATION_PROJECT": str},
+    )
+    build_yrs = build_yrs.groupby("GENERATION_PROJECT", as_index=False).build_year.max()
+
+    # Determine whether each project is retired.
+    def is_retired(row):
+        return is_plant_retired(float(row.build_year), period_start, period_length, float(row.gen_max_age))
+    # We read with dtype=str to ensure we don't cast values to float types unexpectedly.
+    # This means the is_retired() function above needs to cast the values it uses to floats.
+    proj = pd.read_csv(
+        "generation_projects_info.csv",
+        index_col=False,
+        na_values=".",
+        dtype=str,
+    )
+    columns = proj.columns
+    proj = proj.merge(build_yrs, on="GENERATION_PROJECT")
+    proj["is_retired"] = proj.apply(is_retired, axis=1)
+
+    # Filter out retired projects.
+    l1 = len(proj)
+    proj = proj[~proj["is_retired"]]
+    l2 = len(proj)
+
+    # Write the csv, keeping only the original columns.
+    proj[columns].to_csv("generation_projects_info.csv", index=False, na_rep=".")
+
+    # Now remove references to those projects from the other input files.
+    drop(["--silent", "--no-confirm", "--run", "--inputs-dir", "."])
+
+    return f"Removed {l1 - l2} projects"
diff --git a/switch_model/wecc/get_inputs/register_post_process.py b/switch_model/wecc/get_inputs/register_post_process.py
index 052c2a64d..de1b64589 100644
--- a/switch_model/wecc/get_inputs/register_post_process.py
+++ b/switch_model/wecc/get_inputs/register_post_process.py
@@ -21,8 +21,9 @@ def wrapper(*args, **kwargs):
             if message is None:
                 message = f"Running {func.__name__}"
             print(f"\t{message}...")
-            func(*args, **kwargs)
-
+            result = func(*args, **kwargs)
+            if result is not None:
+                print(f"\t\t{result}")
         return wrapper
     return decorator