Implement Metadata for SMAC enabling Multi-Fidelity #771

jsfreischuetz · 2024-07-02T22:21:01Z

Enables changing the defaults in the SMAC optimizer to enable state passing (what SMAC internally calls context, but that doesn't match MLOS's notion of context) and multi-fidelity tuning.

This required a few additional changes to metadata throughout the project

Additionally, this PR includes a test case for Multi-Fidelity Tuning

This PR replaces the previous one (#751)

.cspell.json

mlos_core/mlos_core/optimizers/README

mlos_core/mlos_core/optimizers/bayesian_optimizers/smac_optimizer.py

bpkroth · 2024-07-03T20:24:28Z

mlos_core/mlos_core/optimizers/bayesian_optimizers/smac_optimizer.py

-        self.trial_info_map: Dict[ConfigSpace.Configuration, TrialInfo] = {}
+        self.trial_info_df: pd.DataFrame = pd.DataFrame(
+            columns=["Configuration", "Metadata", "TrialInfo", "TrialValue"]
+        )


Why replacing this with a dataframe?

Because configurations are no longer a unique identifier if we support metadata.

An alternative would be to use the type:

Dict[Tuple[ConfigSpace.Configuration, pd.DataFrame], Tuple[TrialInfo, TrialValue]]

but I thought using a dataframe was simpler and more flexible

Not certain you can actually use that Tuple as a Dict key (may not be hashable).

Tuples are hashable, but dataframes are not

mlos_core/mlos_core/optimizers/bayesian_optimizers/smac_optimizer.py

bpkroth · 2024-07-03T20:25:39Z

mlos_core/mlos_core/optimizers/bayesian_optimizers/smac_optimizer.py

        )

+        self.lock = threading.Lock()


Please add some comments regarding the need for the threading lock.

mlos_core/mlos_core/optimizers/bayesian_optimizers/smac_optimizer.py

bpkroth · 2024-07-03T20:28:21Z

mlos_core/mlos_core/optimizers/bayesian_optimizers/smac_optimizer.py

+                        [df_ctx.equals(ctx) for df_ctx in self.trial_info_df["Metadata"]]
+                    )
+
+                # make a new entry


Comments could use some improvement.

Make a new entry of what? What's the first part doing that this one now needs explanation?

mlos_core/mlos_core/optimizers/bayesian_optimizers/smac_optimizer.py

mlos_core/mlos_core/optimizers/utils.py

bpkroth · 2024-07-03T20:34:09Z

mlos_core/mlos_core/optimizers/optimizer.py

+                              for _, _, _, metadata in observations]).reset_index(drop=True)
+        return (configs, scores, contexts if len(contexts.columns) > 0 else None, metadatas if len(metadatas.columns) > 0 else None)
+
+    def get_observations(self) -> Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame], Optional[pd.DataFrame]]:


These Tuples are getting a little large and hard to read (recall a previous version of this PR where the order of them was mistakenly swapped at one point).

Think we discussed creating a NamedTuple or small DataClass for them instead so that they can be accessed by name in order to make it more readable.

If you want I can do this in this PR, or another follow up PR

I think a predecessor PR would be better. Much like we did with adding the metadata args and named args first.

mlos_core/mlos_core/optimizers/bayesian_optimizers/smac_optimizer.py

bpkroth · 2024-07-03T20:50:34Z

mlos_core/mlos_core/tests/optimizers/optimizer_metadata_test.py

+@pytest.mark.parametrize(('optimizer_type', 'verify', 'kwargs'), [
+    # Enumerate all supported Optimizers
+    *[(member, verify, {"seed": SEED, "facade": MFFacade, "intensifier": SuccessiveHalving, "min_budget": 1, "max_budget": 9})
+      for member, verify in [(OptimizerType.SMAC, smac_verify_best)]],


This is a pretty weird pattern. If we wanted to add more types in the future those kwargs wouldn't work.

What we've done elsewhere is

retain the loop over OptimizerType in order to make sure that all optimizers get a test added (or explicitly excluded) whenever we add new optimizer backends.

When necessary, within the body of the test set additional parameters in a switch statement for a given optimizer type. e.g.,

if optimizer_type == OptimizerType.SMAC: verifier = smac_verify_best elif optimizer_type == OptimizerType.FLAML: pytest.skip("TODO: FLAML Optimizer does not yet support metadata") # though even here, I think we should test *something* else: raise NotImplementedError(f"Missing test handler for OptimizerType {optimizer_type}")

I have moved the kwargs into the tuple which fixes the kwargs problem.

I find switching over unsupported optimizers messy since it means, in this case, we should skip the entire test as you have above, unless we are testing SMAC. There are other places where we test to make sure that using metadata throws exceptions, so without metadata support there is nothing to test.

But when we add that functionality to another optimizer it will be very easy to forget to enable related testing support, so I'd rather add that now as it will be easier to search for what we need to enable.

Won't it be the same either way? One requires adding a line, the other requires moving a line. I make this more similar to other tests, but I don't think it changes much either way.

mlos_core/mlos_core/tests/optimizers/optimizer_metadata_test.py

Co-authored-by: Brian Kroth <bpkroth@users.noreply.github.com>

…zer.py Co-authored-by: Brian Kroth <bpkroth@users.noreply.github.com>

…ulti-fidelity

…zer.py Co-authored-by: Brian Kroth <bpkroth@users.noreply.github.com>

Co-authored-by: Brian Kroth <bpkroth@users.noreply.github.com>

…zer.py Co-authored-by: Brian Kroth <bpkroth@users.noreply.github.com>

…ulti-fidelity

jsfreischuetz added 2 commits July 2, 2024 22:19

Implement metadata for multifidelity in SMAC

548af15

Merge branch 'main' into multi-fidelity

36ac67a

jsfreischuetz marked this pull request as ready for review July 2, 2024 22:26

jsfreischuetz requested a review from a team as a code owner July 2, 2024 22:26

bpkroth added 2 commits July 2, 2024 17:33

Merge branch 'main' into multi-fidelity

4cc133b

Merge branch 'main' into multi-fidelity

5ab03c8