Enable Optimizer state and Multi-Fidelity passthrough in SMAC #751

Status: Closed. Wants to merge 26 commits; changes shown from 14 commits.
7f8a43b
minimal implementation of mutli-fidelity
jsfreischuetz May 22, 2024
8afb5f0
revert changes
jsfreischuetz May 23, 2024
08575af
revert
jsfreischuetz May 23, 2024
fcfca53
fix minor bug with logging
jsfreischuetz Jun 1, 2024
838c1db
undo formatting
jsfreischuetz Jun 1, 2024
7533b4e
Update mlos_core/mlos_core/optimizers/optimizer.py
jsfreischuetz Jun 3, 2024
3904020
merge
jsfreischuetz Jun 3, 2024
4ffff6c
Merge branch 'microsoft-main' into multifidleity
jsfreischuetz Jun 3, 2024
b7de120
merge
jsfreischuetz Jun 3, 2024
7278994
add checks back to optimizer
jsfreischuetz Jun 4, 2024
c79294a
add checks back
jsfreischuetz Jun 4, 2024
048269c
add checks back
jsfreischuetz Jun 4, 2024
019192a
update name of context to metadata, and add readme
jsfreischuetz Jun 5, 2024
88d63c1
update tests to also use correct terminology
jsfreischuetz Jun 5, 2024
4e36f28
Update mlos_core/mlos_core/optimizers/bayesian_optimizers/smac_optimi…
jsfreischuetz Jun 6, 2024
3326ac9
Update mlos_core/mlos_core/optimizers/README.md
jsfreischuetz Jun 6, 2024
2399d3e
Update mlos_core/mlos_core/optimizers/README.md
jsfreischuetz Jun 6, 2024
1f210b5
Add context back to the register interface
jsfreischuetz Jun 6, 2024
87a5af9
Merge branch 'main' into multifidleity
motus Jun 7, 2024
48af70f
Apply suggestions from code review
bpkroth Jun 12, 2024
cd8deff
Merge branch 'main' into multifidleity
motus Jun 12, 2024
271a79b
Update mlos_core/mlos_core/optimizers/optimizer.py
jsfreischuetz Jun 12, 2024
98c7398
Update mlos_core/mlos_core/optimizers/optimizer.py
jsfreischuetz Jun 12, 2024
9726410
Update mlos_core/mlos_core/optimizers/optimizer.py
jsfreischuetz Jun 12, 2024
bf4602b
Update mlos_core/mlos_core/optimizers/optimizer.py
jsfreischuetz Jun 12, 2024
8d2a894
fix comments for python
jsfreischuetz Jun 14, 2024
1 change: 1 addition & 0 deletions .cspell.json
Original file line number Diff line number Diff line change
@@ -72,6 +72,7 @@
"sklearn",
"skopt",
"smac",
"SOBOL",
"sqlalchemy",
"srcpaths",
"subcmd",
2 changes: 1 addition & 1 deletion mlos_bench/mlos_bench/optimizers/mlos_core_optimizer.py
@@ -180,7 +180,7 @@ def suggest(self) -> TunableGroups:
tunables = super().suggest()
if self._start_with_defaults:
_LOG.info("Use default values for the first trial")
df_config = self._opt.suggest(defaults=self._start_with_defaults)
df_config, _ = self._opt.suggest(defaults=self._start_with_defaults)
self._start_with_defaults = False
_LOG.info("Iteration %d :: Suggest:\n%s", self._iter, df_config)
return tunables.assign(
18 changes: 18 additions & 0 deletions mlos_core/mlos_core/optimizers/README.md
@@ -0,0 +1,18 @@
This directory contains wrappers that integrate different optimizers into MLOS.
This is implemented through child classes of the `BaseOptimizer` class defined in `optimizer.py`.

The main goal of these optimizers is to suggest configurations, based on prior samples, that drive toward an optimum of some objective. This process is exercised through an ask-and-tell interface.

The following definitions are useful for understanding the implementation:

- `configuration`: a vector representation of a configuration of a system to be evaluated.
- `score`: the objective(s) associated with a configuration.
- `metadata`: additional information about the evaluation, such as the runtime budget used during evaluation.
- `context`: additional information about the evaluation used to extend the internal model used for suggesting samples. This is not yet implemented.

The interface for these classes can be described as follows:

- `register`: takes a configuration, a score, and, optionally, metadata about the evaluation, and updates the model for future suggestions.
- `suggest`: returns a new configuration for evaluation. Some optimizers also return metadata about the evaluation that should be passed back during the register phase. This function can optionally take a context (not yet implemented) and an argument that forces it to return the default configuration.
- `register_pending`: registers a configuration and metadata pair as pending with the optimizer.
- `get_observations`: returns all observations reported to the optimizer as a triplet of DataFrames (config, score, metadata).
- `get_best_observations`: returns the best observations as a triplet of (config, score, metadata) DataFrames.
363 changes: 292 additions & 71 deletions mlos_core/mlos_core/optimizers/bayesian_optimizers/smac_optimizer.py

Large diffs are not rendered by default.

18 changes: 10 additions & 8 deletions mlos_core/mlos_core/optimizers/flaml_optimizer.py
@@ -6,7 +6,7 @@
Contains the FlamlOptimizer class.
"""

from typing import Dict, List, NamedTuple, Optional, Union
from typing import Dict, List, NamedTuple, Optional, Tuple, Union
from warnings import warn

import ConfigSpace
@@ -86,7 +86,7 @@ def __init__(self, *,  # pylint: disable=too-many-arguments
self._suggested_config: Optional[dict]

def _register(self, configurations: pd.DataFrame, scores: pd.DataFrame,
context: Optional[pd.DataFrame] = None) -> None:
metadata: Optional[pd.DataFrame] = None) -> None:
Review comment (Contributor): Please restore context. The idea was to add metadata in addition to it, not replace it.
Review comment (Contributor): (requires associated changes throughout)
"""Registers the given configurations and scores.

Parameters
@@ -97,11 +97,11 @@ def _register(self, configurations: pd.DataFrame, scores: pd.DataFrame,
scores : pd.DataFrame
Scores from running the configurations. The index is the same as the index of the configurations.

context : None
metadata : None
Not Yet Implemented.
"""
if context is not None:
warn(f"Not Implemented: Ignoring context {list(context.columns)}", UserWarning)
if metadata is not None:
warn(f"Not Implemented: Ignoring context {list(metadata.columns)}", UserWarning)
for (_, config), (_, score) in zip(configurations.astype('O').iterrows(), scores.iterrows()):
cs_config: ConfigSpace.Configuration = ConfigSpace.Configuration(
self.optimizer_parameter_space, values=config.to_dict())
@@ -112,7 +112,9 @@ def _register(self, configurations: pd.DataFrame, scores: pd.DataFrame,
score=float(np.average(score.astype(float), weights=self._objective_weights)),
)

def _suggest(self, context: Optional[pd.DataFrame] = None) -> pd.DataFrame:
def _suggest(
self, context: Optional[pd.DataFrame] = None
) -> Tuple[pd.DataFrame, Optional[pd.DataFrame]]:
"""Suggests a new configuration.

Sampled at random using ConfigSpace.
@@ -130,10 +132,10 @@ def _suggest(self, context: Optional[pd.DataFrame] = None) -> pd.DataFrame:
if context is not None:
warn(f"Not Implemented: Ignoring context {list(context.columns)}", UserWarning)
config: dict = self._get_next_config()
return pd.DataFrame(config, index=[0])
return pd.DataFrame(config, index=[0]), None

def register_pending(self, configurations: pd.DataFrame,
context: Optional[pd.DataFrame] = None) -> None:
metadata: Optional[pd.DataFrame] = None) -> None:
raise NotImplementedError()

def _target_function(self, config: dict) -> Union[dict, None]:
93 changes: 57 additions & 36 deletions mlos_core/mlos_core/optimizers/optimizer.py
@@ -26,7 +26,7 @@ class BaseOptimizer(metaclass=ABCMeta):

def __init__(self, *,
parameter_space: ConfigSpace.ConfigurationSpace,
optimization_targets: List[str],
optimization_targets: Optional[Union[str, List[str]]] = None,
objective_weights: Optional[List[float]] = None,
space_adapter: Optional[BaseSpaceAdapter] = None):
"""
@@ -57,8 +57,10 @@ def __init__(self, *,

self._space_adapter: Optional[BaseSpaceAdapter] = space_adapter
self._observations: List[Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame]]] = []
self._has_context: Optional[bool] = None
self._has_metadata: Optional[bool] = None
self._pending_observations: List[Tuple[pd.DataFrame, Optional[pd.DataFrame]]] = []
self.delayed_config: Optional[pd.DataFrame] = None
self.delayed_metadata: Optional[pd.DataFrame] = None
Suggested change (Member) on lines +62 to +63:
-    self.delayed_config: Optional[pd.DataFrame] = None
-    self.delayed_metadata: Optional[pd.DataFrame] = None
+    self._delayed_config: Optional[pd.DataFrame] = None
+    self._delayed_metadata: Optional[pd.DataFrame] = None
probably should be private, right?

def __repr__(self) -> str:
return f"{self.__class__.__name__}(space_adapter={self.space_adapter})"
@@ -69,7 +71,7 @@ def space_adapter(self) -> Optional[BaseSpaceAdapter]:
return self._space_adapter

def register(self, configurations: pd.DataFrame, scores: pd.DataFrame,
context: Optional[pd.DataFrame] = None) -> None:
metadata: Optional[pd.DataFrame] = None) -> None:
"""Wrapper method, which employs the space adapter (if any), before registering the configurations and scores.

Parameters
@@ -78,34 +80,35 @@
Dataframe of configurations / parameters. The columns are parameter names and the rows are the configurations.
scores : pd.DataFrame
Scores from running the configurations. The index is the same as the index of the configurations.

context : pd.DataFrame
Not Yet Implemented.
metadata : pd.DataFrame
Implementation depends on instance.
"""
# Do some input validation.
assert set(scores.columns) == set(self._optimization_targets), \
"Mismatched optimization targets."
assert self._has_context is None or self._has_context ^ (context is None), \
"Context must always be added or never be added."
if type(self._optimization_targets) is str:
Suggested change (Contributor):
-    if type(self._optimization_targets) is str:
+    assert self._optimization_targets, "Missing or invalid optimization targets"
+    if type(self._optimization_targets) is str:

Review comment (Contributor): Also assert not empty (see also comment above about accepting None)
Reply (Contributor, author): I don't think this makes sense given my comment above
Reply (Contributor): OK, but separate PR for that one please
assert self._optimization_targets in scores.columns, "Mismatched optimization targets."
if type(self._optimization_targets) is list:
assert set(scores.columns) >= set(self._optimization_targets), "Mismatched optimization targets."
assert self._has_metadata is None or self._has_metadata ^ (metadata is None), \
"Metadata must always be added or never be added."
assert len(configurations) == len(scores), \
"Mismatched number of configurations and scores."
if context is not None:
assert len(configurations) == len(context), \
"Mismatched number of configurations and context."
if metadata is not None:
assert len(configurations) == len(metadata), \
"Mismatched number of configurations and metadata."
assert configurations.shape[1] == len(self.parameter_space.values()), \
"Mismatched configuration shape."
self._observations.append((configurations, scores, context))
self._has_context = context is not None
self._observations.append((configurations, scores, metadata))
self._has_metadata = metadata is not None

if self._space_adapter:
configurations = self._space_adapter.inverse_transform(configurations)
assert configurations.shape[1] == len(self.optimizer_parameter_space.values()), \
"Mismatched configuration shape after inverse transform."
return self._register(configurations, scores, context)
return self._register(configurations, scores, metadata)

@abstractmethod
def _register(self, configurations: pd.DataFrame, scores: pd.DataFrame,
Suggested change (Contributor):
-    def _register(self, configurations: pd.DataFrame, scores: pd.DataFrame,
+    def _register(self, *, configurations: pd.DataFrame, scores: pd.DataFrame,
Can force the args to be named to help avoid param ordering mistakes.

Review comment (Contributor): Same elsewhere (e.g., public methods and suggest), though this might be a larger API change that needs its own PR first in preparation for this one, since callers will also be affected.
Reply (Member): Agree. Let's fix _register now and update the public register in the next PR.
context: Optional[pd.DataFrame] = None) -> None:
metadata: Optional[pd.DataFrame] = None) -> None:
"""Registers the given configurations and scores.

Parameters
@@ -115,12 +118,14 @@ def _register(self, configurations: pd.DataFrame, scores: pd.DataFrame,
scores : pd.DataFrame
Scores from running the configurations. The index is the same as the index of the configurations.

context : pd.DataFrame
Not Yet Implemented.
metadata : pd.DataFrame
Implementation depends on instance.
"""
pass # pylint: disable=unnecessary-pass # pragma: no cover

def suggest(self, context: Optional[pd.DataFrame] = None, defaults: bool = False) -> pd.DataFrame:
def suggest(
self, context: Optional[pd.DataFrame] = None, defaults: bool = False
) -> Tuple[pd.DataFrame, Optional[pd.DataFrame]]:
"""
Wrapper method, which employs the space adapter (if any), after suggesting a new configuration.

@@ -136,13 +141,25 @@ def suggest(self, context: Optional[pd.DataFrame] = None, defaults: bool = False
-------
configuration : pd.DataFrame
Pandas dataframe with a single row. Column names are the parameter names.
metadata : pd.DataFrame
Pandas dataframe with a single row containing the metadata.
Column names are the budget, seed, and instance of the evaluation, if valid.
"""
if defaults:
configuration = config_to_dataframe(self.parameter_space.get_default_configuration())
self.delayed_config, self.delayed_metadata = self._suggest(context)

configuration: pd.DataFrame = config_to_dataframe(
self.parameter_space.get_default_configuration()
)
metadata = self.delayed_metadata
Review comment (Contributor): nit: when creating PRs, try to keep your changes smaller; it's easier to review and debug. If the order of this one didn't really matter, you could have left the first line alone and only added the two new ones.

if self.space_adapter is not None:
configuration = self.space_adapter.inverse_transform(configuration)
else:
configuration = self._suggest(context)
if self.delayed_config is None:
configuration, metadata = self._suggest(metadata)
else:
configuration, metadata = self.delayed_config, self.delayed_metadata
self.delayed_config, self.delayed_metadata = None, None
assert len(configuration) == 1, \
"Suggest must return a single configuration."
assert set(configuration.columns).issubset(set(self.optimizer_parameter_space)), \
@@ -151,10 +168,12 @@
configuration = self._space_adapter.transform(configuration)
assert set(configuration.columns).issubset(set(self.parameter_space)), \
"Space adapter produced a configuration that does not match the expected parameter space."
return configuration
return configuration, metadata

@abstractmethod
def _suggest(self, context: Optional[pd.DataFrame] = None) -> pd.DataFrame:
def _suggest(
self, context: Optional[pd.DataFrame] = None
) -> Tuple[pd.DataFrame, Optional[pd.DataFrame]]:
"""Suggests a new configuration.

Parameters
@@ -166,12 +185,16 @@ def _suggest(self, context: Optional[pd.DataFrame] = None) -> pd.DataFrame:
-------
configuration : pd.DataFrame
Pandas dataframe with a single row. Column names are the parameter names.

metadata : pd.DataFrame
Pandas dataframe with a single row containing the metadata.
Column names are the budget, seed, and instance of the evaluation, if valid.
"""
pass # pylint: disable=unnecessary-pass # pragma: no cover

@abstractmethod
def register_pending(self, configurations: pd.DataFrame,
context: Optional[pd.DataFrame] = None) -> None:
metadata: Optional[pd.DataFrame] = None) -> None:
"""Registers the given configurations as "pending".
That is to say, it has been suggested by the optimizer, and an experiment trial has been started.
This can be useful for executing multiple trials in parallel, retry logic, etc.
@@ -180,31 +203,29 @@ def register_pending(self, configurations: pd.DataFrame,
----------
configurations : pd.DataFrame
Dataframe of configurations / parameters. The columns are parameter names and the rows are the configurations.
context : pd.DataFrame
Not Yet Implemented.
"""
pass # pylint: disable=unnecessary-pass # pragma: no cover

def get_observations(self) -> Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame]]:
"""
Returns the observations as a triplet of DataFrames (config, score, context).
Returns the observations as a triplet of DataFrames (config, score, metadata).

Returns
-------
observations : Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame]]
A triplet of (config, score, context) DataFrames of observations.
A triplet of (config, score, metadata) DataFrames of observations.
Suggested change (Member):
-    A triplet of (config, score, metadata) DataFrames of observations.
+    A 4-tuple of (config, score, context, metadata) DataFrames of observations.
(or, better yet, a NamedTuple)
"""
if len(self._observations) == 0:
raise ValueError("No observations registered yet.")
configs = pd.concat([config for config, _, _ in self._observations]).reset_index(drop=True)
scores = pd.concat([score for _, score, _ in self._observations]).reset_index(drop=True)
contexts = pd.concat([pd.DataFrame() if context is None else context
for _, _, context in self._observations]).reset_index(drop=True)
return (configs, scores, contexts if len(contexts.columns) > 0 else None)
metadatas = pd.concat([pd.DataFrame() if metadata is None else metadata
for _, _, metadata in self._observations]).reset_index(drop=True)
return (configs, scores, metadatas if len(metadatas.columns) > 0 else None)

def get_best_observations(self, n_max: int = 1) -> Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame]]:
"""
Get the N best observations so far as a triplet of DataFrames (config, score, context).
Get the N best observations so far as a triplet of DataFrames (config, score, metadata).
Default is N=1. The columns are ordered in ASCENDING order of the optimization targets.
The function uses `pandas.DataFrame.nsmallest(..., keep="first")` method under the hood.

@@ -216,14 +237,14 @@
Returns
-------
observations : Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame]]
A triplet of best (config, score, context) DataFrames of best observations.
A triplet of best (config, score, metadata) DataFrames of best observations.
"""
if len(self._observations) == 0:
raise ValueError("No observations registered yet.")
(configs, scores, contexts) = self.get_observations()
(configs, scores, metadatas) = self.get_observations()
idx = scores.nsmallest(n_max, columns=self._optimization_targets, keep="first").index
return (configs.loc[idx], scores.loc[idx],
None if contexts is None else contexts.loc[idx])
None if metadatas is None else metadatas.loc[idx])

def cleanup(self) -> None:
"""
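The `nsmallest`-based selection used by `get_best_observations` above can be illustrated standalone; the toy configs and scores below are invented for this sketch:

```python
import pandas as pd

# Toy observation store: four configs and their scores (lower is better).
configs = pd.DataFrame({"x": [10, 20, 30, 40]})
scores = pd.DataFrame({"score": [3.0, 1.0, 2.0, 1.0]})

# nsmallest(..., keep="first") keeps the earliest row on ties, which is the
# tie-breaking behavior the docstring above describes.
idx = scores.nsmallest(2, columns=["score"], keep="first").index
best_configs, best_scores = configs.loc[idx], scores.loc[idx]
```

Because the configs and scores share an index, selecting `idx` from both keeps each best score paired with the config that produced it.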
25 changes: 15 additions & 10 deletions mlos_core/mlos_core/optimizers/random_optimizer.py
@@ -6,7 +6,7 @@
Contains the RandomOptimizer class.
"""

from typing import Optional
from typing import Optional, Tuple
from warnings import warn

import pandas as pd
@@ -25,7 +25,7 @@ class RandomOptimizer(BaseOptimizer):
"""

def _register(self, configurations: pd.DataFrame, scores: pd.DataFrame,
context: Optional[pd.DataFrame] = None) -> None:
metadata: Optional[pd.DataFrame] = None) -> None:
"""Registers the given configurations and scores.

Doesn't do anything on the RandomOptimizer except storing configurations for logging.
@@ -38,14 +38,15 @@ def _register(self, configurations: pd.DataFrame, scores: pd.DataFrame,
scores : pd.DataFrame
Scores from running the configurations. The index is the same as the index of the configurations.

context : None
Not Yet Implemented.
metadata : None
Metadata is ignored for random_optimizer.
"""
if context is not None:
warn(f"Not Implemented: Ignoring context {list(context.columns)}", UserWarning)
pass
# should we pop them from self.pending_observations?

def _suggest(self, context: Optional[pd.DataFrame] = None) -> pd.DataFrame:
def _suggest(
self, context: Optional[pd.DataFrame] = None
) -> Tuple[pd.DataFrame, Optional[pd.DataFrame]]:
"""Suggests a new configuration.

Sampled at random using ConfigSpace.
@@ -59,13 +60,17 @@ def _suggest(self, context: Optional[pd.DataFrame] = None) -> pd.DataFrame:
-------
configuration : pd.DataFrame
Pandas dataframe with a single row. Column names are the parameter names.

metadata : pd.DataFrame
Pandas dataframe with a single row containing the metadata.
Column names are the budget, seed, and instance of the evaluation, if valid.
"""
if context is not None:
# not sure how that works here?
warn(f"Not Implemented: Ignoring context {list(context.columns)}", UserWarning)
return pd.DataFrame(dict(self.optimizer_parameter_space.sample_configuration()), index=[0])
return pd.DataFrame(dict(self.optimizer_parameter_space.sample_configuration()), index=[0]), None
Review comment (Contributor): Why should this return None for context?
Review comment (Contributor): Wouldn't returning the context that was passed in make more sense?
Reply (Contributor, author): I have interpreted the context that is passed in to suggest to be different from the context that is returned. I am not exactly sure what the context in is supposed to mean.
Reply (Contributor): Here's what I expect:

- context is composed of some set of metrics describing the execution environment (e.g., vm ram, vm vcore count, workload descriptor, etc.)
- upon registering, the optimizer is able to build a model for how different config+context pairs map to different output metrics
- when requesting a new suggestion, possibly for a different execution environment (e.g., larger vm size) described in the context, the optimizer should be able to use all of that information to provide a new config suggestion for that new context

Now, it's possible that the optimizer stinks at that part initially because it doesn't know how to compute suggestions for unseen contexts (future research work, probably). One option could be that we return a variable-length (user-specified) list of items, each stemming from a different "known" context (ranked by higher confidence), instead of one for the new context, and then the caller needs to sift through and decide what to do; or else we return a random config (explore) + the provided context, and they could start trying them to help fill in that knowledge. I think the exact specifics of that part are to be determined.

But high level, if someone asks for a suggestion for a given context, I think we should return something within context, not something for anywhere in the space. Make sense?

Reply (Contributor, author, Jun 4, 2024):

> context is composed of some set of metrics describing the execution environment (e.g., vm ram, vm vcore count, workload descriptor, etc.)

As far as I am aware, this is not supported by any of the optimizers when asking for a suggestion. SMAC supports the idea of these values being defined at optimizer initialization, but which context to evaluate is determined by the optimizer, and there is still no support for requiring a specific context.

> But high level, if someone asks for a suggestion for a given context, I think we should return something within context, not something for anywhere in the space.

I think there is a problem with this. There are two ways to solve for this:

1. If we simply request configurations until we find a context that matches, we now have a long list of pending configurations that should be evaluated according to the optimizer. If you ignore them, the optimizer will not be evaluating in this region, as it believes that they are still pending. If you process them late, they are stale.
2. Alternatively, we can force the context to match, regardless of what the optimizer returns, but this is not actually sampling efficiently according to the optimizer.

I think realistically we should remove the argument from suggest, but I tried to keep the changes as minimal as possible. Someone else before me also left a comment suggesting this.

Reply (Contributor): We discussed offline the discrepancy in terminology. SMAC's notion of "context" is different from that in MLOS, which @jsfreischuetz rightly pointed out may or may not be supported directly yet. SMAC's is much more about passing internal optimizer "state". For now, we decided to:

1. document the difference in the README.md above.
2. rename the SMAC-required "state" to either metadata or state.

def register_pending(self, configurations: pd.DataFrame,
context: Optional[pd.DataFrame] = None) -> None:
metadata: Optional[pd.DataFrame] = None) -> None:
raise NotImplementedError()
# self._pending_observations.append((configurations, context))
# self._pending_observations.append((configurations, metadata))
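To make the context-vs-metadata distinction from the review discussion above concrete, here is a small illustrative sketch; all of the column names (`vm_ram_gb`, `budget`, `cache_mb`, etc.) are invented examples, not fields required by any actual optimizer:

```python
import pandas as pd

# 'context' describes the execution environment, supplied by the caller.
context = pd.DataFrame({"vm_ram_gb": [16], "vcores": [4]})

# 'metadata' is optimizer-internal state (e.g. a budget/seed/instance triple),
# returned by suggest() and expected to be echoed back to register().
metadata = pd.DataFrame({"budget": [10], "seed": [0], "instance": [None]})

config = pd.DataFrame({"cache_mb": [512]})  # configuration to evaluate
score = pd.DataFrame({"score": [42.0]})     # observed objective

# Under the proposed interface, a register call would pass metadata alongside
# the score, while context handling is not yet implemented:
# optimizer.register(config, score, metadata=metadata)

# Each row-aligned frame must describe the same evaluation.
assert len(config) == len(score) == len(metadata) == 1
```

The key point of the thread: context is a caller-supplied fact about where the evaluation ran, while metadata is state the optimizer itself emits and wants back.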
@@ -34,13 +34,10 @@ def test_context_not_implemented_warning(configuration_space: CS.ConfigurationSp
optimization_targets=['score'],
**kwargs
)
suggestion = optimizer.suggest()
suggestion, _ = optimizer.suggest()
scores = pd.DataFrame({'score': [1]})
context = pd.DataFrame([["something"]])

with pytest.raises(UserWarning):
optimizer.register(suggestion, scores, context=context)

with pytest.raises(UserWarning):
optimizer.suggest(context=context)
Review comment (Contributor): Hmm, so we can register context, but not suggest with it?
Reply (Contributor, author): It's more about the warning not being thrown anymore.
Reply (Contributor): I understand what the code in this test is doing. I'm asking why we aren't supporting suggestions with context yet when we're adding support for registering with context. It makes it seem like a black hole of information, or only partial support somehow.
Reply (Contributor, author): What does context for a suggestion mean?
Reply (Contributor): We discussed offline, and it is documented in the README.md above now.

