[Feature Request] Support for Hydra in Kedro #1303
Comments
To save some time, here is the repository containing the example I described: https://github.com/neltacigreb/kedroXhydra
Hi @neltacigreb, thanks for raising this! This is something we'd love to support in Kedro. As we progress to 1.0.0, this is exactly the sort of thing that should be simple; that is to say, the core parts of the Kedro framework should be entirely hot-swappable with alternatives. In terms of the implementation above, the issues you post feel working-directory related, so I would consider using a
That being said, I do think there are two slightly more 'Kedrific' approaches at our disposal:
(1) Using a lifecycle hook
Our lifecycle hooks are the simplest way to extend parts of Kedro's run lifecycle. I'm pretty sure the
(2) Defining your own config loader
Here you would define your own
In the next release of Kedro, 0.18.0, we will introduce an AbstractConfigLoader class which will make this specific exercise simpler in the future.
Closing thoughts
Kedro is going through a long-running exercise where we're researching and thinking about potential solutions to config overhead, complexity and mental model. This is being tracked on issue #891, so any thoughts you have on this topic will steer our future direction.
Thank you for the directions. I tried different ways to implement the loading of a Hydra config, and here are some notes I collected.
Disclaimer: I am a fresh Kedro user and a quite experienced Hydra user.
A solution to use Hydra in Kedro
I ended up using a hook, which loads the config before a pipeline execution and stores it in the data catalog. A repo with a working example of using Hydra in the iris toy project is here, and the source code for the hook specifically is here. When looking at projects from the community, I stumbled on this package from @Minyus, which is used in this implementation.
Limitations encountered when trying different methods
Thoughts on the config in Kedro, and how to improve compatibility with Hydra
Again, I'm very new to Kedro, so I don't have a clear view of what these changes would impact down the line.
Thanks for this cool framework :)
Some update on this: I managed to make a custom decorator to be used on the
Documentation and source can be found in this repo.
The more I use both packages, the more I feel that a native integration of Hydra in Kedro would make sense. I'm thinking about logging management, environment switching, class instantiation from YAML files, dynamic pipeline generation at runtime, ...
@neltacigreb thank you for the update on this and your continued work. If helpful, I could organise a discussion call with the maintainer group to help think through some of this. I'm not sure any of us are familiar with Hydra, but we absolutely see the user value here.
(I also think your snippet is actually pretty elegant)
@datajoely absolutely, I'd be happy to explain globally what the package does and what parallels I see with the current config management in Kedro. I'll PM you on LinkedIn. Obviously the Hydra doc is good to check, and for curiosity the one from OmegaConf too. Hydra is basically an extension of OmegaConf. Also, this repo is a good example, since it uses most of the functionalities provided by the package.
Thank you for summarizing this. This makes perfect sense, and I have seen internal teams trying to achieve the same thing by writing their own multi-runner (one command to fire up multiple Kedro pipelines with different parameters). As far as I can remember, Hydra has a different way of looking at how configurations should be structured, and this could be related to #891.
Hi @noklam, sorry for the delay.
This is true: using Hydra via the CLI allows overriding parameters at runtime, or launching it in multirun mode (one command fires all configs). The CLI mode also creates a new output folder for each run, which proves useful in multirun.
Correct me if needed, but I feel that the main difference is that Kedro aims at simplicity in the config directory, while Hydra encourages more complex config folder structures, so as to make use of the override mechanism. They're similar on some subjects too (multirun, dynamic pipelines, overrides), where some are already provided in the Kedro config. As I continue using the two packages, I'll focus on 2 features that could be a match in my opinion:
When I find some time I'll package my findings in a plugin :) Until then, if you think of some features that could be used in Kedro, I'd be glad to try them as well.
Thanks for the explanation. As I understand you are using the Compose API currently. So the main benefit of using
For the compose API, that's it exactly. In the repo I mentioned, there are 2 Hydra decorator adapters.
My plan to make multirun usable is to generate many Kedro pipelines with different configurations, namespace them, and assemble them into one big final pipeline. I didn't know about the multirun hook's I'll look into that first to see if it fits my app.
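The plan described above (one pipeline copy per configuration, each under its own namespace) can be sketched without either library. The snippet below only shows the expansion step: it turns a sweep definition into one override set per run, as a Cartesian product of the swept values, which is how a basic multirun sweep is usually understood. The names `expand_sweep` and `namespace_for`, the `run_<n>` naming scheme and the example parameter keys are all hypothetical; building and namespacing the actual Kedro pipelines is left out.

```python
from itertools import product
from typing import Any, Dict, List


def expand_sweep(sweep: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Return one override dict per combination of the swept values."""
    keys = list(sweep)
    combos = product(*(sweep[key] for key in keys))
    return [dict(zip(keys, combo)) for combo in combos]


def namespace_for(index: int, prefix: str = "run") -> str:
    """Hypothetical naming scheme for the namespaced copy of a pipeline."""
    return f"{prefix}_{index}"


if __name__ == "__main__":
    # Example keys are made up for illustration.
    sweep = {"model.lr": [0.01, 0.1], "model.depth": [3, 5]}
    for i, overrides in enumerate(expand_sweep(sweep)):
        print(namespace_for(i), overrides)
```

Each resulting override dict would then parameterise one namespaced pipeline copy before all copies are summed into the final pipeline.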
Did some digging into this issue myself. It seems the problem of integrating Hydra with Kedro stems from the fact that Hydra tries to be a “replacement” for e.g. a click application. Hence Kedro and Hydra will never be natively compatible. That being said, as already shown above, it is still possible to create a custom decorator to make them compatible. The only real question is: where in the code base are we going to use it (somewhere in the CLI part seems most reasonable)? In my opinion, Hydra fits best as an alternative ConfigLoader, potentially alongside an OmegaConfLoader. In that case, using Hydra's compose API would already suffice, provided we take care of the features missing from the Kedro CLI. The biggest problem I see with using Hydra is that OmegaConf (and hence Hydra) only supports “soft merges” of config, which is not always desirable! So, to me it feels like adopting Hydra would also require a change in how we manage configuration in a project…
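The term “soft merge” is not defined in this thread. One plausible reading, assumed here, is a recursive key-by-key merge in which nested keys missing from the override survive, as opposed to a top-level replacement in which an overridden key loses its previous nested content. The sketch below contrasts the two on plain dictionaries; `soft_merge` and `hard_merge` are hypothetical names, and neither function is taken from OmegaConf or Kedro.

```python
from copy import deepcopy
from typing import Any, Dict


def soft_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Recursive merge: nested keys absent from `override` are kept."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = soft_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def hard_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Top-level replace: an overridden key drops its old nested content."""
    merged = deepcopy(base)
    merged.update(deepcopy(override))
    return merged


if __name__ == "__main__":
    base = {"model": {"lr": 0.01, "depth": 3}}
    override = {"model": {"lr": 0.1}}
    print(soft_merge(base, override))  # "depth" is kept
    print(hard_merge(base, override))  # "depth" is gone
```

Under this reading, the concern is that a project relying on full replacement of a config block cannot express that with a recursive merge alone.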
I've turned to using Hydra in a CLI script that runs Kedro in a subprocess. This works with Hydra plugins (hyperparameter search, ...) but makes it difficult to debug a pipeline. Having a custom OmegaConfLoader with the compose API would be better integrated into Kedro's workflow, but loses the ability to override parameters all at once using a YAML file of overrides. Not sure what soft merge refers to, but having 2 different ways of reading the config could indeed be problematic. So far this script has worked for me, as long as the config structure stays simple:

import os
import shlex
import subprocess
import sys
from typing import List

import hydra
from flatten_dict import flatten
from omegaconf import DictConfig


def run_subprocess(command: str) -> int:
    """Run a shell-style command, stream its output, and return its exit code."""
    print(f"Running command: \n\n{command}\n")
    process = subprocess.Popen(
        shlex.split(command),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout
        bufsize=2,
    )
    with process.stdout as p_out:
        for line in iter(p_out.readline, b""):  # b'\n'-separated lines
            print(line.decode().strip())
    process.wait()  # to fetch returncode
    return process.returncode


def get_all_params_overrides(cfg: DictConfig) -> List[str]:
    """Flatten the config into "a.b.c:value" strings for kedro run --params."""
    config_flat = flatten(cfg)  # keys become tuples, e.g. ("a", "b", "c")
    params_overrides = [
        ".".join(param_keys) + ":" + str(param_value)
        for param_keys, param_value in config_flat.items()
    ]
    return params_overrides


@hydra.main(config_path="conf", config_name="config", version_base=None)
def main(cfg: DictConfig):
    # Fall back to Kedro's default pipeline when the config names none.
    pipeline = cfg.pipeline if "pipeline" in cfg.keys() else "__default__"
    params_overrides = get_all_params_overrides(cfg)
    # Use the kedro executable that sits next to the current interpreter.
    kedro_bin = os.path.join(os.path.split(sys.executable)[0], "kedro")
    command = " ".join(
        [
            kedro_bin,
            "run",
            f"--pipeline={pipeline}",
            f'--params="{",".join(params_overrides)}"',
        ]
    )
    returncode = run_subprocess(command)
    if returncode:
        raise RuntimeError(f"kedro run exited with code {returncode}")


if __name__ == "__main__":
    main()
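To make the flattening step in the script above easier to follow, here is a standard-library-only sketch of the same idea: nested config keys are joined with dots and paired with their values in the `key:value,key:value` form that the script passes to `--params`. The function names `flatten_params` and `to_params_option` and the example config are hypothetical. As the author notes, this only holds while the config stays simple; values that themselves contain commas or colons are not escaped here.

```python
from typing import Any, Dict, Mapping


def flatten_params(config: Mapping[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested mappings into {"a.b.c": value} pairs."""
    flat: Dict[str, Any] = {}
    for key, value in config.items():
        dotted = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, Mapping):
            flat.update(flatten_params(value, dotted))
        else:
            flat[dotted] = value
    return flat


def to_params_option(config: Mapping[str, Any]) -> str:
    """Build the --params option string used in the script above."""
    pairs = [f"{key}:{value}" for key, value in flatten_params(config).items()]
    joined = ",".join(pairs)
    return f'--params="{joined}"'


if __name__ == "__main__":
    # Example config is made up for illustration.
    config = {"model": {"lr": 0.1, "depth": 3}, "seed": 42}
    print(to_params_option(config))
```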
Hello folks, since this issue was opened we've made great progress, and we now intend to make
This means, though, that we don't plan to support Hydra natively in the near future. This doesn't mean that Hydra can't work with Kedro; in fact, some internal teams in McKinsey have created a
This is not something that we'd like to maintain for the open source community though, so we'll look into blogging about our approach and giving away some bits of code. If someone else wants to take over and publish it as a plugin, we'll be more than happy to promote it in https://github.com/kedro-org/awesome-kedro.
For now I'm closing this feature request as "won't fix". If you have more thoughts, please feel free to share them in this thread. Thanks to everyone who contributed to the conversation!
Description
Hydra is a framework for elegantly configuring complex applications. It is used to create a hierarchical configuration by splitting it in different yaml files, making it easier to organise.
Project description here : https://github.com/facebookresearch/hydra
When trying to use Hydra via the hydra.main() decorator applied on register_pipelines(), an error occurs.
Context
Having Kedro and Hydra working together would make it easier to maintain complex pipelines.
Reproducing issue
python version: 3.8.12
kedro version: 0.17.7
hydra version: 1.1.1
The bug appears when trying to set the hydra.main() decorator on register_pipelines(). This decorator is used to build an OmegaConf config from the /conf directory.
Steps to reproduce:
This will result in the following error:
note: get_started is the name of the package in /src
Cause of the issue
After some digging, it appears that the configuration path resolved by hydra.main does not exist. The following info is obtained by running in debug mode and setting a breakpoint on the first line of the function ensure_main_config_source_available().
Full path: hydra/_internal/config_loader_impl.py/ConfigLoaderImpl.ensure_main_config_source_available()
self.get_sources(), while being in ConfigLoaderImpl.ensure_main_config_source_available, returns this:
[provider=hydra, path=pkg://hydra.conf, provider=main, path=pkg://conf, provider=schema, path=structured://]
[provider=hydra, path=pkg://hydra.conf, provider=main, path=file:///PATH_TO_PROJECT//conf, provider=schema, path=structured://]
It appears that Hydra doesn't know how to get file:///PATH_TO_PROJECT, and replaces it with pkg://
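One way to picture the difference between the two search paths above is the simplified model below. It assumes that the choice depends on whether the decorated function is defined in a script run directly (`__main__`) or in an imported module; that assumption is consistent with the observation in this issue but is not confirmed here, and the code is an illustration written for this thread, not Hydra's implementation. The function name `guess_search_path` and the example paths are hypothetical.

```python
import os


def guess_search_path(module_name: str, source_file: str, config_path: str) -> str:
    """Simplified, assumed model of how the main config source is chosen."""
    if module_name == "__main__":
        # Script launched directly: resolve relative to the script's folder.
        base_dir = os.path.dirname(os.path.abspath(source_file))
        return "file://" + os.path.join(base_dir, config_path)
    # Imported module: resolve relative to the module's parent package.
    parent_package = module_name.rpartition(".")[0]
    if parent_package:
        return f"pkg://{parent_package}/{config_path}"
    return "pkg://" + config_path


if __name__ == "__main__":
    # Example module names and paths are made up for illustration.
    print(guess_search_path("__main__", "/PATH_TO_PROJECT/run.py", "conf"))
    print(guess_search_path("pipeline_registry", "/PATH_TO_PROJECT/src/pipeline_registry.py", "conf"))
```

Under this model, a function imported and called by `kedro run` never takes the `file://` branch, which would explain why the config folder is not found.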
Possible Implementation
I'm not really sure how to solve this or which library should be adapted to correct this bug, so I wrote a similar post on Hydra's issue tracker.
Hydra requires that the script is launched by calling it manually in the terminal, and I don't know what happens when executing kedro run, but I guess it comes from somewhere here.
Possible Alternatives
Right now I'm using a workaround by generating the conf via initialize() and compose():
Follow up question
This also raises the question of how to do config overrides from the command line, a feature of Hydra available when the user calls the script themselves from the command line. I guess it would be possible via the --config argument of kedro run, but I haven't tested it yet.
tl;dr: hydra.main() is called in an unusual way, which makes it impossible for Hydra to find the config folder.
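The author's original workaround snippet did not survive in this page, so the following is only a sketch of what a compose-based workaround might look like, not the author's code. It uses Hydra's compose API with an absolute config directory, which sidesteps the search-path problem described above. It assumes Hydra 1.2 or later for the `version_base` argument (with the 1.1.1 release reported in this issue, that argument would have to be dropped), a `conf` folder at the project root, and a primary config named `config`. The names `resolve_conf_dir` and `load_hydra_config` are hypothetical. The `overrides` list is one possible answer to the follow-up question, since it accepts the same `key=value` strings Hydra takes on the command line.

```python
from pathlib import Path
from typing import Any, List, Optional


def resolve_conf_dir(project_root: str, conf_folder: str = "conf") -> Path:
    """Return the absolute config directory, failing early if it is missing."""
    conf_dir = (Path(project_root) / conf_folder).resolve()
    if not conf_dir.is_dir():
        raise FileNotFoundError(f"Config directory not found: {conf_dir}")
    return conf_dir


def load_hydra_config(
    project_root: str,
    config_name: str = "config",
    overrides: Optional[List[str]] = None,
) -> Any:
    """Compose a Hydra config from an absolute directory (assumes Hydra >= 1.2)."""
    # Imported lazily so resolve_conf_dir stays usable without Hydra installed.
    from hydra import compose, initialize_config_dir

    conf_dir = resolve_conf_dir(project_root)
    # initialize_config_dir requires an absolute path, hence resolve() above.
    with initialize_config_dir(config_dir=str(conf_dir), version_base=None):
        return compose(config_name=config_name, overrides=list(overrides or []))
```

A call such as `load_hydra_config("/PATH_TO_PROJECT", overrides=["model.lr=0.1"])` would then return the composed config, where `model.lr` is a made-up key that must already exist in the config files.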