Support Jinja2 templating in config #583
Comments
This is awesome! Very few of the pipelines I write lack a super repetitive pattern in the catalog, and I often create mine through a Python script. Pardon my lack of understanding of Jinja. Can you use things like itertools.product? I often need something along the lines of:
continents = ['europe', 'asia', 'africa']
layers = ['raw', 'pri', 'int']
for continent, layer in itertools.product(continents, layers):
    ...
I often have a very similar pattern that creates nodes. Rather than maintaining duplicate lists, the script that generates the catalog actually imports from the nodes module. Is it possible to access the same lists from both Jinja and my nodes easily?
@WaylonWalker I think it's possible to use Python module code in Jinja2 templates (see https://stackoverflow.com/a/11856935/1093967), and I don't see any reason why
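As a rough illustration of the idea in this reply, the sketch below renders catalog entries from a Jinja2 template while passing in lists that ordinary Python code could also import. It uses the jinja2 package directly rather than Kedro's config loader. The names CONTINENTS, LAYERS, CATALOG_TEMPLATE, and render_catalog, as well as the dataset naming and file paths, are hypothetical and chosen only for this example.

```python
import itertools

from jinja2 import Environment

# Hypothetical shared lists; in a real project they could live in the nodes module
# and be imported by both the node-building code and the template rendering code.
CONTINENTS = ["europe", "asia", "africa"]
LAYERS = ["raw", "pri", "int"]

# Hypothetical catalog template. `product` is not built into Jinja2;
# it is exposed to the template through env.globals in render_catalog.
CATALOG_TEMPLATE = """\
{% for continent, layer in product(continents, layers) %}
{{ continent }}_{{ layer }}:
  type: pandas.CSVDataSet
  filepath: data/{{ layer }}/{{ continent }}.csv
{% endfor %}
"""


def render_catalog(continents, layers):
    """Render the template into YAML text, one entry per (continent, layer) pair."""
    env = Environment(trim_blocks=True, lstrip_blocks=True)
    # Make a plain Python callable available inside the template.
    env.globals["product"] = itertools.product
    template = env.from_string(CATALOG_TEMPLATE)
    return template.render(continents=continents, layers=layers)


if __name__ == "__main__":
    print(render_catalog(CONTINENTS, LAYERS))
```

Because the lists are plain Python objects handed to render, the same module-level constants can drive both node creation and catalog generation, which is the duplication the question was trying to avoid.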
Another proposal may be something like the following config template (somewhat inspired by Terraform):
counts:
  variables:
    geo:
      - africa
      - asia
      - europe
    dataset:
      - cases.csv
      - demographic.csv
  for_each:
    lists:
      - var.geo
      - var.dataset
    function:
      - join:
          delimiter: "/"
  type: pandas.CSVDataSet
  filepath: data/04_feature/{each.key}
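To make the intent of this proposal concrete, here is a minimal sketch of how such a template might be expanded. Nothing here is an existing Kedro or anyconfig feature: the semantics (Cartesian product of the referenced lists, joined with the delimiter, substituted for {each.key}), the nesting of the keys, the expand helper, and the naming of the generated entries are all assumptions made for illustration.

```python
import itertools

# Hypothetical in-memory form of the proposed template (it would normally be
# parsed from YAML). The nesting shown here is an assumed reading of the proposal.
TEMPLATE = {
    "counts": {
        "variables": {
            "geo": ["africa", "asia", "europe"],
            "dataset": ["cases.csv", "demographic.csv"],
        },
        "for_each": {
            "lists": ["var.geo", "var.dataset"],
            "function": [{"join": {"delimiter": "/"}}],
        },
        "type": "pandas.CSVDataSet",
        "filepath": "data/04_feature/{each.key}",
    }
}


def expand(name, spec):
    """Expand one templated entry into concrete catalog entries (assumed semantics)."""
    variables = spec["variables"]
    # Resolve references such as "var.geo" to the lists they name.
    lists = [variables[ref.split(".", 1)[1]] for ref in spec["for_each"]["lists"]]
    delimiter = spec["for_each"]["function"][0]["join"]["delimiter"]
    entries = {}
    for combo in itertools.product(*lists):
        key = delimiter.join(combo)
        # Assumed naming scheme for generated entries: "<template name>.<key>".
        entries[f"{name}.{key}"] = {
            "type": spec["type"],
            "filepath": spec["filepath"].replace("{each.key}", key),
        }
    return entries


if __name__ == "__main__":
    for entry_name, entry in expand("counts", TEMPLATE["counts"]).items():
        print(entry_name, entry["filepath"])
```

Under these assumptions the three geographies and two datasets yield six entries, each with its own file path below data/04_feature/.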
I believe this has now been addressed in c466c8a.
Description
With the advent of reusable modular pipelines and namespacing (but even previously with dynamic pipeline creation), it's common to need near-duplicate catalog entries. For example, with primary data models for COVID-19 data in Europe, Asia, and Africa, I may want to reuse the same feature generation and master table creation pipeline, resulting in the following (subset of the) data catalog:
Having to write it out explicitly is inconvenient (borderline painful) and error-prone. Being forced to use the code API to define configuration is not ideal, either.
Context
Jinja2 is a widely used templating language, already supported by anyconfig, the backend technology Kedro uses for configuration parsing. Turning it on lets (power) users leverage templating without affecting existing functionality. The above config would become:
Possible Implementation
#578
Possible Alternatives
Individual users can make this change themselves, but it's annoying. Since anyconfig is imported in _load_config, it can't be easily monkeypatched and requires redefining/retesting a lot of functionality. For example, from my current project:
src/package_name/run.py:
src/tests/test_run.py: