-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(sdk): Allow disabling default caching via a CLI flag and env var #11222
base: master
Are you sure you want to change the base?
feat(sdk): Allow disabling default caching via a CLI flag and env var #11222
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
c305fa5
to
b4b38ae
Compare
…a compiler CLI flag and env var Co-authored-by: Greg Sheremeta <gshereme@redhat.com> Signed-off-by: ddalvi <ddalvi@redhat.com>
eedb220
to
fa7a432
Compare
Signed-off-by: ddalvi <ddalvi@redhat.com>
fa7a432
to
626e1c1
Compare
/hold |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrote my thoughts on this PR as a general feedback without explicit approval.
# or the env var KFP_DISABLE_EXECUTION_CACHING_BY_DEFAULT. | ||
# align with click's treatment of env vars for boolean flags. | ||
# per click doc, "1", "true", "t", "yes", "y", and "on" are all converted to True | ||
_execution_caching_default = not str( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By assigning _execution_caching_default
at the class level, it gets executed upon import. This represented a significant problem in Pipeline version 1, as detailed in: kubeflow/kfp-tekton#1345 , kubeflow/kfp-tekton#1339 , https://github.com/kubeflow/kfp-tekton/pull/1336/files
The user will need to define KFP_DISABLE_EXECUTION_CACHING_BY_DEFAULT
before importing the compiler.
The developer code will end up like the following that is not the natural way to develop an application in Python.
os.environ["KFP_DISABLE_EXECUTION_CACHING_BY_DEFAULT"] = True
from kfp.client import Client
I was really affected by the order of the definition explained above, and if we are going to follow this approach, we need to make sure to document it everywhere. It took me a lot of time to find out that the import order really matter while working with Pipelines v1.
Also, the Pipeline._execution_caching_default
is being set globally and can be modified by the flag or environment variable.
If the community believes that we need to follow this approach, this PR should add documentation about the side effects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The user will need to define KFP_DISABLE_EXECUTION_CACHING_BY_DEFAULT before importing the compiler.
You are correct. I believe that the typical end user will set this env var in their terminal, and then run the python file containing the compiler. You're perhaps not that typical user.
A key requirement of this task is to make sure that no pipeline code needs to change. Since PipelineTask is loaded by the interpreter as soon as the @dsl.pipeline is interpreted, and that's before the Compiler.compile function is in play, we saw no other way to do this.
Do you have a suggestion for an alternative way, where @dsl.pipeline is not changed, and no pipeline code would ever need to be changed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A key requirement of this task is to make sure that no pipeline code needs to change. Since PipelineTask is loaded by the interpreter as soon as the @dsl.pipeline is interpreted, and that's before the Compiler.compile function is in play, we saw no other way to do this.
+1 to what Greg mentioned here, this seems to be the only way to have the var plumbed in without having to change the Pipeline code. Any alternate way would lead to a bunch of pipeline API call changes, and we want to avoid that.
My vote would be to keep this logic as is, and we'll add documentation on this approach to avoid any confusion. @diegolovison could you suggest where the documentation needs to go?
def test_compile_with_caching_flag_enabled(self): | ||
with tempfile.NamedTemporaryFile(suffix='.py') as temp_pipeline: | ||
# Write the pipeline function to a temporary file | ||
temp_pipeline.write(b""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't we just have a static file we can reference, rather than trying to dynamically write and use the file via temp files? It would be easier and cleaner to manage, especially since we are writing essentially the same file 3 separate times.
That said, if there is a technical reason for using tempfiles, since it appears the pipelines are the same across all 3 `test_compile_ functions, we should extract this 'tiny-pipeline' into a variable and simply reference it here instead
result = self.invoke( | ||
['--py', temp_pipeline.name, '--output', 'test_output.yaml']) | ||
print(result.output) # Print the command output | ||
self.assertEqual(result.exit_code, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would probably be good to have some additional/more vigorous validation instead of just a return code check here. Perhaps some checks on the contents of test_output.yaml
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Granted, this is for testing the CLI, not the compiler itself, so it likely could be something lightweight, even if it's just 'Check for existence of test_output.yaml and ensure it's not empty' to verify the command actually did something
Description of your changes:
Allow setting a default of execution caching disabled via a compiler CLI flag and env var.
Resolves #11092
Key Changes:
CLI Update (kfp dsl compile):
A new flag
--disable-execution-caching-by-default
is added. When set, this flag disables caching for all pipeline tasks by default. Behavior specified in pipeline code still takes precedence.A corresponding Environment Variable
KFP_DISABLE_EXECUTION_CACHING_BY_DEFAULT
is added as well.Details:
By default, tasks default to caching enabled if the pipeline code doesn't specify a desired behavior via set_caching_options. For example:
In this example, task 1 won't try to use the cache, but task 2 will, even though the author didn't specify anything about task 2. The resulting pipeline spec will look like this:
This PR enhances the DSL compiler to allow the user to override that default behavior via a command line flag. This change is done:
in an additive way, using an optional feature flag
in a way that doesn't alter the DSL or pipeline spec interfaces
In a way that doesn't change the default behavior for those who prefer the default of caching enabled
After this change, this command will continue to result in the default behavior:
Whereas this command will cause tasks to default to caching disabled if no desired caching behavior is specified in code:
For example, this pipeline compiled with
--disable-execution-caching-by-default
would then result in both tasks having caching disabled:
even though nothing was specified about task_2 in the code.
Usage
CLI flag for dsl compile.
A new flag
--disable-execution-caching-by-default
is added. When set, this flag disables caching for all pipeline tasks by default.Example:
Environment Variable
KFP_DISABLE_EXECUTION_CACHING_BY_DEFAULT
.You can also set the default caching behavior by using the environment variable. When set to true, 1, or other truthy values, it will disable execution caching by default for all pipelines. When set to false or when absent, the default of caching enabled remains.
Example:
This environment variable also works for
Compiler().compile()
.Given the following pipeline file my_pipeline.py:
Executing this
will result in task_2 having caching disabled.
This PR is made in collaboration with @gregsheremeta. Thank you, Greg for your contributions!
Checklist: