kfp multi jobs #142
There is one example, noop-multiple_wf, for testing.
kfp/kfp_ray_components/executeMultipleRayJobComponent_multi_s3.yaml
kfp/kfp_support_lib/src/kfp_support/workflow_support/utils/kfp_utils.py
kfp/kfp_support_lib/src/kfp_support/workflow_support/utils/pipeline_utils.py
kfp/kfp_support_lib/src/kfp_support/workflow_support/utils/remote_jobs_utils.py
generated by the transform workflow)
:param exec_script_name: script to run (has to be present in the image)
:param server_url: API server url
:return: None
We probably need to return the Ray job ID. It will be used to link metrics.
Please explain how. It's not that I don't want it; I'm just trying to understand. As far as I know, metrics cover the complete cluster, not a specific job ID.
We are going to discuss it tomorrow. Indeed, metrics cover the complete cluster and store their results under the Tekton job ID, but execution stats are stored under the Ray job ID. Therefore, we currently cannot match them.
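To make the request concrete, here is a minimal sketch of what returning the Ray job ID could look like, assuming the caller already knows the workflow (Tekton) run ID. The names `JobLink`, `execute_and_link`, and the `submit` callable are hypothetical stand-ins for illustration and are not the functions in `remote_jobs_utils.py`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class JobLink:
    """Pairs one workflow run ID with one Ray job ID (hypothetical record)."""

    workflow_run_id: str
    ray_job_id: str


def execute_and_link(
    workflow_run_id: str,
    submissions: List[Dict],
    submit: Callable[[Dict], str],
) -> List[JobLink]:
    """Submit each job and return the collected IDs instead of None.

    `submit` is an assumed stand-in for whatever call actually submits
    a Ray job and returns its ID; it is not the PR's API.
    """
    return [JobLink(workflow_run_id, submit(params)) for params in submissions]


if __name__ == "__main__":
    # Fake submitter that hands out made-up IDs, for demonstration only.
    fake_ids = iter(["ray-job-a", "ray-job-b"])
    links = execute_and_link("run-1", [{"n": 1}, {"n": 2}], lambda _: next(fake_ids))
    for link in links:
        print(link.workflow_run_id, link.ray_job_id)
```

With a mapping like this available, cluster-level metrics keyed by the workflow run ID and execution stats keyed by the Ray job ID could be joined after the fact. Whether that is the right place to record the link is what the discussion above still has to settle.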
Simplified the code and unified the single and multiple cases. Hopefully this answers most of the comments.
Why are these changes needed?
Support KFP with multiple submissions
Related issue number (if any).