Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kfp multi jobs #142

Merged
merged 11 commits into from
May 20, 2024
Merged

kfp multi jobs #142

merged 11 commits into from
May 20, 2024

Conversation

blublinsky
Copy link
Collaborator

Why are these changes needed?

Support KFP with multiple submissions

Related issue number (if any).

@blublinsky blublinsky requested review from roytman, daw3rd, Mohammad-nassar10 and revit13 and removed request for daw3rd May 17, 2024 10:29
@blublinsky
Copy link
Collaborator Author

There is one example - noop-multiple_wf for testing

generated by the transform workflow)
:param exec_script_name: script to run (has to be present in the image)
:param server_url: API server url
:return: None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably need to return RayJobID. it will be used link metrics.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please explain how. It's not that I do not want it. Just trying to understand. As far as I know, metrics are using complete cluster, not specific job id.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are going to discuss it tomorrow. Indeed, metrics use a complete cluster and store the results with Takton Job ID, but execution stats are stored with Ray Job ID. Therefore, we cannot match them.

kfp/kfp_ray_components/src/execute_multiple_ray_jobs.py Outdated Show resolved Hide resolved
generated by the transform workflow)
:param exec_script_name: script to run (has to be present in the image)
:param server_url: API server url
:return: None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are going to discuss it tomorrow. Indeed, metrics use a complete cluster and store the results with Takton Job ID, but execution stats are stored with Ray Job ID. Therefore, we cannot match them.

@blublinsky
Copy link
Collaborator Author

blublinsky commented May 19, 2024

Simplified code, unified single and multiple. Hopefully this answers most of the comments

@blublinsky blublinsky merged commit 13a38b5 into dev May 20, 2024
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants