Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Concurrent subworkflow #1167

Closed
BoPeng opened this issue Jan 9, 2019 · 8 comments
Closed

Concurrent subworkflow #1167

BoPeng opened this issue Jan 9, 2019 · 8 comments

Comments

@BoPeng
Copy link
Contributor

BoPeng commented Jan 9, 2019

Currently substeps are not executed in parallel if it contains subworkflows

[A]
parameter: idx=0
print(f'this is subworkflow {idx}')
import time
#time.sleep(10)

[default]
input: for_each=dict(i=range(4))
sos_run('A', idx=i)

The problem is that we use substep workers to execute substeps. If substep workers are allowed to execute subworkflow, then substep worker needs to handle workflow and step execution logic, then another layer of substep worker. If substep workers passes subworkflows to the step worker, then the substep worker has to wait for the result from step worker, which will cause a lot of inactive processes.

@BoPeng
Copy link
Contributor Author

BoPeng commented Jan 9, 2019

OK, the best we can do is this,

  1. If sos_run is the last and only statement in the substep, and concurrent is True (default)
  2. the step executor would use threads to submit the workflows to the master, without waiting for each sub workflow to complete
    3 the step executor wait for results of all subworkflows and exit

In this way no substep executor will be involved and the subworkflows are executed in parallel in practice.

@gaow
Copy link
Member

gaow commented Jan 9, 2019

Just want to link this issue to our discussion in #1086

BoPeng pushed a commit that referenced this issue Jan 9, 2019
@BoPeng
Copy link
Contributor Author

BoPeng commented Jan 9, 2019

OK, if we know the limitations, the solution is relatively easy.

  1. When sos_run is the last statement of the substep, let sos_run returns immediately after submitting the workflow without waiting for the workflow result.
  2. In the step executor, execute the substeps sequentially. However, because sos_run does not wait for the results, they will appear to be executed concurrently.
  3. The step executor waits for the results from subworkflows after all subworkflows are submitted.

On the scheduler side, previously a step can only wait for one workflow, and now they can wait for many. There are also competing messaging issues caused by concurrent subworkflows ...

So in the end,

[10]
input: for_each=dict(i = range(5))
sos_run('A')

will be executed concurrently, so does

[10]
input: for_each=dict(i = range(5))
something_else()
sos_run('A')

but

[10]
sos_run('a')
sos_run('b')

and

[10]
sos_run('a')
something_else()

will not be executed concurrently.

@BoPeng
Copy link
Contributor Author

BoPeng commented Jan 9, 2019

Note that

[10]
sos_run('A')
sos_run('B')

will not be executed concurrently, but

input: for_each={'wf': ['A', 'B']}
sos_run(wf)

will, and technically speaking

sos_run('A')
sos_run('B')

would imply that B relies on the completion of 'A'.

I think an easy fix to this limitation would be

sos_run(['A', 'B'])

@BoPeng
Copy link
Contributor Author

BoPeng commented Jan 9, 2019

OK, sos_run now accept a list of workflow IDs and will execute the workflows concurrently. example script:

[B]
parameter: idx=2
print(f'B_{idx}')
import time
time.sleep(idx)

[A]
import time
print(f'A_{idx}')
time.sleep(idx)

[default]
input: for_each=dict(i=range(4))
sos_run(['A', 'B'], idx=i)

@BoPeng
Copy link
Contributor Author

BoPeng commented Jan 9, 2019

@gaow

Although the general case is extremely difficult to solve, I think the two cases, namely

[10]
func_before() 
sos_run(['A', 'B'])  # sos_run can be in the middle
func_after()

and

input: for_each
some_func() # sos_run has to be the last statement
sos_run()

would satisfy a majority of the use cases, and I would consider this issue resolved.

@gaow
Copy link
Member

gaow commented Jan 9, 2019

Thank you @BoPeng I confirm it does work for my workflow examples when changed to this new convention!

@BoPeng
Copy link
Contributor Author

BoPeng commented Jan 9, 2019

Good. Hopefully figure 2c is not slower than other implementations now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants