
Task agglomeration

Bartosz Balis edited this page Oct 5, 2020 · 9 revisions

Task agglomeration allows groups of tasks to be submitted for execution together. This is particularly useful for small tasks whose job creation time is comparable to their execution time. HyperFlow provides a generic job buffering mechanism that can be used to implement task agglomeration for a particular computing infrastructure. Currently it is implemented for Kubernetes clusters.

Task agglomeration is configured as follows:

workflow.config.jobAgglomerations.json:

[
  {
    "matchTask": ["mProject"],
    "size": 2,
    "timeoutMs": 3000
  },
  {
    "matchTask": ["mDiffFit"],
    "size": 6,
    "timeoutMs": 3000
  },
  {
    "matchTask": ["mBackground"],
    "size": 3,
    "timeoutMs": 3000
  }
]

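This configuration sets agglomeration rules for different types of tasks. Each rule names the task types it applies to (matchTask), the maximum group size (size), and the maximum buffering time in milliseconds (timeoutMs). For example, mDiffFit tasks are buffered until six of them have been gathered or 3 seconds have passed, whichever comes first.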
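The "size or timeout, whichever comes first" rule can be illustrated with a small simulation. This is only a sketch in Python, not HyperFlow's actual implementation: the function names (find_rule_index, agglomerate) are hypothetical, and it assumes that the timeout is counted from the first task entering an empty buffer and that tasks matching no rule are submitted immediately on their own.

```python
# Rules copied from the configuration shown above.
RULES = [
    {"matchTask": ["mProject"], "size": 2, "timeoutMs": 3000},
    {"matchTask": ["mDiffFit"], "size": 6, "timeoutMs": 3000},
    {"matchTask": ["mBackground"], "size": 3, "timeoutMs": 3000},
]


def find_rule_index(task_name, rules):
    """Return the index of the first rule whose matchTask lists task_name, else None."""
    for idx, rule in enumerate(rules):
        if task_name in rule["matchTask"]:
            return idx
    return None


def agglomerate(arrivals, rules=RULES):
    """Simulate buffering for a time-sorted list of (time_ms, task_name) arrivals.

    Returns a list of (submit_time_ms, [task names]) groups.
    Assumption: the timeout starts when the first task enters an empty buffer.
    """
    buffers = {}  # rule index -> (deadline_ms, buffered task names)
    batches = []

    def flush_expired(now_ms):
        # Submit every buffer whose deadline has passed, oldest deadline first.
        expired = sorted((d, i) for i, (d, _) in buffers.items() if d <= now_ms)
        for deadline, idx in expired:
            batches.append((deadline, buffers.pop(idx)[1]))

    for now_ms, name in arrivals:
        flush_expired(now_ms)
        idx = find_rule_index(name, rules)
        if idx is None:
            # Assumption: tasks without a matching rule are not buffered.
            batches.append((now_ms, [name]))
            continue
        rule = rules[idx]
        _, tasks = buffers.setdefault(idx, (now_ms + rule["timeoutMs"], []))
        tasks.append(name)
        if len(tasks) >= rule["size"]:
            # The group is full: submit it without waiting for the timeout.
            batches.append((now_ms, buffers.pop(idx)[1]))

    flush_expired(float("inf"))  # submit whatever is still buffered at the end
    return batches


if __name__ == "__main__":
    demo = [(0, "mProject"), (10, "mProject"), (20, "mDiffFit"), (5000, "mDiffFit")]
    for submit_time, group in agglomerate(demo):
        print(submit_time, group)
```

In the demo, the two mProject tasks fill their group and are submitted at 10 ms, while each lone mDiffFit task is submitted only when its 3-second timeout expires.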

The following example shows the execution traces of the Montage2 workflow of size 1.0 (4800+ tasks) with and without agglomeration, on a Kubernetes cluster with 8 nodes, each with 8 vCPUs. Note the drastic difference in execution times (600 seconds with agglomeration vs. 1800 seconds without). The workflow has many very small jobs (1-2 seconds), and creating a Kubernetes Pod takes a comparable time, so high parallelism cannot be achieved without agglomeration.
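A back-of-the-envelope model shows why grouping helps when Pod creation time is comparable to task execution time. This sketch is illustrative only: the function useful_fraction is hypothetical, the 1.5-second values are assumed midpoints of the 1-2 second range mentioned above rather than measurements, and the model assumes one Pod per group with the grouped tasks executed one after another. It does not attempt to reproduce the measured execution times.

```python
def useful_fraction(group_size, exec_s, pod_create_s):
    """Fraction of a Pod's lifetime spent executing tasks.

    Assumptions: one Pod is created per group, and the tasks of a group
    run sequentially inside that Pod.
    """
    work_s = group_size * exec_s
    return work_s / (pod_create_s + work_s)


if __name__ == "__main__":
    # Assumed values: 1.5 s per task and 1.5 s per Pod creation.
    for size in (1, 2, 3, 6):
        print(size, round(useful_fraction(size, 1.5, 1.5), 2))
```

Under these assumptions, a Pod running a single task spends half of its lifetime on creation overhead, whereas a Pod running a group of six tasks spends roughly one seventh.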
