Getting started docs (#634)
- [x] Fixes #627 
- [x] ~Tests added~ docs only
- [x] Documentation/examples added
- [x] [Good commit messages](https://cbea.ms/git-commit/) and/or PR
title

**Description of PR**
See #627. This PR gets users off the ground. Future PRs to cover
features like DAGs, Artifacts etc

---------

Signed-off-by: Elliot Gunton <egunton@bloomberg.net>
elliotgunton authored May 25, 2023
1 parent 36aeef4 commit 373c8f2
Showing 7 changed files with 359 additions and 88 deletions.
84 changes: 84 additions & 0 deletions docs/getting-started/introduction.md
@@ -0,0 +1,84 @@
# Introduction

Hera is a Python library that allows you to construct and submit Argo Workflows. It is designed to be intuitive and easy
to use, while also providing a powerful interface to the underlying Argo API.

## Hera V5 vs V4

Hera v5 is a major release that introduces breaking changes from v4. The main reason is that v5 is a complete rewrite
of the library, now based on the OpenAPI specification of Argo Workflows. This lets us provide a more intuitive
interface to the Argo API while offering full feature parity with Argo Workflows, so you can use all the features of
Argo Workflows in your workflows. The library has also been restructured to accommodate other Argo projects, such as
Argo Events and Argo CD. Currently only Argo Workflows is supported, and work is in progress to add support for Argo
Events.

The codebase is now much more readable, and the focus can be fully dedicated to improving the Python interface to
various Argo projects rather than maintaining feature parity with the Argo codebase. The library is divided into the
following components:

- `hera.shared` - This package contains the shared code that will be used by all Argo projects. This includes common
global configuration to interact with the Argo API, and common Pydantic base models that are used by all Argo
projects.

- `hera.events.models` - This package contains the auto-generated code that allows you to construct Argo Events. It
provides Pydantic models for all the Argo Events OpenAPI objects, and allows you to construct events using these
models. These models are based on the OpenAPI specification, and are therefore exactly the same as the models used by
Argo Events.

- `hera.workflows.models` - This package contains the auto-generated code that allows you to construct Argo Workflows.
It provides Pydantic models for all the Argo Workflows OpenAPI objects, and allows you to construct workflows using
these models. These models are based on the OpenAPI specification, and are therefore exactly the same as the models
used by Argo Workflows.

- `hera.workflows` - This package contains the hand-written code that allows you to construct and submit Argo Workflows.
It wraps the auto-generated code, and provides a more intuitive interface to the Argo API. It also provides a number
of useful features, such as the ability to submit workflows from a Python function. This package has various extension
points that allow you to plug in the auto-generated models in case you need to use a feature that is not yet supported
by the hand-written code.

The major differences between v4 and v5 are:

- The `hera.workflows.models` package is now auto-generated, and is based on the OpenAPI specification of Argo
Workflows. This means that all the models are exactly the same as the models used by Argo Workflows, and you can use
all the features of Argo Workflows in your workflows written with `hera`.

- The auto-generated models are based on Pydantic, which means that you can use all the features of Pydantic to
construct your workflows. This includes better type-checking, auto-completion in IDEs and more.

- All template types are now supported. This means that you can now use all the template types that are supported by
Argo Workflows, such as DAGs, Steps, Suspend and more. Previously, only the DAG template type was supported.

- The hand-written code has been rewritten to be extensible. This means that you can now easily extend the library to
support new features, or to support features that are not yet supported by the hand-written code. This is done by
using the `hera.workflows.models` package, and plugging it into the `hera.workflows` package.

The following example shows how to use the DAG template type.

```python
from hera.workflows import (
    DAG,
    Workflow,
    script,
)


# Notice that we are using the script decorator to define the function.
# This is required in order to use the function as a template.
# The decorator also allows us to define the image that will be used to run the function and
# other parameters that are specific to the Script template type.
@script(add_cwd_to_sys_path=False, image="python:alpine3.6")
def say(message):
    print(message)


with Workflow(generate_name="dag-diamond-", entrypoint="diamond") as w:
    # Note that we need to explicitly specify the DAG template type.
    with DAG(name="diamond"):
        # We can now use the decorated function as tasks in the DAG.
        A = say(name="A", arguments={"message": "A"})
        B = say(name="B", arguments={"message": "B"})
        C = say(name="C", arguments={"message": "C"})
        D = say(name="D", arguments={"message": "D"})
        # We can use the `>>` operator or the `.next()` method to define dependencies between tasks.
        A >> [B, C] >> D
```
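
To see why `A >> [B, C] >> D` yields a diamond, here is a minimal, standard-library-only sketch of how a `>>` operator
can record dependencies. The `Task` class below is hypothetical and is not Hera's implementation; it only assumes that
each task stores the names of the tasks it depends on.

```python
from __future__ import annotations


class Task:
    """Hypothetical stand-in for a DAG task; not Hera's actual class."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.depends: list[str] = []

    def __rshift__(self, other: Task | list[Task]) -> Task | list[Task]:
        # `self >> other`: every task on the right depends on `self`.
        targets = other if isinstance(other, list) else [other]
        for target in targets:
            target.depends.append(self.name)
        return other  # returning the right-hand side allows chaining

    def __rrshift__(self, other: list[Task]) -> Task:
        # `[b, c] >> self`: lists have no `>>`, so Python falls back to this method.
        for source in other:
            self.depends.append(source.name)
        return self


a, b, c, d = (Task(name) for name in "ABCD")
a >> [b, c] >> d

dependencies = {task.name: task.depends for task in (a, b, c, d)}
print(dependencies)
```

Returning the right-hand operand from `__rshift__` is what makes the chain read left to right: `a >> [b, c]` evaluates
to the list `[b, c]`, which is then used as the left-hand side of `>> d`.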
92 changes: 92 additions & 0 deletions docs/getting-started/quick-start.md
@@ -0,0 +1,92 @@
# Quick Start

## Install Argo tools

Ensure you have a Kubernetes cluster, kubectl and Argo Workflows installed by following the
[Argo Workflows Quick Start](https://argoproj.github.io/argo-workflows/quick-start/).

Ensure you are able to submit a workflow to Argo as in the example:

```console
argo submit -n argo --watch https://raw.githubusercontent.com/argoproj/argo-workflows/master/examples/hello-world.yaml
```

## Install Hera

[![Pypi](https://img.shields.io/pypi/v/hera.svg)](https://pypi.python.org/pypi/hera)

Hera is available on PyPI as the `hera` package. Add this dependency to your project in your usual way, e.g. pip or
poetry, or install directly with `pip install hera`.

## Hello World

If you were able to run the `argo submit` command above, copy the following Workflow definition into a local file
`hello_world.py`.

```py
from hera.workflows import Steps, Workflow, WorkflowsService, script


@script()
def echo(message: str):
    print(message)


with Workflow(
    generate_name="hello-world-",
    entrypoint="steps",
    namespace="argo",
    workflows_service=WorkflowsService(host="https://localhost:2746")
) as w:
    with Steps(name="steps"):
        echo(arguments={"message": "Hello world!"})

w.create()
```

Run the file:

```console
python -m hello_world
```

You will then see the Workflow at <https://localhost:2746/>

## Hello World on an existing Argo installation

If you or your organization are already running on Argo and you're interested in using Hera to write your Workflow
definitions, you will need to set up some config variables in `hera.shared.global_config`. Copy the following as a basis
and fill in the blanks.

```py
from hera.workflows import Steps, Workflow, script
from hera.shared import global_config

global_config.host = "https://<your-host-name>"
global_config.token = ""  # Copy token value after "Bearer" from the `argo auth token` command
global_config.image = "<your-image-repository>/python:3.8"  # set the image if you cannot access "python:3.8" via Docker Hub


@script()
def echo(message: str):
    print(message)


with Workflow(
    generate_name="hello-world-",
    entrypoint="steps",
    namespace="argo",
) as w:
    with Steps(name="steps"):
        echo(arguments={"message": "Hello world!"})

w.create()
```

Run the file:

```console
python -m hello_world
```

You will then see the Workflow at https://\<your-host-name>
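
Hard-coding a token in a source file makes it easy to leak, so one option is to read it from the environment instead.
The sketch below uses only the standard library; the `ARGO_TOKEN` variable name and the `bearer_token_value` helper are
assumptions made for illustration and are not part of Hera or Argo. It strips the leading `Bearer` that
`argo auth token` prints, since the config comment above asks for only the value after it.

```python
import os


def bearer_token_value(raw: str) -> str:
    """Return the token without a leading "Bearer " prefix, if present."""
    raw = raw.strip()
    prefix = "Bearer "
    return raw[len(prefix):].strip() if raw.startswith(prefix) else raw


# ARGO_TOKEN is a hypothetical variable name; export it yourself, for example:
#   export ARGO_TOKEN="$(argo auth token)"
token = bearer_token_value(os.environ.get("ARGO_TOKEN", ""))

# The result could then be assigned in place of the hard-coded value:
#   global_config.token = token
```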
49 changes: 49 additions & 0 deletions docs/getting-started/walk-through/about-hera.md
@@ -0,0 +1,49 @@
# About

Hera is a Python library that allows you to construct and submit Argo Workflows. It is designed to be intuitive and easy
to use, while also providing a powerful interface to the underlying Argo API.

Hera acts as a domain-specific language (DSL) on top of Argo, so it is primarily a way to define Workflows. In previous
Argo Workflows surveys, such as the
[2021 survey](https://blog.argoproj.io/argo-workflows-2021-survey-results-d6fa890030ee), a better Python DSL was highly
requested to overcome the YAML barrier to adoption. From the job roles reported in the
[2022 survey results](https://blog.argoproj.io/cncf-argo-project-2022-user-survey-results-f9caf46df7fd#:~:text=Job%20Roles%20%26%20Use%20Cases),
we can infer that the DevOps Engineers using Argo Workflows are likely more comfortable with YAML than ML Engineers
are.

> - DevOps Engineers: 41%
> - Software Engineer: 20%
> - Architects: 20%
> - Data Engineer / Data Scientist / ML Engineer: 13%

We hope that providing a more intuitive Python definition language will increase the number of Data and ML users of
Argo Workflows.

## Feature Parity

A natural concern about an abstraction layer on top of another technology is whether it can function the same as the
original lower layer. In this case, Hera generates a [library of model classes](../../api/workflows/models.md) from
Argo's OpenAPI spec. Hera's feature-rich classes wrap these model classes, and the model classes themselves remain
available as a fallback mechanism. You can check out the extensive
["upstream" examples](../../examples/workflows/upstream/dag_diamond.md) that contain side-by-side Python and YAML
definitions for Workflows in
[the Argo examples folder on GitHub](https://github.com/argoproj/argo-workflows/tree/master/examples). Our CI/CD runs
through the Argo examples folder to check that we are able to reproduce them using Hera Workflows written by hand (note:
we have not _yet_ written Hera Workflows for all the examples).

If you are a new user of Argo, we encourage you to become familiar with
[Argo's Core Concepts](https://argoproj.github.io/argo-workflows/workflow-concepts/), which provide a foundation of
understanding when working with Hera. Working through the
[Argo Walk Through](https://argoproj.github.io/argo-workflows/walk-through/) will also help you understand key concepts
before moving to Python.

## Context Managers

You will notice that many classes in Hera implement the context manager interface. This was designed to mirror the YAML
syntax of Argo. It helps existing users move from YAML to Hera, and it lets users new to both Argo and Hera interpret
most of the existing YAML documentation and online resources through familiar naming and functionality in Hera.
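
As a rough illustration of how nested `with` blocks can mirror nested YAML, the sketch below uses only the standard
library. The `Node` class and its `_stack` are hypothetical and much simpler than Hera's classes; the point is only that
entering a context makes it the current parent, so indentation in Python matches nesting in the output.

```python
from __future__ import annotations


class Node:
    """Hypothetical container that registers itself with the enclosing `with` block."""

    _stack: list[Node] = []  # currently open contexts, innermost last

    def __init__(self, kind: str, name: str) -> None:
        self.kind, self.name = kind, name
        self.children: list[Node] = []
        if Node._stack:  # created inside another context: attach to it
            Node._stack[-1].children.append(self)

    def __enter__(self) -> Node:
        Node._stack.append(self)
        return self

    def __exit__(self, *exc_info: object) -> None:
        Node._stack.pop()

    def render(self, indent: int = 0) -> str:
        # Render this node, then its children two spaces deeper.
        lines = [" " * indent + f"{self.kind}: {self.name}"]
        lines += [child.render(indent + 2) for child in self.children]
        return "\n".join(lines)


with Node("workflow", "hello-world") as w:
    with Node("steps", "steps"):
        Node("step", "echo")

print(w.render())
```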

## Orchestrating Scripts

A natural extension of a Python DSL for Argo is tighter integration with Python scripts. This is where Hera improves the
developer experience through its tailored classes and syntactic sugar to enable developers to easily orchestrate Python
functions. Check out [Hello World](hello-world.md) to get started!
128 changes: 128 additions & 0 deletions docs/getting-started/walk-through/hello-world.md
@@ -0,0 +1,128 @@
# Hello World

Let's take a look at the `hello_world.py` from the [Quick Start](../quick-start.md) guide.

```py
from hera.workflows import Steps, Workflow, WorkflowsService, script


@script()
def echo(message: str):
    print(message)


with Workflow(
    generate_name="hello-world-",
    entrypoint="steps",
    namespace="argo",
    workflows_service=WorkflowsService(host="https://localhost:2746")
) as w:
    with Steps(name="steps"):
        echo(arguments={"message": "Hello world!"})

w.create()
```

## The imports

As we are using Argo Workflows, we import specialized classes from `hera.workflows`. You will see that Argo concepts
from the Argo spec have been transformed into powerful Python classes; explore them in the
[Hera Workflows API reference](../../api/workflows/hera.md).

For this Workflow, we want to echo using Python's `print` function, which is wrapped in our convenience `echo` function.
We use Hera's `script` decorator to turn the `echo` function into what's known as a
[Script template](https://argoproj.github.io/argo-workflows/workflow-concepts/#script), which is mirrored in Hera by the
`Script` class. As we're defining the Workflow in Python, Hera is able to infer multiple field values that the developer
would otherwise have to define when using YAML.

## The script decorator

The `script` decorator accepts the kwargs that a `Script` accepts. Importantly, you can specify the Python `image` to
use for your script instead of the default `python:3.8`, if required:

```py
from hera.workflows import script


@script(image="python:3.11")
def echo(message: str):
    print(message)
```

Alternatively, you can specify this image once via the `global_config.image` variable, and it will be used for all
`script`s automatically:

```py
from hera.shared import global_config
from hera.workflows import script

global_config.image = "python:3.11"


@script()  # "echo" will now run using python:3.11, as will any other scripts you define
def echo(message: str):
    print(message)


@script()  # "echo_twice" will also run using python:3.11
def echo_twice(message: str):
    print(message)
    print(message)
```

## The Workflow context manager

The Workflow context manager acts as a scope under which Hera `template` objects can be declared, including
Containers, Scripts, DAGs [and more](https://argoproj.github.io/argo-workflows/workflow-concepts/#template-types). For a
minimal example, you will need to provide your `Workflow` with the initialization values shown below:

```py
with Workflow(
    generate_name="hello-world-",
    entrypoint="steps",
    namespace="argo",
    workflows_service=WorkflowsService(host="https://localhost:2746")
) as w:
    ...  # templates are declared inside this block
```

* `generate_name` is used upon submission as a name prefix, to which a random 5-character suffix is appended, so you
may see this Workflow run with a name like `hello-world-vmsz5`.
* `entrypoint` tells Argo which template to run upon submission.
* `namespace` refers to the Kubernetes namespace you want to submit to.
* `workflows_service` is the service Hera uses to submit the Workflow, here pointing at `https://localhost:2746`.
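
The effect of `generate_name` can be pictured with a few lines of standard-library Python. This is only an
illustration: the real suffix is generated on the server side at submission, and both the `simulate_generate_name`
helper and the alphabet it draws from are assumptions made for this sketch.

```python
import random
import string


def simulate_generate_name(prefix: str, suffix_length: int = 5) -> str:
    """Append a random lowercase alphanumeric suffix to imitate a generated name."""
    alphabet = string.ascii_lowercase + string.digits  # assumed alphabet
    suffix = "".join(random.choices(alphabet, k=suffix_length))
    return prefix + suffix


name = simulate_generate_name("hello-world-")
print(name)  # a different name on each run
```

Because the suffix is random, submitting the same definition repeatedly creates distinct Workflows rather than
conflicting on a fixed name.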

## The Steps context manager

A `Steps` template is the second template type of this example, the first being the `Script`. The `Steps` template,
along with the `DAG` template, is known as a "template invocator". This is because they are used to arrange other
templates, mainly Containers and Scripts, to do the actual work. In Hera, the `Steps` class is a context manager as it
automatically arranges your templates in the order that you add them, with each template invocation known as a `Step`.

```py
with Steps(name="steps"):
```

To invoke the `echo` template, you can call it, passing values to its arguments through the `arguments` kwarg, which is
a dictionary of the _function_ kwargs to values. This is because under a `Steps` or `DAG` context manager, the `script`
decorator converts a call of the function into a `Script` object, to which you must pass `Script` initialization kwargs.

```py
echo(arguments={"message": "Hello world!"})
```

> For advanced users: the exact mechanism of the `script` decorator is to create a `Script` object when declared, so
> that when your function is invoked you have to pass its arguments through the `arguments` kwarg as a dictionary, and
> the `Script` object's `__call__` function is invoked with the `arguments` kwarg. The `__call__` function on a
> `CallableTemplateMixin` automatically creates a `Step` or a `Task` depending on whether the context manager is a
> `Steps` or a `DAG`.

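
The mechanism in the note above can be sketched with the standard library alone. Every name below (`MiniInvocator`,
`MiniSteps`, `MiniDAG`, `MiniScript`, `mini_script`) is a hypothetical stand-in chosen for illustration, and Hera's real
classes do considerably more; the sketch only shows a decorator returning a callable object whose `__call__` records a
step or a task depending on the active context.

```python
from __future__ import annotations

from typing import Any, Callable


class MiniInvocator:
    """Hypothetical base for Steps/DAG: collects invocations made inside `with`."""

    active: list[MiniInvocator] = []  # open contexts, innermost last
    node_kind = "node"

    def __init__(self, name: str) -> None:
        self.name = name
        self.nodes: list[dict[str, Any]] = []

    def __enter__(self) -> MiniInvocator:
        MiniInvocator.active.append(self)
        return self

    def __exit__(self, *exc_info: object) -> None:
        MiniInvocator.active.pop()


class MiniSteps(MiniInvocator):
    node_kind = "step"


class MiniDAG(MiniInvocator):
    node_kind = "task"


class MiniScript:
    """Hypothetical template object produced by the decorator."""

    def __init__(self, func: Callable[..., Any]) -> None:
        self.func = func

    def __call__(self, arguments: dict[str, Any]) -> dict[str, Any]:
        # Calling the template does not run `func`; it records an invocation
        # in the innermost open context (this sketch requires one to be open).
        context = MiniInvocator.active[-1]
        node = {"kind": context.node_kind, "template": self.func.__name__, "arguments": arguments}
        context.nodes.append(node)
        return node


def mini_script() -> Callable[[Callable[..., Any]], MiniScript]:
    return MiniScript  # applying the decorator wraps the function in a MiniScript


@mini_script()
def echo(message: str) -> None:
    print(message)


with MiniSteps(name="steps") as s:
    echo(arguments={"message": "Hello world!"})
with MiniDAG(name="dag") as d:
    echo(arguments={"message": "Hello world!"})

print(s.nodes[0]["kind"], d.nodes[0]["kind"])
```

Note that the body of `echo` never executes here: in this sketch, as in the description above, calling the decorated
function inside a context only declares an invocation.
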
## Submitting the Workflow

Finally, with the Workflow defined, the actual submission occurs with:

```py
w.create()
```

This uses the `WorkflowsService` to submit to Argo using its REST API, so `w.create()` can be thought of as running
`argo submit`.
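
For a sense of what such a submission involves, the sketch below builds (but does not send) an HTTP request with the
standard library. It assumes the Argo Server route `POST /api/v1/workflows/{namespace}` with the manifest wrapped in a
`workflow` field; the `build_submit_request` helper is hypothetical, and Hera's own client code may differ in detail.

```python
import json
import urllib.request


def build_submit_request(host: str, namespace: str, manifest: dict, token: str = "") -> urllib.request.Request:
    """Build a POST request carrying a Workflow manifest; nothing is sent."""
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/v1/workflows/{namespace}",
        data=json.dumps({"workflow": manifest}).encode("utf-8"),
        headers=headers,
        method="POST",
    )


manifest = {
    "apiVersion": "argoproj.io/v1alpha1",
    "kind": "Workflow",
    "metadata": {"generateName": "hello-world-", "namespace": "argo"},
    "spec": {"entrypoint": "steps"},  # templates omitted for brevity
}
request = build_submit_request("https://localhost:2746", "argo", manifest)
print(request.get_method(), request.full_url)
```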

Alternatively, you may want to see what the YAML looks like for this Workflow. `w.to_yaml()` returns it as a string,
which you can print or write to a file.

```py
print(w.to_yaml())
```
28 changes: 0 additions & 28 deletions docs/hera_getting_started.md

This file was deleted.
