RHOAIENG-9499: docs(examples): begin drafting the Dockerfile and suitable README.md #1152

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view

---

**examples/README.md** (new file, +148 lines)

# Examples

## JupyterLab with Elyra

This Workbench image installs JupyterLab and the ODH-Elyra extension.

The main difference between the [upstream Elyra](https://github.com/elyra-ai/elyra) and the [ODH-Elyra fork](https://github.com/opendatahub-io/elyra) is that the fork implements Argo Pipelines support, which is required for executing pipelines in OpenDataHub/OpenShift AI.
Specifically, the fork integrates [an upstream pull request](https://github.com/elyra-ai/elyra/pull/3273) that has yet to be merged.

### Design

The workbench is based on a Source-to-Image (S2I) UBI9 Python 3.11 image.
This means that, besides having Python 3.11 installed, the image has the following properties:
* the HOME directory is set to `/opt/app-root/src`
* port 8888 is exposed by default

These characteristics are required for OpenDataHub workbenches to function.

#### Integration with OpenDataHub Notebook Controller and Notebook Dashboard

##### OpenDataHub Dashboard

The Dashboard automatically populates an environment variable named `NOTEBOOK_ARGS`.
This variable carries the configuration needed to integrate with the Dashboard, in particular for launching the Workbench and logging out.

Reference: https://github.com/opendatahub-io/odh-dashboard/blob/95d80a0cccd5053dc0ca372effcdcd8183a0d5b8/frontend/src/api/k8s/notebooks.ts#L143-L149
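A minimal sketch of how an entrypoint could forward the Dashboard-provided `NOTEBOOK_ARGS` to JupyterLab. This is not the actual `start-notebook.sh`, and the default value below is purely illustrative:

```shell
# Sketch only: forward Dashboard-provided flags to JupyterLab.
# The fallback value is illustrative, not what the Dashboard actually injects.
NOTEBOOK_ARGS="${NOTEBOOK_ARGS:---ServerApp.port=8888 --ServerApp.base_url=/notebook/my-project/my-workbench}"
LAUNCH_CMD="jupyter lab ${NOTEBOOK_ARGS}"
echo "${LAUNCH_CMD}"
```

Because the variable holds a space-separated flag list, it must be expanded unquoted (or split explicitly) when the real launch command is assembled.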

Furthermore, when a workbench is configured, a default Persistent Volume Claim (PVC) is created and its volume is mounted at `/opt/app-root/src` in the workbench container.
This means that changing the user's HOME directory away from this default is inadvisable.

##### OpenDataHub Notebook Controller

When the Notebook Custom Resource is created, a mutating webhook in the Notebook Controller is triggered.
This webhook configures, among other things, the OAuth Proxy, certificate bundles, the pipeline runtime, and runtime images.
It also creates a Service and an OpenShift Route to make the Workbench reachable from outside the cluster.

**OAuth Proxy** is configured to connect to port 8888 of the workbench container (discussed above) and listen for incoming connections on port 8443.

Reference: https://github.com/opendatahub-io/kubeflow/blob/eacf63cdaed4db766a6503aa413e388e1d2721ef/components/odh-notebook-controller/controllers/notebook_webhook.go#L114-L121

**Certificate bundles** are added as a file-mounted configmap at `/etc/pki/tls/custom-certs/ca-bundle.crt`.
This is a nonstandard location, so it is necessary to also add environment variables that instruct various software to reference this bundle during operation.

Reference:
* https://github.com/opendatahub-io/kubeflow/blob/eacf63cdaed4db766a6503aa413e388e1d2721ef/components/odh-notebook-controller/controllers/notebook_webhook.go#L598
* https://github.com/opendatahub-io/kubeflow/blob/eacf63cdaed4db766a6503aa413e388e1d2721ef/components/odh-notebook-controller/controllers/notebook_webhook.go#L601-L607
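The environment variables involved typically look like the following sketch. The variable names are the standard ones recognized by pip, requests, OpenSSL, and git; the exact set the webhook injects may differ, so treat this as illustrative:

```shell
# Sketch: standard env vars that redirect common tools to a custom CA bundle.
# The path matches the configmap mount location described above.
CA_BUNDLE=/etc/pki/tls/custom-certs/ca-bundle.crt
export PIP_CERT="${CA_BUNDLE}"            # pip
export REQUESTS_CA_BUNDLE="${CA_BUNDLE}"  # python-requests
export SSL_CERT_FILE="${CA_BUNDLE}"       # OpenSSL-based tools
export GIT_SSL_CAINFO="${CA_BUNDLE}"      # git over HTTPS
echo "CA bundle wired to ${CA_BUNDLE}"
```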

**Pipeline runtime configuration** is obtained from a Data Science Pipeline Application (DSPA) CR.
The controller first locates a DSPA CR in the same project where the workbench is being started, creates a secret with the connection data, and then mounts this secret under `/opt/app-root/runtimes/`.

Reference: https://github.com/opendatahub-io/kubeflow/blob/eacf63cdaed4db766a6503aa413e388e1d2721ef/components/odh-notebook-controller/controllers/notebook_dspa_secret.go#L42C28-L42C50

IMPORTANT: the `setup-elyra.sh` script in this repo relies on this location.

**Runtime images** are processed very similarly to the DSPA configuration.
First, ImageStream resources are examined; then a configmap is created and mounted into every newly started workbench.
The mount location is under `/opt/app-root/pipeline-runtimes/`.

Reference: https://github.com/opendatahub-io/kubeflow/blob/eacf63cdaed4db766a6503aa413e388e1d2721ef/components/odh-notebook-controller/controllers/notebook_runtime.go#L25C19-L25C51

IMPORTANT: the `setup-elyra.sh` script in this repo relies on this location as well.
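A setup script can sanity-check both mount locations before configuring Elyra. The sketch below is illustrative, not the actual `setup-elyra.sh` logic; only the directory paths come from the text above:

```shell
# Sketch: verify the webhook-provided mounts exist before configuring Elyra.
CHECKED=0
for d in /opt/app-root/runtimes /opt/app-root/pipeline-runtimes; do
  if [ -d "$d" ]; then
    echo "found $d"
  else
    echo "missing $d (not running under the notebook controller?)"
  fi
  CHECKED=$((CHECKED + 1))
done
```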

### Build

```shell
podman build -f examples/jupyterlab-with-elyra/Dockerfile -t quay.io/your-username/jupyterlab-with-elyra:latest .
podman push quay.io/your-username/jupyterlab-with-elyra:latest
```
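Before importing the image into the Dashboard, a quick local smoke test can save a round trip. The command below is printed rather than executed so it can be reviewed first; it assumes podman is installed and the image name matches the build command above:

```shell
# Sketch: local smoke test of the freshly built workbench image.
IMAGE="quay.io/your-username/jupyterlab-with-elyra:latest"
RUN_CMD="podman run --rm -p 8888:8888 ${IMAGE}"
echo "${RUN_CMD}"
# then open http://localhost:8888 in a browser
```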

### Deploy

Open the `Settings > Workbench images` page in OpenDataHub Dashboard.
Click on the `Import new image` button and add the image you have just pushed.
The `Image location` field should be set to `quay.io/your-username/jupyterlab-with-elyra:latest`, or wherever the image is pushed and available for the cluster to pull.
Values of other fields do not matter for functionality, but they let you keep better track of previously imported images.

There is a special ODH Dashboard feature that alerts you when a workbench image lists the `elyra` package instead of `odh-elyra`.
This code will have to be updated once `elyra` gains Argo Pipelines support upstream, but for now it does the job.

Reference: https://github.com/opendatahub-io/odh-dashboard/blob/2ced77737a1b1fc24b94acac41245da8b29468a4/frontend/src/concepts/pipelines/elyra/utils.ts#L152-L162
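You can check which variant an image ships from a terminal inside the workbench. This sketch relies only on pip metadata; the package names come from the text above:

```shell
# Sketch: determine which Elyra variant (if any) is installed in this image.
if python3 -m pip show odh-elyra >/dev/null 2>&1; then
  ELYRA_VARIANT="odh-elyra (Argo Pipelines supported)"
elif python3 -m pip show elyra >/dev/null 2>&1; then
  ELYRA_VARIANT="elyra (Dashboard will warn about missing Argo support)"
else
  ELYRA_VARIANT="none installed"
fi
echo "Elyra variant: ${ELYRA_VARIANT}"
```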

## Image Streams
Available workbench images are represented by OpenShift ImageStreams stored either in the notebook-controller's own namespace
(defaults to `opendatahub` on ODH and `redhat-ods-applications` in RHOAI)
or, starting with RHOAI 2.22, in the data science project namespace.

A single label and several annotations can be added to image streams to influence how the image is displayed in the Dashboard.

### Example image stream

```yaml
apiVersion: image.openshift.io/v1
kind: ImageStream
metadata:
  labels:
    opendatahub.io/notebook-image: "true"
  annotations:
    opendatahub.io/notebook-image-name: "Jupyter Data Science"
    opendatahub.io/notebook-image-desc: "Jupyter notebook image with a set of data science libraries that advanced AI/ML notebooks will use as a base image to provide a standard for libraries available in all notebooks"
  name: jupyter-datascience-notebook
spec:
  tags:
    - annotations:
        # language=json
        opendatahub.io/notebook-software: |
          [
            {"name": "Python", "version": "v3.11"},
            { ... }
          ]
        # language=json
        opendatahub.io/notebook-python-dependencies: |
          [
            {"name": "JupyterLab", "version": "4.2"},
            { ... }
          ]
        opendatahub.io/workbench-image-recommended: 'true'
        opendatahub.io/image-tag-outdated: 'false'
        opendatahub.io/notebook-build-commit: 947dea7
      from:
        kind: DockerImage
        name: quay.io/opendatahub/workbench-images@sha256:57d8e32ac014dc39d1912577e2decff1b10bb2f06f4293c963e687687a580b05
      name: "2025.1"
      referencePolicy:
        type: Source
```

**opendatahub.io/notebook-image**: determines whether the image stream will be shown in the workbenches list or not

**opendatahub.io/notebook-image-name**: the name of the image that will be shown in the workbenches list

**opendatahub.io/notebook-image-desc**: the description of the image that will be shown in the workbenches list

**opendatahub.io/notebook-software**: a JSON-formatted list of software that is installed in the image. This is used to display the software in the workbench image details.

**opendatahub.io/notebook-python-dependencies**: a JSON-formatted list of Python dependencies that are installed in the image. This is used to display the Python dependencies in the workbench image details.

**opendatahub.io/workbench-image-recommended**: determines whether the image stream will be marked as the `Recommended` image in the workbenches list or not. Only one image tag can be marked as `Recommended`.

**opendatahub.io/image-tag-outdated**: determines whether the image stream will be hidden from the list of available image versions in the workbench spawner dialog. Workbenches that were previously started with this image will continue to function.

**opendatahub.io/notebook-build-commit**: the commit hash of the notebook image build that was used to create the image. This is shown in the Dashboard web UI starting with RHOAI 2.22.

Not all of these annotations can be configured in the Dashboard Settings web UI.
There is a toggle for the label, the name and description can be edited, and there is a suitable interface for the software versions.
The recommended, outdated, and build-commit annotations cannot be edited there, though.
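Annotations that the Dashboard cannot edit can still be changed by applying the ImageStream manifest with `oc`. The namespace and file name below are illustrative, and the command is printed for review rather than executed:

```shell
# Sketch: manage the image stream from the CLI instead of the Dashboard.
NAMESPACE="${NAMESPACE:-opendatahub}"
APPLY_CMD="oc apply -n ${NAMESPACE} -f jupyter-datascience-notebook.yaml"
echo "${APPLY_CMD}"
```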

---

**examples/jupyterlab-with-elyra/Dockerfile** (new file, +89 lines)

########
# base #
########

# https://catalog.redhat.com/software/containers/registry/registry.access.redhat.com/repository/ubi9/python-311
FROM registry.access.redhat.com/ubi9/python-311:latest
# Subsequent code may leverage the following definitions from the base image:
# ENV APP_ROOT=/opt/app-root
# ENV HOME="${APP_ROOT}/src"
# ENV PYTHON_VERSION=3.11
# ENV PYTHONUNBUFFERED=1
# ENV PYTHONIOENCODING=UTF-8
# ENV PIP_NO_CACHE_DIR=off
# ENV BASH_ENV="${APP_ROOT}/bin/activate"
# ENV ENV="${APP_ROOT}/bin/activate"

# OS packages need to be installed as root
USER root

# Install useful OS packages
RUN dnf install -y mesa-libGL skopeo && dnf clean all && rm -rf /var/cache/yum

# Other apps and tools shall be installed as the default user
# - Kubernetes requires numeric IDs in the final USER command
# - OpenShift's SCC restricted-v2 policy runs images under a random UID with GID 0
USER 1001:0
WORKDIR /opt/app-root

ARG JUPYTER_REUSABLE_UTILS=jupyter/utils
ARG MINIMAL_SOURCE_CODE=jupyter/minimal/ubi9-python-3.11
ARG DATASCIENCE_SOURCE_CODE=jupyter/datascience/ubi9-python-3.11

# Emplace and activate our entrypoint script
COPY ${MINIMAL_SOURCE_CODE}/start-notebook.sh /opt/app-root/bin/
ENTRYPOINT ["/opt/app-root/bin/start-notebook.sh"]
# Copy JupyterLab config from utils directory
COPY ${JUPYTER_REUSABLE_UTILS} /opt/app-root/bin/utils/
# Copy Elyra setup script and various utils where start-notebook.sh expects it
COPY ${DATASCIENCE_SOURCE_CODE}/setup-elyra.sh ${DATASCIENCE_SOURCE_CODE}/utils /opt/app-root/bin/utils/
# NOTE (review comment, @jiridanek, Jun 10, 2025): the build relies heavily on
# the scripts start-notebook.sh and setup-elyra.sh from elsewhere in this
# repository; rather than duplicating them here, customers probably should just
# make their own copy.
# Install Python packages and Jupyterlab extensions
# https://www.docker.com/blog/introduction-to-heredocs-in-dockerfiles/

COPY <<EOF requirements.txt
--index-url https://pypi.org/simple

# JupyterLab
jupyterlab==4.2.7
jupyter-bokeh~=4.0.5
jupyter-server~=2.15.0
jupyter-server-proxy~=4.4.0
jupyter-server-terminals~=0.5.3
jupyterlab-git~=0.50.1
jupyterlab-lsp~=5.1.0
jupyterlab-widgets~=3.0.13
jupyter-resource-usage~=1.1.1
nbdime~=4.0.2
nbgitpuller~=1.2.2
# NOTE (review comment): this is a lot of packages; some could be removed to
# make the important ones stand out better.
# Elyra
odh-elyra==4.2.1
kfp~=2.12.1
# NOTE (review comment): these two are the important ones.
# Miscellaneous datascience packages
matplotlib~=3.10.1
numpy~=2.2.3
# ...
# NOTE (review comment): this shows that customers are free to add whatever
# additional Python packages they want to use in their workbench.
EOF

RUN echo "Installing software and packages" && \
pip install -r requirements.txt && \
rm -f ./Pipfile.lock && \
# Prepare directories for elyra runtime configuration
mkdir /opt/app-root/runtimes && \
mkdir /opt/app-root/pipeline-runtimes && \
# Remove default Elyra runtime-images
rm /opt/app-root/share/jupyter/metadata/runtime-images/*.json && \
# Replace the launcher's "(ipykernel)" kernel display name with the actual Python version (e.g. "Python 3.11")
sed -i -e "s/Python.*/$(python --version | cut -d '.' -f-2)\",/" /opt/app-root/share/jupyter/kernels/python3/kernel.json && \
# Copy jupyter configuration
cp /opt/app-root/bin/utils/jupyter_server_config.py /opt/app-root/etc/jupyter && \
# Disable the JupyterLab announcements extension
jupyter labextension disable "@jupyterlab/apputils-extension:announcements" && \
# Fix permissions to support pip in OpenShift environments
chmod -R g+w /opt/app-root/lib/python3.11/site-packages && \
fix-permissions /opt/app-root -P

# Switch dir to $HOME
WORKDIR /opt/app-root/src