Add FAQ for enroot/pyxis compatibility issue with multi-arch images
yhtang committed Oct 2, 2023
1 parent 11c0615 commit 7fa347a
Showing 1 changed file with 27 additions and 7 deletions.
--- a/README.md
+++ b/README.md
@@ -161,18 +161,38 @@ The [JAX image](ghcr.io/nvidia/jax) is embedded with the following flags and env

 ## FAQ (Frequently Asked Questions)
 
+<details>
+<summary>`bus error` when running JAX in a docker container</summary>
+
-Question: A "bus error"
------------------------
+**Solution:**
+```bash
+docker run -it --shm-size=1g ...
+```
 
-**Q:** When I execute my JAX code, I come across a `bus error`. How can I address this issue?
+**Explanation:**
+The `bus error` might occur due to the size limitation of `/dev/shm`. You can address this by increasing the shared memory size using
+the `--shm-size` option when launching your container.
+</details>
 
-**A:** The `bus error` might occur due to the size limitation of `/dev/shm`. You can address this by increasing the shared memory size using
-the `--shm-size` option when launching your container. Here is a demonstration of how this can be achieved using Docker:
+<details>
 
-```bash
-docker run -it --shm-size=1g ...
+<summary>enroot/pyxis reports error code 404 when importing multi-arch images</summary>
+
+**Problem description:**
+```
+slurmstepd: error: pyxis: [INFO] Authentication succeeded
+slurmstepd: error: pyxis: [INFO] Fetching image manifest list
+slurmstepd: error: pyxis: [INFO] Fetching image manifest
+slurmstepd: error: pyxis: [ERROR] URL https://ghcr.io/v2/nvidia/jax/manifests/<TAG> returned error code: 404 Not Found
 ```
+
+**Solution:**
+Upgrade enroot or [apply a single-file patch](https://github.com/NVIDIA/enroot/releases/tag/v3.4.0) as mentioned in the enroot v3.4.0 release note.
+
+**Explanation:**
+Docker has traditionally used Docker Schema V2.2 for multi-arch manifest lists but has switched to using the Open Container Initiative (OCI) format since 20.10. Enroot added support for OCI format in version 3.4.0.
+</details>
 
 ## JAX on Public Clouds
 
 * AWS
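
The `bus error` FAQ entry in this change attributes the failure to the size limit of `/dev/shm`. As a rough illustration, the sketch below reads the size of that mount from inside a container so it can be compared against a threshold before launching JAX. The helper names `shm_size_kb` and `shm_is_large_enough` are hypothetical, and the default threshold of 1048576 KiB is an assumption chosen only to mirror the `--shm-size=1g` value in the FAQ; neither is part of the repository.

```shell
# Hypothetical helper: print the total size of a mount point in KiB.
# Defaults to /dev/shm, the mount named in the FAQ entry.
shm_size_kb() {
  mount_point="${1:-/dev/shm}"
  # -P forces one line per filesystem; -k reports 1024-byte blocks.
  # The second column of the second output line is the total size.
  df -Pk "$mount_point" | awk 'NR == 2 { print $2 }'
}

# Hypothetical helper: succeed if a size in KiB meets a required size in KiB.
# The default of 1048576 KiB (1 GiB) is an assumption mirroring --shm-size=1g.
shm_is_large_enough() {
  size_kb="$1"
  required_kb="${2:-1048576}"
  [ "$size_kb" -ge "$required_kb" ]
}

# Possible usage inside the container:
#   if ! shm_is_large_enough "$(shm_size_kb)"; then
#     echo "/dev/shm may be too small; consider a larger --shm-size" >&2
#   fi
```

The threshold is only a starting point; the FAQ does not state how much shared memory a given workload needs.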

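
The enroot/pyxis FAQ entry states that Enroot added support for the OCI format in version 3.4.0. A minimal sketch of a pre-flight check on a cluster node could compare the installed version against that number. The helpers `version_ge` and `enroot_supports_oci` are hypothetical names, the comparison assumes a `sort` that supports `-V` (version sort, as in GNU coreutils), and the usage line assumes `enroot version` prints a bare version string.

```shell
# Hypothetical helper: succeed if version $1 is greater than or equal to $2.
# Assumes `sort -V` (version sort) is available, as in GNU coreutils.
version_ge() {
  # After a version sort, the smaller of the two versions comes first;
  # $1 >= $2 holds exactly when that first line is $2.
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: succeed if the given Enroot version is at least 3.4.0,
# the release the FAQ entry names as adding OCI format support.
enroot_supports_oci() {
  version_ge "$1" "3.4.0"
}

# Possible usage (assumes `enroot version` prints a bare version string):
#   if ! enroot_supports_oci "$(enroot version)"; then
#     echo "enroot is older than 3.4.0; multi-arch imports may return 404" >&2
#   fi
```

A version check like this cannot detect an older installation that has had the single-file patch applied, so a warning from it is a hint rather than a diagnosis.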