Automatically launch functions and initialize distributed PyTorch environments on multiple machines
pip install torchrunx
Requirements:
- Operating System: Linux
- Python >= 3.8.1
- PyTorch >= 2.0
- Shared filesystem & passwordless SSH between hosts
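Some of the requirements above can be checked before launching. The following is a minimal sketch using only the standard library, not part of torchrunx: the helper names (python_ok, linux_ok, ssh_ok) are hypothetical, and it does not check the PyTorch version or the shared filesystem.

```python
import platform
import subprocess
import sys

MIN_PYTHON = (3, 8, 1)  # from the requirements list above


def python_ok(version=None) -> bool:
    """Return True if the given (or current) Python version is >= 3.8.1."""
    version = tuple(sys.version_info[:3]) if version is None else tuple(version)
    return version >= MIN_PYTHON


def linux_ok(system=None) -> bool:
    """Return True if the given (or current) operating system is Linux."""
    system = platform.system() if system is None else system
    return system == "Linux"


def ssh_ok(hostname: str, timeout: int = 10) -> bool:
    """Return True if `ssh hostname true` succeeds without a password prompt.

    BatchMode=yes makes ssh fail instead of asking for a password, so a
    zero exit code suggests passwordless SSH is configured for this host.
    """
    cmd = [
        "ssh",
        "-o", "BatchMode=yes",
        "-o", f"ConnectTimeout={timeout}",
        hostname,
        "true",
    ]
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout + 5,
        )
    except (OSError, subprocess.TimeoutExpired):
        # ssh is not installed, or the host did not answer in time.
        return False
    return result.returncode == 0
```

For example, all(ssh_ok(h) for h in ["node1", "node2"]) could be run from the launching machine before calling trx.launch.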
# Simple example
def distributed_function():
    pass

import torchrunx as trx

trx.launch(
    func=distributed_function,
    func_kwargs={},
    hostnames=["node1", "node2"],  # or just: ["localhost"]
    workers_per_host=2
)
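The placeholder function above does nothing. As a rough sketch of what a non-trivial body could look like, the version below reports each worker's rank. It assumes that the default torch.distributed process group is already initialized when the function runs on a worker, which is how we read "initialize distributed PyTorch environments"; format_worker is a hypothetical helper added for illustration.

```python
def format_worker(rank: int, world_size: int) -> str:
    """Build a short label identifying one worker among all workers."""
    return f"worker {rank} of {world_size}"


def distributed_function() -> str:
    # Imported inside the function so this module can be loaded on a
    # machine without PyTorch; workers are expected to have it installed.
    import torch.distributed as dist

    # Assumption: the launcher has already initialized the default
    # process group, so these calls succeed on every worker.
    rank = dist.get_rank()              # this worker's global index
    world_size = dist.get_world_size()  # total number of workers

    message = format_worker(rank, world_size)
    print(message)
    return message
```

With hostnames=["node1", "node2"] and workers_per_host=2, one would expect four workers in total, so world_size would presumably be 4.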
trx.launch(
    # ...
    hostnames=trx.slurm_hosts(),
    workers_per_host=trx.slurm_workers()
)
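A script that runs both inside and outside a SLURM allocation could choose between the two launch styles at runtime. The sketch below is one possible approach, not something torchrunx provides: running_under_slurm and launch_kwargs are hypothetical helpers, and the check relies on the SLURM_JOB_ID environment variable that SLURM sets inside a job.

```python
import os


def running_under_slurm(env=None) -> bool:
    """Return True if the environment looks like a SLURM job."""
    env = os.environ if env is None else env
    return "SLURM_JOB_ID" in env


def launch_kwargs(env=None, fallback_workers: int = 2) -> dict:
    """Pick hostnames and workers_per_host for trx.launch.

    Inside a SLURM job, defer to trx.slurm_hosts() and trx.slurm_workers();
    otherwise fall back to a single-machine launch on localhost.
    """
    if running_under_slurm(env):
        # Imported lazily so the local fallback works without torchrunx.
        import torchrunx as trx

        return {
            "hostnames": trx.slurm_hosts(),
            "workers_per_host": trx.slurm_workers(),
        }
    return {"hostnames": ["localhost"], "workers_per_host": fallback_workers}
```

The result can then be unpacked into the call, for example trx.launch(func=distributed_function, func_kwargs={}, **launch_kwargs()).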
We use the pixi package manager. Simply install pixi and run pixi shell in this repository. We use ruff for linting and formatting, pyright for static type checking, and pytest for testing. We build for PyPI and conda-forge. Our release pipeline is powered by GitHub Actions.