Launch PyTorch functions onto multiple machines & GPUs

apoorvkh/torchrunx

torchrunx 🔥

Automatically launch functions and initialize distributed PyTorch environments on multiple machines

Installation

pip install torchrunx

Requirements:

  • Operating System: Linux
  • Python >= 3.8.1
  • PyTorch >= 2.0
  • Shared filesystem & passwordless SSH between hosts

Usage

import torchrunx as trx

# Simple example: the function that every worker will run
def distributed_function():
    pass

# Launch 2 workers on each of the listed hosts
trx.launch(
    func=distributed_function,
    func_kwargs={},
    hostnames=["node1", "node2"],  # or just: ["localhost"]
    workers_per_host=2,
)
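
The placeholder function above does nothing, so here is a minimal sketch of what a launched function might look like. It assumes each worker can read the `RANK`, `LOCAL_RANK`, and `WORLD_SIZE` environment variables, which is the convention used by `torchrun` and by `torch.distributed`'s environment-variable initialization. Whether torchrunx exposes exactly these variables is an assumption here, and `read_worker_context` is a hypothetical helper, not part of torchrunx. The defaults let the function run as a single process outside any launcher.

```python
import os
from typing import Dict, Mapping, Optional


def read_worker_context(env: Optional[Mapping[str, str]] = None) -> Dict[str, int]:
    """Read torchrun-style rank variables (assumed to be set by the launcher).

    Falls back to a single-process layout when the variables are absent.
    """
    env = os.environ if env is None else env
    return {
        "rank": int(env.get("RANK", "0")),
        "local_rank": int(env.get("LOCAL_RANK", "0")),
        "world_size": int(env.get("WORLD_SIZE", "1")),
    }


def distributed_function(message: str = "hello") -> str:
    # Each worker tags its message with its own rank and the world size.
    ctx = read_worker_context()
    return f"[rank {ctx['rank']}/{ctx['world_size']}] {message}"
```

Under these assumptions, the function could be passed as `func=distributed_function` with `func_kwargs={"message": "hi"}`.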

In a SLURM allocation

trx.launch(
    # ...
    hostnames=trx.slurm_hosts(),
    workers_per_host=trx.slurm_workers()
)
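
To illustrate the kind of information a SLURM allocation provides, the sketch below derives a per-host worker count from SLURM's environment variables. This is not the implementation of `trx.slurm_workers()`. `workers_from_slurm_env` is a hypothetical helper, and the choice of variables (`SLURM_GPUS_ON_NODE`, then `SLURM_NTASKS_PER_NODE`) and the fallback of one worker are assumptions made for this example.

```python
from typing import Mapping


def workers_from_slurm_env(env: Mapping[str, str]) -> int:
    """Guess a per-host worker count from SLURM variables (hypothetical helper)."""
    # Prefer one worker per GPU allocated on this node, if SLURM reports a count.
    gpus = env.get("SLURM_GPUS_ON_NODE", "")
    if gpus:
        return int(gpus)
    # Otherwise use the requested tasks per node (only set with --ntasks-per-node).
    tasks = env.get("SLURM_NTASKS_PER_NODE", "")
    if tasks:
        return int(tasks)
    # Assumed fallback: a single worker when neither variable is available.
    return 1
```

Inside a job, this would be called as `workers_from_slurm_env(os.environ)` after `import os`. Taking the environment as a parameter keeps the helper easy to test outside a cluster.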

Compared to other tools

Contributing

We use the pixi package manager. Install pixi, then run pixi shell in this repository to enter the development environment. We use ruff for linting and formatting, pyright for static type checking, and pytest for testing. We build for PyPI and conda-forge. Our release pipeline is powered by GitHub Actions.