
homura

homura is a library for fast prototyping of DL research.

🔥🔥🔥🔥 homura (焰) is flame or blaze in Japanese. 🔥🔥🔥🔥

Notice: homura v2019.11+ introduces backward-incompatible changes

For older versions, install with e.g. pip install git+https://github.com/moskomule/homura@v0.7.

Requirements

minimal requirements

Python>=3.8
PyTorch>=1.5.0
torchvision>=0.6.0
tqdm # automatically installed
tensorboard # automatically installed
hydra-core # automatically installed

optional

colorlog (to log with colors)
faiss (for faster kNN)
accimage (for faster image pre-processing)
horovod (for easier distributed training)
cupy

If horovod is available, homura tries to use it for distributed training. To disable horovod and use torch.distributed instead, set HOMURA_DISABLE_HOROVOD=1.
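
For example, a hedged sketch of disabling horovod from Python (setting the variable in the shell before launching works just as well; setting it before importing homura is an assumption):

import os

# Assumed: the variable is checked when homura is imported, so set it first.
os.environ["HOMURA_DISABLE_HOROVOD"] = "1"

import homura  # falls back to torch.distributed for distributed training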

test

pytest .

Installation

pip install git+https://github.com/moskomule/homura

or

git clone https://github.com/moskomule/homura
cd homura
pip install -e .

horovod installation

conda install gxx_linux-64
pip install horovod

APIs

Basics

homura aims at simple, abstract (e.g., device-agnostic) prototyping.

from homura import optim, lr_scheduler
from homura import trainers, callbacks, reporters
from torchvision.models import resnet50
from torch.nn import functional as F

# Users do not need to care about the device
resnet = resnet50()
# The model is registered in the optimizer lazily. This is convenient for
# distributed training and other complicated cases.
optimizer = optim.SGD(lr=0.1, momentum=0.9)
scheduler = lr_scheduler.MultiStepLR(milestones=[30, 80], gamma=0.1)

# `homura` has callbacks
c = [callbacks.AccuracyCallback(),
     reporters.TensorboardReporter(".")]

# `train_loader`/`test_loader` are torch.utils.data.DataLoader instances
# and `epochs` is an int, both defined elsewhere.
with trainers.SupervisedTrainer(resnet, optimizer, loss_f=F.cross_entropy,
                                callbacks=c, scheduler=scheduler) as trainer:
    # epoch-based training
    for _ in range(epochs):
        trainer.train(train_loader)
        trainer.test(test_loader)

    # otherwise, iteration-based training
    trainer.run(train_loader, test_loader,
                total_iterations=1_000, val_intervals=10)

Users can customize the training iteration of the trainer as follows.

from typing import Mapping, Tuple

import torch

from homura.trainers import TrainerBase, SupervisedTrainer
from homura.utils.containers import TensorMap

trainer = SupervisedTrainer(...)

def iteration(trainer: TrainerBase,
              data: Tuple[torch.Tensor, torch.Tensor]) -> Mapping[str, torch.Tensor]:
    input, target = data
    output = trainer.model(input)
    loss = trainer.loss_f(output, target)
    results = TensorMap(loss=loss, output=output)
    if trainer.is_train:
        trainer.optimizer.zero_grad()
        loss.backward()
        trainer.optimizer.step()
    # iteration returns at least (loss, output);
    # registered values can be accessed in callbacks
    results.user_value = user_value  # any additional user-defined value
    return results

SupervisedTrainer.iteration = iteration
# or
trainer.update_iteration(iteration)

callbacks.Callback can access the model parameters, the loss, the model outputs, and other user-defined values.

In most cases, callbacks.metric_callback_decorator is useful. The returned values are accumulated.

from homura import callbacks

@callbacks.metric_callback_decorator
def user_value(data):
    return data["user_value"]

callbacks.Callback has the methods before_all, before_iteration, before_epoch, after_all, after_iteration and after_epoch. For example, callbacks.WeightSave looks roughly like:

from typing import Mapping

from homura.callbacks import Callback

class WeightSave(Callback):
    ...

    def after_epoch(self, data: Mapping):
        self._epoch = data["epoch"]
        self._step = data["step"]
        if self.save_freq > 0 and data["epoch"] % self.save_freq == 0:
            self.save(data, f"{data['epoch']}.pkl")

    def after_all(self, data: Mapping):
        if self.save_freq == -1:
            self.save(data, "weight.pkl")

Dicts of models, optimizers, and loss functions are also supported:

trainer = CustomTrainer({"generator": generator, "discriminator": discriminator},
                        {"generator": gen_opt, "discriminator": dis_opt},
                        {"reconstruction": recon_loss, "generator": gen_loss},
                        **kwargs)
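
A hedged sketch of what such dicts might contain for a GAN-like setup; the models, optimizers, and loss functions below are placeholders, and CustomTrainer itself is a user-defined trainer (e.g. with a custom iteration as shown earlier):

import torch.nn as nn
from torch.nn import functional as F

from homura import optim

# Placeholder models and loss functions, for illustration only.
generator = nn.Sequential(nn.Linear(100, 784), nn.Tanh())
discriminator = nn.Sequential(nn.Linear(784, 1))
gen_opt = optim.SGD(lr=0.01)   # lazily bound inside the trainer
dis_opt = optim.SGD(lr=0.01)
recon_loss = F.mse_loss
gen_loss = F.binary_cross_entropy_with_logits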

Distributed training

An easy distributed initializer, homura.init_distributed(), is available. See imagenet.py for an example, and the minimal sketch below.
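
A minimal hedged sketch of how this might be wired into a training script (argument-free usage follows the text above; see imagenet.py for the actual pattern):

import homura

# Assumed usage: initialize the distributed backend once at start-up,
# before building models, optimizers, and trainers (as in the Basics example).
homura.init_distributed()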

Reproducibility

These context managers make randomness deterministic within their scope.

from homura.utils.reproducibility import set_deterministic, set_seed
with set_deterministic(seed):
    something()

with set_seed(seed):
    other_thing()

Examples

See examples.

  • cifar10.py: training ResNet-20 or WideResNet-28-10 with random crop on CIFAR10
  • imagenet.py: training a CNN on ImageNet with multiple GPUs (single and multi process)

For imagenet.py, if you want

  • single node, single GPU
  • single node, multiple GPUs

run python imagenet.py root=/path/to/imagenet/root.

If you want

  • single node, multiple processes, multiple GPUs

run python -m torch.distributed.launch --nproc_per_node=$NUM_GPUS imagenet.py root=/path/to/imagenet/root distributed.on=true.

If you want

  • multiple nodes, multiple processes, multiple GPUs,

run

  • python -m torch.distributed.launch --nnodes=$NUM_NODES --node_rank=0 --master_addr=$MASTER_IP --master_port=$MASTER_PORT --nproc_per_node=$NUM_GPUS imagenet.py root=/path/to/imagenet/root distributed.on=true on the master node
  • python -m torch.distributed.launch --nnodes=$NUM_NODES --node_rank=$RANK --master_addr=$MASTER_IP --master_port=$MASTER_PORT --nproc_per_node=$NUM_GPUS imagenet.py root=/path/to/imagenet/root distributed.on=true on the other nodes

Here, 0 < $RANK < $NUM_NODES.

Citing

@misc{homura,
    author = {Ryuichiro Hataya},
    title = {homura},
    year = {2018},
    publisher = {GitHub},
    journal = {GitHub repository},
    howpublished = {\url{https://github.com/moskomule/homura}},
}
