First, I'm confused why the example models implement their own DistributedDataParallel module. Why not use the torch one?
DistributedDataParallel is used in the examples:
https://github.com/kubeflow/pytorch-operator/blob/master/examples/ddp/mnist/cpu/mnist_ddp_cpu.py#L154
The DistributedDataParallel documentation states that gradients are averaged across processes automatically during the backward pass. However, a DIY average_gradients function is used in the same example as well: https://github.com/kubeflow/pytorch-operator/blob/master/examples/ddp/mnist/cpu/mnist_ddp_cpu.py#L168

Double trouble?
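To illustrate the apparent redundancy, here is a minimal sketch of the two approaches, assuming the example follows the pattern from PyTorch's "Writing Distributed Applications" tutorial (the linked example's exact code is not reproduced here; `Net` and the training-loop fragments are illustrative placeholders):

```python
import torch.distributed as dist
import torch.nn as nn


def average_gradients(model: nn.Module) -> None:
    """DIY averaging: all-reduce each gradient, then divide by world size."""
    world_size = float(dist.get_world_size())
    for param in model.parameters():
        if param.grad is not None:
            dist.all_reduce(param.grad.data, op=dist.ReduceOp.SUM)
            param.grad.data /= world_size


# Approach 1: manual averaging on a plain model.
#   model = Net()
#   loss.backward()
#   average_gradients(model)  # explicit all-reduce after backward
#   optimizer.step()

# Approach 2: wrap the model in DistributedDataParallel, which all-reduces
# gradients automatically during backward(); no manual call is needed.
#   model = nn.parallel.DistributedDataParallel(Net())
#   loss.backward()           # gradients are already averaged here
#   optimizer.step()
```

If the model is wrapped in DistributedDataParallel, calling average_gradients afterwards would average the already-averaged gradients a second time, which is at best wasted communication.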