Incorrect accuracy calculation with DistributedDataParallel - examples/imagenet/main.py #905

The `accuracy()` function [here](https://github.com/pytorch/examples/blob/master/imagenet/main.py#L411) divides the number of correct samples by the batch size. The [top-1 accuracy is then updated](https://github.com/pytorch/examples/blob/master/imagenet/main.py#L340) through an `AverageMeter`, and `top1.avg` is returned. This is incorrect, since the value passed to the `top1` `AverageMeter` is already an accuracy, and it is passed with `n=images.size(0)`. Essentially, the number of correct samples is divided by the batch size **twice**.
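One way to rule out any double normalization is to accumulate raw counts and divide exactly once at the end. The sketch below is illustrative only: `CountMeter` is a hypothetical helper that does not exist in `main.py`, and it works on plain integers rather than tensors.

```python
class CountMeter:
    """Hypothetical helper: tracks raw correct/total counts, not per-batch percentages."""

    def __init__(self):
        self.correct = 0
        self.total = 0

    def update(self, num_correct, batch_size):
        # Accumulate counts only; nothing is divided per batch.
        self.correct += num_correct
        self.total += batch_size

    @property
    def accuracy(self):
        # Single division at the end: percentage over all samples seen so far.
        return 100.0 * self.correct / self.total if self.total else 0.0


meter = CountMeter()
meter.update(num_correct=6, batch_size=8)
meter.update(num_correct=3, batch_size=4)  # a smaller final batch is weighted correctly
print(meter.accuracy)
```

Because only counts are stored, a smaller last batch cannot skew the result, and the same two numbers are what would need to be summed across processes.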

Furthermore, the `validate` function will return different values for each process when using DDP. The results should be combined across all GPUs.
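To illustrate the difference between per-process and combined results, the sketch below simulates two processes in plain Python, assuming each process evaluates a different subset of the validation data. The per-rank numbers are made up and `combine_counts` is a hypothetical name. In a real DDP run, the same summation would typically be done by packing the two counts into a tensor and calling `torch.distributed.all_reduce` with a sum reduction before dividing.

```python
def combine_counts(per_rank_counts):
    """Sum (correct, total) pairs from every rank and return global accuracy in percent."""
    correct = sum(c for c, _ in per_rank_counts)
    total = sum(t for _, t in per_rank_counts)
    return 100.0 * correct / total


# Made-up (correct, total) counts, one pair per simulated process.
per_rank_counts = [(45, 50), (30, 50)]

# What each process would report if it only looked at its own data.
local = [100.0 * c / t for c, t in per_rank_counts]

# What every process should report after the counts are combined.
global_acc = combine_counts(per_rank_counts)

print(local, global_acc)
```

Summing counts rather than averaging per-rank percentages keeps the result correct even when ranks see different numbers of samples.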
