xuqiantong/CUDA-Winograd

Introduction

This code implements fast CUDA kernels for DNN inference, especially for the convolution layers / residual blocks in ResNet. Specifically, each kernel fuses three operations into one:

  • Convolution
  • Batch Normalization (BN + Scale)
  • Activation (ReLU)
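
As a rough illustration of why these three steps can share one kernel, the sketch below folds inference-time BN + Scale into a single per-channel multiply-add that is applied to each convolution output before ReLU. It is not taken from this repo's kernels: the names `fold_bn` and `fused_epilogue`, the parameter names, and the `eps` default are assumptions for illustration only.

```python
import math

def fold_bn(gamma, beta, mean, var, eps=1e-5):
    """Fold BN (mean, var) and Scale (gamma, beta) into y = a * x + b.

    Hypothetical helper: at inference time the statistics are constants,
    so the normalization collapses to one multiply-add per channel.
    """
    a = gamma / math.sqrt(var + eps)
    b = beta - a * mean
    return a, b

def fused_epilogue(conv_out, a, b):
    """Apply the folded BN + Scale, then ReLU, to one convolution output."""
    return max(0.0, a * conv_out + b)

def unfused_reference(conv_out, gamma, beta, mean, var, eps=1e-5):
    """Reference: BN, Scale and ReLU applied as three separate steps."""
    normalized = (conv_out - mean) / math.sqrt(var + eps)
    scaled = gamma * normalized + beta
    return max(0.0, scaled)
```

Because `a` and `b` depend only on the trained parameters, they can be computed once ahead of time, leaving a single multiply-add and a `max` per output element.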

For implementation details, please refer to the technical report included in this repo. The Winograd algorithm is used for 3 * 3 convolutional kernels.
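
To give a feel for the Winograd idea, here is a minimal sketch of the smallest one-dimensional case, F(2, 3), which produces two outputs of a 3-tap filter with 4 multiplications instead of 6. The tile size and the 2D transforms actually used by the kernels are described in the technical report, not here; the function names below are hypothetical.

```python
def direct_f23(d, g):
    """Two outputs of a 3-tap filter computed directly (6 multiplications)."""
    y0 = d[0] * g[0] + d[1] * g[1] + d[2] * g[2]
    y1 = d[1] * g[0] + d[2] * g[1] + d[3] * g[2]
    return [y0, y1]

def winograd_f23(d, g):
    """The same two outputs via Winograd F(2, 3) (4 multiplications).

    d: 4 input values, g: 3 filter taps.
    """
    # Filter transform: depends only on g, so it can be precomputed.
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]
    # Input transform followed by element-wise products.
    m1 = (d[0] - d[2]) * u0
    m2 = (d[1] + d[2]) * u1
    m3 = (d[2] - d[1]) * u2
    m4 = (d[1] - d[3]) * u3
    # Output transform.
    return [m1 + m2 + m3, m2 - m3 - m4]

print(winograd_f23([1, 2, 3, 4], [1, 2, 3]))  # matches direct_f23 on the same input
```

The filter transform is the part that pays off in inference: the weights are fixed, so only the input and output transforms run per tile.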

Usage

mkdir data
python data_generator.py
make
./Test 0
  • Set parameters in data_generator.py
  • There are 6 test cases; select one by passing a number from 0 to 5 after ./Test
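
To run all six cases in one go, a small driver along these lines could be used after `make`. It is a sketch that assumes `./Test` takes a single index from 0 to 5, as in the usage above; `TEST_COMMANDS` and `run_all` are hypothetical names, not part of this repo.

```python
import subprocess

# Assumption: ./Test accepts one test-case index, 0 through 5.
TEST_COMMANDS = [["./Test", str(i)] for i in range(6)]

def run_all(commands=TEST_COMMANDS):
    """Run each test case in order, raising on the first non-zero exit code."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling `run_all()` from the repository root runs `./Test 0` through `./Test 5` in sequence.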

Results

3 * 3 Kernels

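| Kernels    | Operations           | 128 / 128 | 256 / 256 |
|------------|----------------------|-----------|-----------|
| cuDNN      | Gemm + BN + ReLU     | 214us     | 384us     |
| cuDNN      | Winograd + BN + ReLU | 95us      | 155us     |
| Our Kernel | Winograd + BN + ReLU | 59us      | 117us     |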
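
The relative speedups implied by the 3 * 3 timings can be worked out with a few lines of arithmetic. The numbers below are copied from the table above; the dictionary layout and the `speedup` helper are just one illustrative way to organize them.

```python
# Timings in microseconds, copied from the 3 * 3 results table.
timings_us = {
    "cuDNN Gemm": {"128 / 128": 214, "256 / 256": 384},
    "cuDNN Winograd": {"128 / 128": 95, "256 / 256": 155},
    "Our Kernel": {"128 / 128": 59, "256 / 256": 117},
}

def speedup(baseline, config):
    """Ratio of the baseline time to the fused kernel's time for one config."""
    return timings_us[baseline][config] / timings_us["Our Kernel"][config]

for baseline in ("cuDNN Gemm", "cuDNN Winograd"):
    for config in ("128 / 128", "256 / 256"):
        print(f"{baseline}, {config}: {speedup(baseline, config):.2f}x")
```

Against cuDNN's Winograd path this works out to roughly 1.6x at 128 / 128 and 1.3x at 256 / 256.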

1 * 1 Kernels [BUGGY NUMBERS]

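| Kernels    | 512 / 128        | 128 / 512 | 1024 / 256       | 256 / 1024       |
|------------|------------------|-----------|------------------|------------------|
| Operations | Gemm + BN + ReLU | Gemm + BN | Gemm + BN + ReLU | Gemm + BN + ReLU |
| cuDNN      | 119us            | 115us     | 219us            | 214us            |
| Our Kernel | 58us             | 55us      | 186us            | 181us            |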

About

Fast CUDA Kernels for ResNet Inference.
