
could you share your script for producing these protos and results of each model? #1

Open
jiangxuehan opened this issue Mar 6, 2017 · 10 comments


@jiangxuehan

No description provided.

@Tongcheng
Owner

@jiangxuehan Hi! Because my implementation specifies an entire DenseBlock (tens of transitions) as a single layer, the prototxt files were written by hand and there is no script for generating them. Each prototxt contains only about 10 layers, so writing one manually should be doable.
I will update the results of each model soon.

@jiangxuehan
Author

@Tongcheng I have run the models in this repo. For k=12 and L=100, the accuracy on CIFAR-10+ is 94.8%, but it should be 95.5% according to the paper. Looking forward to your results.

@Tongcheng
Owner

Tongcheng commented Mar 10, 2017

@jiangxuehan Thanks for pointing this out! I currently see the same result, which is about 0.8% lower than the Torch counterpart.
This is actually a known issue: liuzhuang13/DenseNet#10. Based on that issue, I made one fix in my Caffe: using the cuDNN version of BatchNormalization (Torch sets the smoothing factor for the EMA estimate to 0.1, while stock Caffe accumulates the statistics in an entirely different way). This makes the convergence curve (accuracy) look similar to the one in the paper (Figure 4, right), but it does not seem to close the final accuracy gap.
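For reference, a minimal sketch of the two running-statistics conventions being compared (illustrative C++ only, not the actual layer code; 0.1 is Torch's default momentum and 0.999 is stock Caffe's default moving_average_fraction):

#include <vector>

// Torch / cuDNN style: exponential moving average,
// running <- (1 - momentum) * running + momentum * batch_stat.
void torch_style_update(std::vector<float>& running_mean,
                        const std::vector<float>& batch_mean,
                        float momentum = 0.1f) {
  for (std::size_t c = 0; c < running_mean.size(); ++c) {
    running_mean[c] = (1.f - momentum) * running_mean[c]
                      + momentum * batch_mean[c];
  }
}

// Stock Caffe BatchNorm style: accumulate a decayed sum plus a decayed
// normalization factor and divide only at test time.
void caffe_style_update(std::vector<float>& mean_sum, float& scale_factor,
                        const std::vector<float>& batch_mean,
                        float moving_average_fraction = 0.999f) {
  scale_factor = scale_factor * moving_average_fraction + 1.f;
  for (std::size_t c = 0; c < mean_sum.size(); ++c) {
    mean_sum[c] = mean_sum[c] * moving_average_fraction + batch_mean[c];
  }
  // At test time: mean[c] = mean_sum[c] / scale_factor.
}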
I am still investigating the cause, and this is one reason the results of each model have not been updated yet.

@Tongcheng
Owner

@jiangxuehan It turns out Caffe's DataLayer was feeding the data without permutation. I have added a flag that permutes the data, which brings the accuracy up to 95.2%.

@jiangxuehan
Author

@Tongcheng Thanks for your reply. Using ImageDataLayer with the shuffle option gives the same result (95.2%) as your modified DataLayer. Do you think there are other differences between Torch and Caffe that could affect model performance?

@Tongcheng
Owner

Tongcheng commented Apr 23, 2017

@jiangxuehan I have no definitive conclusion about the remaining 0.3% gap yet, but there are several hypotheses:
(1) Source of randomness: besides the different random seeds, one additional source of randomness is the convolution algorithms. The Torch version uses deterministic convolution algorithms, corresponding to (1,1,1) in cuDNN convolution (see the sketch below). However, if I use my random seed with deterministic convolution, the result is somewhat lower (95.1%).
(2) We made a space-time tradeoff to achieve space efficiency: in the backward phase of the BC networks, I have to run an extra 1x1 convolution forward to recompute/overwrite the intermediate channels. This recomputed convolution (combined with its BN forward) might introduce some numerical instability into the system and make the overall performance slightly worse; it is unavoidable if we want the space savings.
There could also be tricky parts that I haven't spotted yet; I would welcome any constructive ideas. Thanks!
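For anyone reproducing hypothesis (1), here is my reading of what "(1,1,1)" means in cuDNN terms, i.e. fixing the three convolution algorithms to enum value 1 instead of letting cuDNN pick possibly non-deterministic ones (a sketch, not the repository's actual setup code):

#include <cudnn.h>

// Forward, backward-filter and backward-data algorithms with enum value 1;
// the two backward "ALGO_1" variants are the deterministic choices.
const cudnnConvolutionFwdAlgo_t       kFwdAlgo =
    CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM;  // = 1
const cudnnConvolutionBwdFilterAlgo_t kBwdFilterAlgo =
    CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1;               // = 1, deterministic
const cudnnConvolutionBwdDataAlgo_t   kBwdDataAlgo =
    CUDNN_CONVOLUTION_BWD_DATA_ALGO_1;                 // = 1, deterministic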

@Tongcheng
Owner

@jiangxuehan Also, I think my DataLayer with the random option should be superior to the default ImageDataLayer implementation: ImageDataLayer shuffles a vector of Datum, which are quite large objects, whereas I shuffle the indices of the Datum, which are much smaller. So my implementation should be somewhat more time-efficient (a sketch of the difference is below).
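A minimal sketch of the two shuffling strategies described above (illustrative only, following the description in this thread; it is not the repository's actual DataLayer code):

#include <algorithm>
#include <numeric>
#include <random>
#include <vector>

#include "caffe/proto/caffe.pb.h"  // for caffe::Datum

// Shuffle the Datum objects themselves: every swap moves a large object.
void shuffle_datums(std::vector<caffe::Datum>& data, std::mt19937& rng) {
  std::shuffle(data.begin(), data.end(), rng);
}

// Shuffle only a vector of indices into the Datum vector: swaps move
// plain integers, and the data is then read through the permutation.
std::vector<int> shuffle_indices(std::size_t num_datums, std::mt19937& rng) {
  std::vector<int> order(num_datums);
  std::iota(order.begin(), order.end(), 0);
  std::shuffle(order.begin(), order.end(), rng);
  return order;  // read data[order[i]] in this permuted order
}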

@John1231983

Hi, I found your explanation about the difference between the Caffe and Torch implementations of BN. I guess you modified the BatchNorm layer in Caffe to make it behave like the Torch one. How much does the modification change performance? In addition, I would like to use the modification in my own Caffe version. Is it enough to copy your batch_norm_layer.cu (and the header file) over my current ones?

Finally, BN is usually followed by a Scale layer, but in your prototxt I do not see a Scale layer after BN, for example in https://github.com/Tongcheng/DN_CaffeScript/blob/master/train_test_BCBN_C10plus.prototxt. Or did you already integrate them together?

layer {
  name: "BatchNorm1"
  type: "BatchNorm"
  bottom: "DenseBlock1"
  top: "BatchNorm1"
  batch_norm_param {
    moving_average_fraction : 0.1
    scale_filler {
      type: "constant"
      value: 1
    }
    bias_filler {
      type: "constant"
      value: 0
    }
    engine: CUDNN
  }  
}

@Tongcheng
Owner

Hi @John1231983, the Torch version uses the cuDNN version of BatchNormalization, which already includes the scale and bias inside the function, so in my modified Caffe BatchNorm there is no need to put an additional Scale layer after BatchNorm. The main difference is the EMA smoothing factor, which is why the two BatchNorm implementations produce differently shaped training curves.
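To make that concrete, here is a minimal sketch of the cuDNN training call (my illustration, not the code from this repository): the learned scale (gamma) and bias (beta) are passed straight into the batch-norm function, so no separate Scale layer is needed, and the exponentialAverageFactor argument is where a 0.1 smoothing factor would go.

#include <cudnn.h>

// Descriptor and pointer setup is omitted; this only shows which
// arguments carry what.
void bn_forward_training_sketch(cudnnHandle_t handle,
                                cudnnTensorDescriptor_t x_desc, const float* x,
                                cudnnTensorDescriptor_t y_desc, float* y,
                                cudnnTensorDescriptor_t bn_desc,
                                const float* gamma, const float* beta,
                                float* running_mean, float* running_var,
                                float* save_mean, float* save_inv_var) {
  const float one = 1.f, zero = 0.f;
  // exponentialAverageFactor plays the role of Torch's momentum (0.1);
  // stock Caffe's moving_average_fraction works differently (see above).
  const double exp_avg_factor = 0.1;
  const double eps = 1e-5;
  cudnnBatchNormalizationForwardTraining(
      handle, CUDNN_BATCHNORM_SPATIAL,
      &one, &zero,
      x_desc, x, y_desc, y,
      bn_desc, gamma, beta,  // scale/bias handled here, no Scale layer needed
      exp_avg_factor, running_mean, running_var,
      eps, save_mean, save_inv_var);
}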

@John1231983

Thanks for pointing this out. Could you tell me which files you changed for the BatchNorm layer? I would like to test it in my Caffe version by changing those files. I checked your batch_norm_layer.cu, but it looks similar to the current batch_norm_layer.cu in Caffe, apart from some log printing.
