Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[Performance] Add oneDNN support for temperature parameter in Softmax #20567

Merged
merged 2 commits into from
Oct 19, 2021

Conversation

bgawrych
Copy link
Contributor

@bgawrych bgawrych commented Sep 3, 2021

Description

Benchmarked on CLX8280 (single socket - 28 cores):

Command:

KMP_AFFINITY=granularity=fine,noduplicates,compact,1,0 OMP_NUM_THREADS=28 numactl --physcpubind=0-27 --membind=0 python temperature.py 

Script:

import mxnet as mx
from mxnet import nd
from mxnet.gluon import nn
import time

import mxnet.numpy as np
mx.random.seed(123)
class Softmax(nn.HybridBlock):
    def __init__(self, temp, **kwargs):
        super(Softmax, self).__init__(**kwargs)
        self.temp = temp

    def forward(self, x):
        out = mx.npx.softmax(x,
                             axis=-1,
                             temperature=self.temp
                             )
        return out

def benchmark():
    results = dict()
    N  = [32, 64, 128]
    IC = [32, 128, 512, 1024, 2048]
    IW = [32, 128, 512, 1024, 2048]
    temp = [1, 3, 8]
    for t in temp:
        for bs in N:
            for ic in IC:
                for iw in IW:
                    #print(bs, ic, iw, t)
                    in_data = mx.np.random.uniform(size=[bs, ic, iw], dtype='float32')
                    net = Softmax(t)
                    net.initialize()
                    net.hybridize(static_alloc=True, static_shape=True)
                    mx.nd.waitall()
                    tic = time.time()
                    for i in range(10):
                        o = net(in_data)
                        o.wait_to_read()
                    total = time.time() - tic
                    results[(bs, ic, iw, t)] = total

    for k,v in results.items():
        print(f"{k[0]};{k[1]};{k[2]};{k[3]};{v}")

benchmark()

Results:

bs ic iw  temperature master onednn master/onednn
32 32 32 1 0.002636 0.002653 0.993708996
32 32 128 1 0.003474 0.001926 1.803465347
32 32 512 1 0.009884 0.002643 3.739738385
32 32 1024 1 0.018752 0.003586 5.229454787
32 32 2048 1 0.03555 0.004841 7.343314455
32 128 32 1 0.003528 0.001792 1.968471465
32 128 128 1 0.009447 0.002112 4.473749577
32 128 512 1 0.034463 0.003134 10.9964245
32 128 1024 1 0.068366 0.006682 10.2310629
32 128 2048 1 0.13407 0.013143 10.20066392
32 512 32 1 0.009452 0.002226 4.246010496
32 512 128 1 0.033757 0.003276 10.30562632
32 512 512 1 0.132375 0.010599 12.48896687
32 512 1024 1 0.263806 0.023773 11.09691007
32 512 2048 1 0.523643 0.046402 11.28480265
32 1024 32 1 0.017592 0.002871 6.127304434
32 1024 128 1 0.065932 0.004773 13.81451693
32 1024 512 1 0.260083 0.02098 12.39664989
32 1024 1024 1 0.519072 0.041556 12.49093507
32 1024 2048 1 1.041307 0.088522 11.76323295
32 2048 32 1 0.03316 0.003494 9.491128702
32 2048 128 1 0.130167 0.010031 12.97682544
32 2048 512 1 0.518406 0.041503 12.49068234
32 2048 1024 1 1.035254 0.082866 12.4930747
32 2048 2048 1 2.077811 0.181906 11.42247699
64 32 32 1 0.002707 0.002027 1.335332863
64 32 128 1 0.005575 0.002265 2.461732814
64 32 512 1 0.017691 0.002386 7.41585049
64 32 1024 1 0.034102 0.003199 10.66072893
64 32 2048 1 0.066872 0.004814 13.89064976
64 128 32 1 0.005399 0.001891 2.855377632
64 128 128 1 0.017844 0.00267 6.684201125
64 128 512 1 0.066344 0.004704 14.10512976
64 128 1024 1 0.13144 0.010926 12.03028848
64 128 2048 1 0.260924 0.022051 11.83281074
64 512 32 1 0.017367 0.002846 6.103309594
64 512 128 1 0.065824 0.004643 14.17783598
64 512 512 1 0.260053 0.021704 11.98180879
64 512 1024 1 0.518945 0.041815 12.41055056
64 512 2048 1 1.03565 0.083152 12.45493514
64 1024 32 1 0.033103 0.003496 9.46958123
64 1024 128 1 0.130172 0.009884 13.17010807
64 1024 512 1 0.518383 0.04111 12.6097189
64 1024 1024 1 1.035132 0.081875 12.64289604
64 1024 2048 1 2.068471 0.169412 12.20969621
64 2048 32 1 0.064555 0.005115 12.62117186
64 2048 128 1 0.258695 0.021392 12.09327597
64 2048 512 1 1.034411 0.08196 12.62089328
64 2048 1024 1 2.068258 0.169632 12.19263458
64 2048 2048 1 4.151224 0.346947 11.96500142
128 32 32 1 0.003761 0.002113 1.779871375
128 32 128 1 0.009555 0.002159 4.4265518
128 32 512 1 0.033919 0.00327 10.3714369
128 32 1024 1 0.066462 0.004812 13.81025514
128 32 2048 1 0.131737 0.010979 11.99856681
128 128 32 1 0.009446 0.002387 3.957051538
128 128 128 1 0.033753 0.003357 10.05333049
128 128 512 1 0.130869 0.010193 12.83960609
128 128 1024 1 0.26046 0.021792 11.95188394
128 128 2048 1 0.519283 0.042405 12.24576071
128 512 32 1 0.03313 0.003486 9.503419505
128 512 128 1 0.130216 0.010278 12.67000719
128 512 512 1 0.518307 0.041477 12.49611138
128 512 1024 1 1.03523 0.082965 12.4779685
128 512 2048 1 2.068576 0.169157 12.22876726
128 1024 32 1 0.064612 0.005205 12.4131092
128 1024 128 1 0.25856 0.021104 12.2519234
128 1024 512 1 1.03451 0.081953 12.62316539
128 1024 1024 1 2.068357 0.168912 12.24516104
128 1024 2048 1 4.134959 0.325172 12.71620504
128 2048 32 1 0.127446 0.010628 11.9910496
128 2048 128 1 0.515368 0.040215 12.81545248
128 2048 512 1 2.065812 0.166998 12.37027101
128 2048 1024 1 4.133266 0.324198 12.74918278
128 2048 2048 1 8.297849 0.689424 12.03591474
32 32 32 3 0.00206 0.002174 0.947368421
32 32 128 3 0.002901 0.001908 1.520554792
32 32 512 3 0.006872 0.002194 3.13236253
32 32 1024 3 0.012467 0.002722 4.580325858
32 32 2048 3 0.023432 0.003677 6.372017635
32 128 32 3 0.002867 0.001836 1.561152947
32 128 128 3 0.006835 0.002149 3.181313804
32 128 512 3 0.023169 0.00332 6.979673921
32 128 1024 3 0.044929 0.005999 7.489229791
32 128 2048 3 0.088602 0.016748 5.290261506
32 512 32 3 0.007207 0.002516 2.864127345
32 512 128 3 0.023531 0.00376 6.258465441
32 512 512 3 0.088555 0.017108 5.176119736
32 512 1024 3 0.175204 0.035965 4.871474123
32 512 2048 3 0.348496 0.070004 4.978267533
32 1024 32 3 0.012922 0.003074 4.202946879
32 1024 128 3 0.045336 0.005897 7.687527795
32 1024 512 3 0.175341 0.035303 4.966765943
32 1024 1024 3 0.348684 0.071373 4.885339489
32 1024 2048 3 0.694106 0.140925 4.92536061
32 2048 32 3 0.024166 0.004092 5.905266838
32 2048 128 3 0.089105 0.016061 5.547992993
32 2048 512 3 0.348901 0.070672 4.936887063
32 2048 1024 3 0.694543 0.141337 4.914105555
32 2048 2048 3 1.390182 0.288056 4.826085877
64 32 32 3 0.002384 0.002117 1.125886724
64 32 128 3 0.004173 0.002011 2.075299419
64 32 512 3 0.012351 0.002708 4.561014263
64 32 1024 3 0.023311 0.003676 6.340596628
64 32 2048 3 0.045749 0.006466 7.075479351
64 128 32 3 0.004375 0.002227 1.964243657
64 128 128 3 0.012503 0.002645 4.726297765
64 128 512 3 0.044982 0.0056 8.032484673
64 128 1024 3 0.088837 0.016787 5.292168504
64 128 2048 3 0.174877 0.034336 5.093065402
64 512 32 3 0.012893 0.003048 4.229313311
64 512 128 3 0.045332 0.005912 7.668091628
64 512 512 3 0.175326 0.035578 4.928000375
64 512 1024 3 0.348549 0.071049 4.905771812
64 512 2048 3 0.694153 0.140929 4.925550542
64 1024 32 3 0.024231 0.004062 5.965078061
64 1024 128 3 0.089171 0.016667 5.350198838
64 1024 512 3 0.348847 0.06966 5.007875444
64 1024 1024 3 0.694303 0.141937 4.891609513
64 1024 2048 3 1.38575 0.286075 4.844007681
64 2048 32 3 0.046704 0.006425 7.269491966
64 2048 128 3 0.176777 0.034333 5.148856976
64 2048 512 3 0.695222 0.138113 5.0337016
64 2048 1024 3 1.392859 0.290829 4.789276158
64 2048 2048 3 2.777339 0.550022 5.049507165
128 32 32 3 0.003131 0.002193 1.427592955
128 32 128 3 0.006891 0.0022 3.131989597
128 32 512 3 0.02319 0.003693 6.279617793
128 32 1024 3 0.045063 0.005953 7.569443332
128 32 2048 3 0.088695 0.017094 5.188741352
128 128 32 3 0.007221 0.002528 2.855997737
128 128 128 3 0.023517 0.003656 6.432083469
128 128 512 3 0.088489 0.016527 5.354309126
128 128 1024 3 0.175154 0.036008 4.864284343
128 128 2048 3 0.348467 0.070856 4.917958761
128 512 32 3 0.02422 0.004093 5.917113234
128 512 128 3 0.089245 0.016182 5.51490998
128 512 512 3 0.348877 0.069145 5.04555595
128 512 1024 3 0.694318 0.14195 4.891271669
128 512 2048 3 1.385778 0.285961 4.846030837
128 1024 32 3 0.046719 0.006418 7.279356588
128 1024 128 3 0.176882 0.034288 5.158747818
128 1024 512 3 0.695201 0.138072 5.035053366
128 1024 1024 3 1.38632 0.289526 4.788235953
128 1024 2048 3 2.769361 0.552881 5.008964398
128 2048 32 3 0.091738 0.017008 5.393840417
128 2048 128 3 0.351505 0.06749 5.208260767
128 2048 512 3 1.387876 0.283533 4.894941399
128 2048 1024 3 2.769904 0.550297 5.03346627
128 2048 2048 3 5.551181 1.098142 5.05506626
32 32 32 8 0.00204 0.00214 0.953431373
32 32 128 8 0.002903 0.001909 1.521299188
32 32 512 8 0.006899 0.002172 3.176528708
32 32 1024 8 0.012445 0.002729 4.559972045
32 32 2048 8 0.023433 0.00369 6.349634989
32 128 32 8 0.002854 0.001863 1.531925784
32 128 128 8 0.006844 0.00214 3.198328691
32 128 512 8 0.023193 0.003301 7.026870847
32 128 1024 8 0.045073 0.0059 7.63996767
32 128 2048 8 0.088638 0.017365 5.10442925
32 512 32 8 0.007404 0.002718 2.724449513
32 512 128 8 0.023509 0.003741 6.284849257
32 512 512 8 0.088519 0.016207 5.461613145
32 512 1024 8 0.175228 0.034852 5.027773977
32 512 2048 8 0.348409 0.069693 4.999185807
32 1024 32 8 0.012949 0.003063 4.227117061
32 1024 128 8 0.045319 0.005848 7.749317135
32 1024 512 8 0.175334 0.035818 4.895128867
32 1024 1024 8 0.348441 0.07177 4.854968856
32 1024 2048 8 0.694237 0.141721 4.898620675
32 2048 32 8 0.024226 0.004105 5.901899286
32 2048 128 8 0.089214 0.016026 5.567007855
32 2048 512 8 0.348753 0.069279 5.034035729
32 2048 1024 8 0.694219 0.139642 4.971413791
32 2048 2048 8 1.385626 0.29052 4.769461754
64 32 32 8 0.002413 0.002146 1.124416796
64 32 128 8 0.004144 0.002024 2.047355401
64 32 512 8 0.012359 0.002728 4.529930962
64 32 1024 8 0.023354 0.003663 6.375423067
64 32 2048 8 0.045371 0.006103 7.434096414
64 128 32 8 0.004359 0.002213 1.969409737
64 128 128 8 0.012511 0.002749 4.551344319
64 128 512 8 0.044995 0.005695 7.900284662
64 128 1024 8 0.088507 0.016912 5.233224315
64 128 2048 8 0.175137 0.034585 5.063952847
64 512 32 8 0.012879 0.003049 4.223707874
64 512 128 8 0.045283 0.005896 7.680807182
64 512 512 8 0.175346 0.035567 4.929983912
64 512 1024 8 0.348452 0.071776 4.854721807
64 512 2048 8 0.694082 0.141424 4.90781304
64 1024 32 8 0.024211 0.004068 5.952054393
64 1024 128 8 0.089151 0.015926 5.597946016
64 1024 512 8 0.348775 0.069241 5.037125926
64 1024 1024 8 0.694444 0.141441 4.90979275
64 1024 2048 8 1.385926 0.285969 4.84641929
64 2048 32 8 0.046686 0.006406 7.287841006
64 2048 128 8 0.176619 0.033027 5.347713409
64 2048 512 8 0.695337 0.139538 4.983137553
64 2048 1024 8 1.386666 0.290755 4.769191031
64 2048 2048 8 2.777631 0.553053 5.02235659
128 32 32 8 0.003133 0.002197 1.425889757
128 32 128 8 0.006902 0.002223 3.105116379
128 32 512 8 0.023208 0.003685 6.297535097
128 32 1024 8 0.044963 0.005882 7.643847276
128 32 2048 8 0.088625 0.017138 5.171252887
128 128 32 8 0.007156 0.00252 2.839640492
128 128 128 8 0.023461 0.003696 6.348106574
128 128 512 8 0.088585 0.016423 5.393830297
128 128 1024 8 0.175132 0.036241 4.832427667
128 128 2048 8 0.348407 0.071602 4.865889718
128 512 32 8 0.024173 0.004047 5.972608388
128 512 128 8 0.089146 0.016749 5.322325343
128 512 512 8 0.348874 0.069427 5.025010302
128 512 1024 8 0.694221 0.141927 4.891384981
128 512 2048 8 1.385653 0.285861 4.847288134
128 1024 32 8 0.046703 0.006402 7.294481269
128 1024 128 8 0.176865 0.033604 5.263230338
128 1024 512 8 0.695349 0.138104 5.034957212
128 1024 1024 8 1.386518 0.289829 4.783911579
128 1024 2048 8 2.769319 0.549994 5.035179696
128 2048 32 8 0.091783 0.017045 5.384639056
128 2048 128 8 0.351532 0.067172 5.233307305
128 2048 512 8 1.387872 0.285158 4.867024514
128 2048 1024 8 2.77036 0.552777 5.011710939
128 2048 2048 8 5.550584 1.09872 5.051865695

Checklist

Essentials

  • PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented

@mxnet-bot
Copy link

Hey @bgawrych , Thanks for submitting the PR
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [windows-cpu, centos-gpu, sanity, clang, edge, centos-cpu, miscellaneous, unix-cpu, windows-gpu, website, unix-gpu]


Note:
Only following 3 categories can trigger CI :PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

@mseth10 mseth10 added pr-awaiting-testing PR is reviewed and waiting CI build and test pr-work-in-progress PR is still work in progress and removed pr-awaiting-testing PR is reviewed and waiting CI build and test pr-work-in-progress PR is still work in progress labels Sep 3, 2021
@mseth10 mseth10 added pr-awaiting-testing PR is reviewed and waiting CI build and test pr-work-in-progress PR is still work in progress and removed pr-work-in-progress PR is still work in progress pr-awaiting-testing PR is reviewed and waiting CI build and test labels Sep 6, 2021
@bgawrych
Copy link
Contributor Author

bgawrych commented Sep 6, 2021

@mxnet-bot run ci [centos-gpu, centos-cpu]

@mxnet-bot
Copy link

Jenkins CI successfully triggered : [centos-cpu, centos-gpu]

@mseth10 mseth10 added pr-awaiting-testing PR is reviewed and waiting CI build and test pr-work-in-progress PR is still work in progress and removed pr-work-in-progress PR is still work in progress pr-awaiting-testing PR is reviewed and waiting CI build and test labels Sep 6, 2021
@bgawrych
Copy link
Contributor Author

bgawrych commented Sep 7, 2021

@mxnet-bot run ci [centos-cpu]

@mxnet-bot
Copy link

Jenkins CI successfully triggered : [centos-cpu]

@mseth10 mseth10 added pr-awaiting-testing PR is reviewed and waiting CI build and test pr-awaiting-review PR is waiting for code review and removed pr-work-in-progress PR is still work in progress pr-awaiting-testing PR is reviewed and waiting CI build and test labels Sep 7, 2021
@mseth10 mseth10 added pr-awaiting-testing PR is reviewed and waiting CI build and test pr-awaiting-review PR is waiting for code review and removed pr-awaiting-review PR is waiting for code review pr-awaiting-testing PR is reviewed and waiting CI build and test labels Sep 13, 2021
@mseth10 mseth10 added pr-work-in-progress PR is still work in progress pr-awaiting-testing PR is reviewed and waiting CI build and test and removed pr-work-in-progress PR is still work in progress pr-awaiting-testing PR is reviewed and waiting CI build and test labels Sep 29, 2021
@bgawrych
Copy link
Contributor Author

@mxnet-bot run ci [unix-gpu, unix-cpu, miscellaneous]

@mxnet-bot
Copy link

Jenkins CI successfully triggered : [unix-gpu, miscellaneous, unix-cpu]

@mseth10 mseth10 added pr-awaiting-testing PR is reviewed and waiting CI build and test pr-work-in-progress PR is still work in progress and removed pr-work-in-progress PR is still work in progress pr-awaiting-testing PR is reviewed and waiting CI build and test labels Oct 11, 2021
@mseth10 mseth10 added pr-awaiting-testing PR is reviewed and waiting CI build and test pr-work-in-progress PR is still work in progress and removed pr-work-in-progress PR is still work in progress pr-awaiting-testing PR is reviewed and waiting CI build and test labels Oct 14, 2021
@mseth10 mseth10 added pr-awaiting-testing PR is reviewed and waiting CI build and test pr-work-in-progress PR is still work in progress and removed pr-work-in-progress PR is still work in progress pr-awaiting-testing PR is reviewed and waiting CI build and test labels Oct 14, 2021
@bgawrych
Copy link
Contributor Author

@mxnet-bot run ci [windows-gpu]

@mxnet-bot
Copy link

Jenkins CI successfully triggered : [windows-gpu]

@mseth10 mseth10 added pr-awaiting-testing PR is reviewed and waiting CI build and test pr-awaiting-review PR is waiting for code review and removed pr-work-in-progress PR is still work in progress pr-awaiting-testing PR is reviewed and waiting CI build and test labels Oct 18, 2021
@akarbown akarbown merged commit fb90650 into apache:master Oct 19, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
pr-awaiting-review PR is waiting for code review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants