
[FEATURE] Fuse dequantize with convolution #20816

Merged
merged 7 commits into apache:master on Feb 14, 2022

Conversation

@DominikaJedynak (Contributor)

Description

This PR adds the possibility to fuse a dequantize node with a convolution node, which in practice lets us avoid unnecessarily multiplying and then dividing all entries of the convolution output by the same scaling factor.
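
As a rough illustration of the arithmetic being saved, here is a minimal NumPy sketch (made-up scales, not MXNet internals): a quantized convolution produces int32 accumulators, and dequantize merely rescales them by scale_data * scale_weight, so the fused convolution can emit float output directly instead of requantizing to int8 and dequantizing afterwards.

import numpy as np

# Hypothetical scales and accumulators, for illustration only
scale_data, scale_weight = 0.02, 0.005
acc = np.array([1200, -340, 87], dtype=np.int32)  # int32 conv accumulators

# Unfused: requantize to int8 (multiply, round, clip), then dequantize (scale back)
scale_out = 0.1
q8 = np.clip(np.round(acc * (scale_data * scale_weight) / scale_out), -128, 127)
unfused = q8 * scale_out

# Fused: a single rescale of the accumulators straight to float output
fused = acc * (scale_data * scale_weight)
print(unfused, fused)  # the fused path skips the int8 round-trip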

Speedup on various data sizes:

[Figure: fuze_deq_plot — speedup of convolution with fused dequantize across the data shapes below]
Measured on a c6i.12xlarge instance (Intel Xeon Platinum 8375C), AMI ami-04505e74c0741db8d (Canonical, Ubuntu 20.04 LTS)
Script:

import mxnet as mx
from mxnet.contrib import quantization
from mxnet.gluon import nn
import gc
import time

batch_size = [1, 3, 8, 32, 64]
channels = [1, 3, 16, 64]
picture_size = [32, 64, 128, 248, 512, 1024]
DATA_SHAPE = [(n, c, s, s) for n in batch_size for c in channels for s in picture_size]

rounds = 1000
warmup = 100

def print_header(header):
    print("\n---- ", header, " ----")
    print("    Shape    | Time [s] | Mean [ms]")

def print_value(shape, total, mean):
    print("({:4}, {:4}, {:4}, {:4}) | {:8.3f} | {:8.3f}".format(shape[0], shape[1], shape[2], shape[3], total, mean))

def measure(net, data, shape):
    mx.nd.waitall()
    gc.collect()
    gc.disable()  # keep GC pauses out of the measurement
    for i in range(rounds + warmup):
        if i == warmup:
            start_time = time.time()  # start timing once warmup is done
        o = net(data)
        o.wait_to_read()
    end_time = time.time()
    run_time = end_time - start_time
    print_value(shape, run_time, 1000 * run_time / rounds)
    gc.enable()
    gc.collect()

class Conv(nn.HybridBlock):
    def __init__(self, **kwargs):
        super(Conv, self).__init__(**kwargs)
        self.conv0 = nn.Conv2D(channels=4, kernel_size=(3, 3), strides=1, use_bias=False)

    def forward(self, x):
        out = self.conv0(x)
        return out

def benchmark():
    for data_shape in DATA_SHAPE:
        net = Conv()
        net.initialize()
        net.hybridize(static_alloc=True, static_shape=True)
        x = mx.np.random.uniform(size=data_shape, low=-1.0, high=1.0)

        data = mx.gluon.data.ArrayDataset(x)
        calib_data = mx.gluon.data.DataLoader(data, batch_size=1)
        net = quantization.quantize_net(net,
                                        ctx=mx.current_context(),
                                        calib_mode='naive',
                                        calib_data=calib_data,
                                        num_calib_batches=1)
        measure(net, x, data_shape)

benchmark()
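
For a fused-vs-baseline comparison with the same harness, the float (non-quantized) network could be measured as well — a hypothetical addition, not part of the original script:

def benchmark_float_baseline():
    # Hypothetical baseline: the same Conv block without quantization,
    # timed with the measure() harness defined above.
    for data_shape in DATA_SHAPE:
        net = Conv()
        net.initialize()
        net.hybridize(static_alloc=True, static_shape=True)
        x = mx.np.random.uniform(size=data_shape, low=-1.0, high=1.0)
        measure(net, x, data_shape)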

@mxnet-bot

Hey @DominikaJedynak, thanks for submitting the PR.
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [centos-cpu, unix-cpu, clang, miscellaneous, website, windows-cpu, windows-gpu, sanity, unix-gpu, centos-gpu, edge]


Note:
Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

@DominikaJedynak (Contributor, Author)

@mxnet-bot run ci [centos-gpu, windows-gpu]

@mxnet-bot

Jenkins CI successfully triggered : [windows-gpu, centos-gpu]

@bgawrych (Contributor)

@mxnet-bot run ci [all]

@mxnet-bot

Jenkins CI successfully triggered : [unix-cpu, windows-cpu, windows-gpu, centos-cpu, website, centos-gpu, unix-gpu, sanity, clang, edge, miscellaneous]

@@ -209,7 +209,7 @@ class SgDNNLPostQuantizeProperty : public SubgraphProperty {

// When only fused quantized operator and requantize, set min/max_cablib_range,
// When fused quantized operator + requantize + dequantize, set dequantize flag to true.
- if (dequantize_node != nullptr) {
+ if ((dequantize_node != nullptr && (no_enable_float_output.count(fuse_node->op()) == 0))) {
Contributor

Doubled brackets.

Suggested change
- if ((dequantize_node != nullptr && (no_enable_float_output.count(fuse_node->op()) == 0))) {
+ if (dequantize_node != nullptr && (no_enable_float_output.count(fuse_node->op()) == 0)) {

@mx.util.use_np
@pytest.mark.parametrize('data_shape', DATA_SHAPE)
@pytest.mark.parametrize('no_bias', [True, False])
@pytest.mark.parametrize('out_type', ['int8', 'auto'])
Contributor

Why not uint8?

Contributor (Author)

Following the settings of the other tests in this file, I do not test it, as it is a scenario which is not used.

@mx.util.use_np
@pytest.mark.parametrize('data_shape', DATA_SHAPE)
@pytest.mark.parametrize('no_bias', [True, False])
@pytest.mark.parametrize('out_type', ['int8', 'auto'])
Contributor

Same as above.

2 + (conv_param.no_bias ? 0 : 1) + (dnnl_param.with_bn ? 4 : 0) +
(dnnl_param.with_sum ? 1 : 0) +
(dnnl_param.quantized ? 2 + (full_conv_param.dnnl_param.with_sum ? 2 : 0) : 0);
size_t input_size = 2 + (conv_param.no_bias ? 0 : 1) + (dnnl_param.with_bn ? 4 : 0) +
Contributor

I think we can skip this calculation and use only the one calculated from idx below. If we wish to double-check it, it will be enough to do so in the assert, so we can replace the CHECK_EQ in line 167 with an assert using the calculation from here.

src/operator/subgraph/dnnl/dnnl_conv.cc (resolved)
Comment on lines 526 to 531
if (param.full_conv_param.dnnl_param.quantized) {
if (param.full_conv_param.dnnl_param.enable_float_output)
return std::vector<std::string>{"output"};
else
return std::vector<std::string>{"output", "output_min", "output_max"};
} else {
return std::vector<std::string>{"output"};
}
Contributor

It could be simplified:

Suggested change
- if (param.full_conv_param.dnnl_param.quantized) {
-   if (param.full_conv_param.dnnl_param.enable_float_output)
-     return std::vector<std::string>{"output"};
-   else
-     return std::vector<std::string>{"output", "output_min", "output_max"};
- } else {
-   return std::vector<std::string>{"output"};
- }
+ if (param.full_conv_param.dnnl_param.quantized &&
+     !param.full_conv_param.dnnl_param.enable_float_output) {
+   return std::vector<std::string>{"output", "output_min", "output_max"};
+ } else {
+   return std::vector<std::string>{"output"};
+ }

net = ConvAdd(use_bias=True)
check_quantize(net, data_shape, out_type)


@anko-intel (Contributor), Jan 28, 2022

What about a test with convolution, activation, and sum?

Contributor (Author)

They were already there: ConvActAdd, ConvBNSumAct.

@szha (Member), Feb 5, 2022

@DominikaJedynak could you resolve the conflict?

@bgawrych (Contributor)

@mxnet-bot run ci [unix-cpu, centos-gpu]

@mxnet-bot

Jenkins CI successfully triggered : [unix-cpu, centos-gpu]

@DominikaJedynak (Contributor, Author)

@mxnet-bot run ci [centos-gpu]

@mxnet-bot

Jenkins CI successfully triggered : [centos-gpu]

@DominikaJedynak (Contributor, Author)

@mxnet-bot run ci [centos-gpu]

@mxnet-bot

Jenkins CI successfully triggered : [centos-gpu]

@bgawrych bgawrych merged commit f4c4952 into apache:master Feb 14, 2022