This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Flaky test test_quantization_mkldnn.test_requantize_int32_to_int8 #11747

Closed
KellenSunderland opened this issue Jul 13, 2018 · 13 comments · Fixed by #16709


@KellenSunderland
Contributor

Example Failure
http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/master/1175/pipeline

Output

FAIL: test_quantization_mkldnn.test_requantize_int32_to_int8

Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/nose/case.py", line 198, in runTest
self.test(*self.arg)
File "/usr/local/lib/python3.5/dist-packages/nose/util.py", line 620, in newfunc
return func(*arg, **kw)
File "/work/mxnet/tests/python/mkl/../unittest/common.py", line 157, in test_new
orig_test(*args, **kwargs)
File "/work/mxnet/tests/python/mkl/../quantization/test_quantization.py", line 127, in test_requantize_int32_to_int8
check_requantize((3, 4, 10, 10))
File "/work/mxnet/tests/python/mkl/../quantization/test_quantization.py", line 123, in check_requantize
assert_almost_equal(qdata_int8.asnumpy(), qdata_int8_np)
File "/work/mxnet/python/mxnet/test_utils.py", line 493, in assert_almost_equal
raise AssertionError(msg)
AssertionError:
Items are not equal:
Error 1562.500000 exceeds tolerance rtol=0.000010, atol=0.000000. Location of maximum error:(0, 3, 8, 0), a=-63.000000, b=-64.000000
a: array([[[[ -72, 106, 79, ..., 73, -43, 126],
[ -74, -118, -46, ..., -18, 44, -37],
[ 0, 13, -93, ..., -117, -123, -56],...
b: array([[[[ -72, 106, 79, ..., 73, -43, 126],
[ -74, -118, -46, ..., -18, 44, -37],
[ 0, 13, -93, ..., -117, -123, -56],...
-------------------- >> begin captured logging << --------------------
common: INFO: Setting test np/mx/python random seeds, use MXNET_TEST_SEED=2054829694 to reproduce.
--------------------- >> end captured logging << ---------------------
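For context, the reported value "Error 1562.500000" is consistent with a relative-violation metric of the form |a - b| / (atol + rtol * |b|). This is an assumption about how mxnet.test_utils.assert_almost_equal scores the mismatch, not taken from its source, but the arithmetic lines up exactly for the values in the log:

```python
# Hedged sketch: reproduce the reported error figure for the failing element,
# assuming the violation metric is |a - b| / (atol + rtol * |b|).
a, b = -63.0, -64.0          # values at the location of maximum error
rtol, atol = 1e-5, 0.0       # tolerances from the assertion message
error = abs(a - b) / (atol + rtol * abs(b))
print(error)  # prints: 1562.5
```

A one-unit difference between two int8 values is tiny in absolute terms, but against rtol=1e-5 it produces a violation factor in the thousands, which is why the assertion trips.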

@apeforest
Contributor

Thanks for filing this issue. We will investigate this flaky test.

@xinyu-intel
Contributor

I cannot reproduce this locally. By the way, why does only Python 3 fail?

@KellenSunderland
Contributor Author

@xinyu-intel: This is a rarely occurring non-deterministic failure, so the fact that it happens in Python 3 as opposed to Python 2 (or any other configuration) is probably just a coincidence.

@marcoabreu
Contributor

marcoabreu commented Jul 30, 2018

Hi,

please run the following commands:

  1. ci/build.py -p ubuntu_cpu -i
  2. PYTHONPATH=/work/mxnet/python python3 tools/flakiness_checker.py -s 2127644814 test_quantization_mkldnn.test_requantize_int32_to_int8

The output will be:


INFO:root:Testing: /work/mxnet/tests/python/mkl/test_quantization_mkldnn.py:test_requantize_int32_to_int8
INFO:root:No test seed provided, using random seed
test_quantization_mkldnn.test_requantize_int32_to_int8 ... [INFO] 351 of 10000: Setting test np/mx/python random seeds, use MXNET_TEST_SEED=2127644814 to reproduce.
FAIL

======================================================================
FAIL: test_quantization_mkldnn.test_requantize_int32_to_int8
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/usr/local/lib/python3.5/dist-packages/nose/util.py", line 620, in newfunc
    return func(*arg, **kw)
  File "/work/mxnet/tests/python/mkl/../quantization/common.py", line 172, in test_new
    orig_test(*args, **kwargs)
  File "/work/mxnet/tests/python/mkl/../quantization/test_quantization.py", line 127, in test_requantize_int32_to_int8
    check_requantize((3, 4, 10, 10))
  File "/work/mxnet/tests/python/mkl/../quantization/test_quantization.py", line 123, in check_requantize
    assert_almost_equal(qdata_int8.asnumpy(), qdata_int8_np)
  File "/work/mxnet/python/mxnet/test_utils.py", line 493, in assert_almost_equal
    raise AssertionError(msg)
AssertionError:
Items are not equal:
Error 1562.500000 exceeds tolerance rtol=0.000010, atol=0.000000.  Location of maximum error:(1, 3, 1, 7), a=-63.000000, b=-64.000000
 a: array([[[[-119,   61,  -58, ...,    8,  123,   10],
         [  97,   79,   11, ...,   37, -106,  -13],
         [  82,   53, -125, ...,  104,   90,  112],...
 b: array([[[[-119,   61,  -58, ...,    8,  123,   10],
         [  97,   79,   11, ...,   37, -106,  -13],
         [  82,   53, -125, ...,  104,   90,  112],...
-------------------- >> begin captured logging << --------------------
common: INFO: 351 of 10000: Setting test np/mx/python random seeds, use MXNET_TEST_SEED=2127644814 to reproduce.
--------------------- >> end captured logging << ---------------------

This output is entirely deterministic given the seed. For reference, I'm running on a g3.8xlarge.
@pengzhao-intel @xinyu-intel

@pengzhao-intel
Contributor

Sorry, I missed this one previously.
@xinyu-intel will keep looking into the issue.

@xinyu-intel
Contributor

@KellenSunderland @marcoabreu @reminisce Hi, I think we should set the absolute tolerance to one ULP: when checking whether two int8 arrays are equal, an absolute error of 1 should be allowed.
In this case, we convert int32 data to float32 and then to int8. In the last step, (np.sign(data) * np.minimum(np.abs(data) * scale + 0.5, quantized_range)).astype('int8'), adding 0.5 to implement rounding can introduce a one-ULP truncation error.
For example, the MKL-DNN float 63.4999 will not round up to 64 after adding 0.5, but the NumPy float 63.5000 will. A float32 absolute error of 1e-4 is thus amplified to 1 after conversion to int8.
I'll start a PR to fix it.
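The boundary case described above can be sketched as follows. Here requantize is a hypothetical re-implementation of the NumPy reference path, built from the expression quoted in the comment; the exact helper names in the test suite may differ.

```python
import numpy as np

# Hypothetical sketch of the requantization rounding step described above.
# Adding 0.5 before truncating with astype('int8') implements
# round-half-away-from-zero; quantized_range clamps to the int8 range.
def requantize(data, scale, quantized_range=127.0):
    return (np.sign(data) * np.minimum(np.abs(data) * scale + 0.5,
                                       quantized_range)).astype('int8')

# A tiny float32 discrepancy straddling the .5 boundary flips the result
# by one int8 ULP: 63.4999 truncates to 63, while 63.5000 rounds up to 64.
a = np.float32(63.4999)  # e.g. the value one backend computes
b = np.float32(63.5000)  # e.g. the value the reference path computes
print(requantize(a, 1.0), requantize(b, 1.0))  # prints: 63 64
```

This is why an absolute tolerance of 1 (one int8 ULP) is the appropriate comparison for the quantized outputs, rather than a tight rtol on the raw integers.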

@marcoabreu
Contributor

Great, thanks a lot @xinyu-intel !

@marcoabreu
Contributor

#12040

@ChaiBapchya
Contributor

This also occurred on the unrelated PR #16692:
http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-cpu/detail/PR-16692/1/pipeline

======================================================================
FAIL: test_quantization.test_requantize_int32_to_int8
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/work/mxnet/tests/python/quantization/common.py", line 177, in test_new
    orig_test(*args, **kwargs)
  File "/work/mxnet/tests/python/quantization/test_quantization.py", line 186, in test_requantize_int32_to_int8
    check_requantize_with_symbol((3, 4, 10, 10))
  File "/work/mxnet/tests/python/quantization/test_quantization.py", line 181, in check_requantize_with_symbol
    assert_almost_equal(qdata_int8.asnumpy(), qdata_int8_np)
  File "/work/mxnet/python/mxnet/test_utils.py", line 627, in assert_almost_equal
    raise AssertionError(msg)
AssertionError:
Items are not equal:
Error 1562.500000 exceeds tolerance rtol=1.000000e-05, atol=1.000000e-20 (mismatch 0.166667%).
Location of maximum error: (0, 2, 3, 4), a=-63.00000000, b=-64.00000000
 ACTUAL: array([[[[ -98,  -25,  -99, ...,  -93,   -8,   37],
         [  94,  -57,  -94, ...,  -84,   83,   60],
         [ -47,  112,   95, ...,  107, -112,    4],...
 DESIRED: array([[[[ -98,  -25,  -99, ...,  -93,   -8,   37],
         [  94,  -57,  -94, ...,  -84,   83,   60],
         [ -47,  112,   95, ...,  107, -112,    4],...
-------------------- >> begin captured stdout << ---------------------

*** Maximum errors for vector of size 1200:  rtol=1e-05, atol=1e-20

  1: Error 1562.500000  Location of error: (0, 2, 3, 4), a=-63.00000000, b=-64.00000000
  2: Error 1562.500000  Location of error: (2, 2, 0, 2), a=63.00000000, b=64.00000000
@pengzhao-intel
Contributor

@xinyu-intel please take a look at this flaky test.

@ChaiBapchya
Contributor

@zixuanweeei
Contributor

Hi, @ChaiBapchya. Perhaps #16709 was not backported to the 1.6.x branch. Would you mind confirming that? If so, I think we need to backport the patch to 1.6.x as well along with #17993.

@ChaiBapchya
Contributor

Backported that patch. Thanks for pointing it out. Weird how it didn't get selected in the previous cherry-picks.
