
backport #19393 to v1.x #19395

Closed
wants to merge 630 commits into from

Conversation

samskalicky
Contributor

Description

backport #19393 to v1.x

Checklist

Essentials

  • PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented

Changes

  • Feature1, tests, (and when applicable, API doc)
  • Feature2, tests, (and when applicable, API doc)

Comments

  • If this change is backward incompatible, explain why it must be made.
  • Interesting edge cases to note here

ys2843 and others added 30 commits July 7, 2020 15:06
…e#18504)

* fix batch norm when fix_gamma is True

* support gradient accumulation for batch norm

* mkldnn batchnorm support grad add

* unittest for bn

* fix bn arg

* fix lint

* fix mkldnn

* fix mkldnn bn

* fix grad when fixing gamma

* fix naive gpu bn

* fix lint

* invoke mkldnn and cudnn batchnorm when axis != 1

* backport 18500

* change condition

* fix

* fix

* add mkldnn_off for bn

* remove mkldnn_off

* recover save_000800.json

* cast
* Fix scipy dependency in probability module

* Fix copy-paste error

* dtype='float32' for digamma and gammaln
…che#18405)

* Add deleting of args/aux to Partition API

Signed-off-by: Serge Panev <spanev@nvidia.com>

* Delete args from Block.params

Signed-off-by: Serge Panev <spanev@nvidia.com>

* Fix to use arg/auxdict when optimize_for is called in HybridBlock

Signed-off-by: Serge Panev <spanev@nvidia.com>

* Address PR comments

Signed-off-by: Serge Panev <spanev@nvidia.com>
* update footer style

* add compiled css of footer styles changes

* add same style for footer2

* more fixes to the toc
…8350)

* Add missing args/aux support in optimize_for and deferred inference option

Signed-off-by: Serge Panev <spanev@nvidia.com>

* Add input shape_dict, type_dict and stype_dict to optimize_for

Signed-off-by: Serge Panev <spanev@nvidia.com>

* Remove warnings for Werror

Signed-off-by: Serge Panev <spanev@nvidia.com>

* Address PR comments

Signed-off-by: Serge Panev <spanev@nvidia.com>
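
For context on the optimize_for / Partition API changes above, a hedged usage sketch; this is not code from this PR, and "MKLDNN" is used only as an example of a registered subgraph backend (assumes an MKL-DNN-enabled build):

```python
# Hedged sketch, not from this PR: exercising HybridBlock.optimize_for, which the
# commits above extend to also handle args/aux and input shape/type/stype information.
import mxnet as mx
from mxnet.gluon import nn

net = nn.Dense(16)
net.initialize()
x = mx.nd.ones((1, 8))

try:
    # Partition the cached graph for a registered subgraph backend.
    # "MKLDNN" is an illustrative choice and requires an MKL-DNN-enabled build.
    net.optimize_for(x, backend="MKLDNN")
except mx.base.MXNetError:
    # Backend not available in this build; the call pattern is what the sketch shows.
    pass
```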
…18713)

CMAKE_CUDA_HOST_COMPILER will be reset if CMAKE_CUDA_COMPILER is not set as of cmake 3.17.3

See https://gitlab.kitware.com/cmake/cmake/-/issues/20826
* Disable test coverage in MKL builds

* Enable test parallelization

* Set OMP_NUM_THREADS

* Fix

* Fix unpack_and_init
* Enable GPU Memory profiler tests

Previously these tests were not run, as test_profiler.py was not taken into account on
GPU CI runs and some tests were marked to be skipped when run on a CPU-only
machine.

* Disable broken tests
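
As a hedged illustration of the skip-on-CPU-only gating mentioned above (not the actual test code; the test name below is made up):

```python
import pytest
import mxnet as mx

# Skip rather than fail when no GPU is present, mirroring how GPU-only tests are gated.
@pytest.mark.skipif(mx.context.num_gpus() == 0, reason="GPU memory profiler tests require a GPU")
def test_gpu_memory_profiler_smoke():
    a = mx.nd.ones((16, 16), ctx=mx.gpu(0))
    a.wait_to_read()
```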
* Refactor scope functionality in Python API

- Remove deprecated metaclass functionality
- Remove global state in naming
- Switch from threading.local to asyncio-compatible contextvars
- Stop exposing UUIDs in parameter name
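
A simplified sketch of the threading.local-to-contextvars switch listed above; the names below (current_prefix, name_scope) are illustrative, not MXNet's actual implementation:

```python
# Hedged sketch: contextvars keep scope state local to the current task/thread,
# unlike module-level globals or threading.local in asyncio code.
import contextvars
from contextlib import contextmanager

current_prefix = contextvars.ContextVar("current_prefix", default="")

@contextmanager
def name_scope(prefix):
    token = current_prefix.set(current_prefix.get() + prefix + "_")
    try:
        yield
    finally:
        current_prefix.reset(token)

with name_scope("dense0"):
    print(current_prefix.get())  # "dense0_"
```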

* Fix dependencies

* Fixes

* Fixes

* Fix

* Fix after merge master
* Add the newest mxnet discuss version. Add d2l.ai

* delete [] and insert old version
* add ndarray and boolean indexing for numpy symbol

* fix sanity and unit test

* ensure consistency between the imperative and symbolic interface

* Update python/mxnet/numpy/multiarray.py and add new test
Co-authored-by: Leonard Lausen <leonard@lausen.nl>

* Don't rely on indexing_key_expand_implicit_axes for deciding if
_npi.advanced_indexing_multiple is applicable

* fix sanity

Co-authored-by: Leonard Lausen <lausen@amazon.com>
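
A small hedged example of the imperative mxnet.numpy indexing behavior that this change brings to the symbolic interface (values below are illustrative only):

```python
from mxnet import np, npx
npx.set_np()

a = np.array([[1, 2], [3, 4]])

# Boolean indexing: select elements greater than 2.
mask = a > 2
print(a[mask])

# Advanced ndarray indexing: reorder rows by an integer index array.
idx = np.array([1, 0])
print(a[idx])
```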
… Transpose and Rollaxis (apache#18707)

* support 6+ dims for transpose

* test over

* reorder code

* fix transposeex
* Refactoring of Pooled Storage Manager classes

* Adding test for new functionality

* Fixing compilation problems which appear for MXNET_USE_CUDA=0

* Fixing compilation problems for WINDOWS and ANDROID

* Fixing compilation problems which appear for WINDOWS and __APPLE__

* Fixing lint problems

* test_dataloader_context(): Bypassing the custom_dev_id pinned mem test on systems with fewer than 2 GPUs.

* Fixing compilation for Android. Elimination of unused includes.

* Fixing problems with CPUPinned Storage Manager which appear when MXNET_USE_CUDA = 0

* Removing test_bucketing.py

* Improving the CPU_Pinned Pooled Storage Manager case.

* Fixing lint problem

* Moved the GPU profiling command calls into the mutex-protected area

* Fixing lint problem

* Improved reporting regarding the Storage Manager used.

* Fixing lint problem

* Trigger CI

* Removing some comments, as suggested by @szha

* Trigger CI

* Trigger CI

Co-authored-by: andreii <andreii@nvidia.com>
Disabling this test for now to unblock other PRs, while I'm looking into it.
apache#18740
* Add sm arch 80 to Makefile

* Add TF32 to cuBLAS GEMMs

Signed-off-by: Serge Panev <spanev@nvidia.com>

* Add CUDA version guards

Signed-off-by: Serge Panev <spanev@nvidia.com>

* Remove useless TF32 for double and old CUDA version

Signed-off-by: Serge Panev <spanev@nvidia.com>

* Factorize VERSION_ADJUSTED_TF32_MATH

Signed-off-by: Serge Panev <spanev@nvidia.com>

* Add TF32 considerations to test_util.py:check_consistency()

* Bypass test_gluon_gpu.py:test_large_models if gmem >32GB

* Default tols in assert_almost_equal() now a function of dtype and ctx

* Expand types listed by default_tols()

* Fix pylint

* All with_seed() tests to waitall in teardown

* Elevate MXNET_TEST_SEED logging to WARNING

* Revert test_gluon_gpu.py:test_rnn_layer to default tols

* Fix test_gluon_model_zoo_gpu.py::test_inference and test_operator_gpy.py::test_np_linalg_{solve,tensorinv}

* test_numpy_interoperability.py to not fix seed for rest of CI

* Further fix to test_np_linalg_tensorinv

* Fix test_gluon_data.py:test_dataloader_context when run on 1-GPU system.

* Fix test_operator_gpu.py::test_embedding_with_type

* Fix test_operator_gpu.py::{test_*convolution_large_c,test_np_linalg_tensorsolve}

* Remove unneeded print() from test_numpy_interoperability.py

* Unify tol handling of check_consistency() and assert_almost_equal(). Test tweaks.

* Add tol handling of assert_almost_equal() with number args

* Add tol handling of bool comparisons

* Fix test_numpy_op.py::test_np_random_rayleigh

* Fix test_operator_gpu.py::test_batchnorm_with_type

* Fix test_gluon.py::test_sync_batchnorm in cpu selftest

* Improve unittest failure reporting

* Add to robustness of test_operator_gpu.py::test_embedding_with_type

* Check_consistency() to use equal backward gradients for increased test robustness

* Fix test_operator_gpu.py::test_{fully_connected,gemm}.  Add default_numeric_eps().

* test_utils.py fix for numeric gradient calc

* Reinstate rtol=1e-2 for test_operator.py::test_order

* Remove auto-cast of check_consistency() input data to least precise dtype (not needed)

* Fix test_operator.py::test_{reciprocol,cbrt,rcbrt}_op

* Expand default float64 numeric_eps for test_operator_gpu.py::test_sofmin

* Fix segfault-on-error of @Retry decorator. Add test isolation.

* assert_almost_equal() to handle a,b scalars

* Fix test_operator_gpu.py::test_gluon_{mvn,mvn_v1} race

* Fix test_operator_gpu.py::test_flatten_slice_after_conv via scale

* Remove test_utils.py:almost_equal_ignore_nan()

* Fix sample vs. pop variance issue with test_numpy_op.py::test_npx_batch_norm

* Expose test_utils.py:effective_dtype() and use to fix test_operator_gpu.py::test_np_linalg_svd

* Fix true_divide int_array / int_scalar -> float_array to honor np_default_dtype

* Try test_elemwise_binary_ops serial to avoid pytest worker crash

* Fix (log_)softmax backward on empty ndarray

* Temporarily log all CI seeds to troubleshoot seed non-determinism

* Revert "Temporarily log all CI seeds to troubleshoot seed non-determinism"

This reverts commit f60eff2.

* Temp log all CI seeds to troubleshoot unwanted seed determinism

* Revert "Add sm arch 80 to Makefile"

This reverts commit f9306ce.

* Same fix of sample vs. pop variance issue, now with test_operator_gpu.py::test_batchnorm

* Revert "Temp log all CI seeds to troubleshoot unwanted seed determinism"

This reverts commit ff328ef.

* Marking test_sparse_dot_grad with garbage_expected after teardown error

* Fix flakiness of test_gluon_probability{_v1,_v2}.py::test_gluon_kl{_v1,}

* Temp skip of test_aggregate_duplication on gpu

* Add seeding to test_{numpy,}_contrib_gluon_data_vision.py.  Make created files unique.

* Add ndarray module isolation to help debug test_bbox_augmenters worker crash

* Marking test_sparse_square_sum serial after pytest worker crash

* Fix flakiness of test_gluon_probability{_v1,_v2}.py::test_half_cauchy{_v1,}

Co-authored-by: Serge Panev <spanev@nvidia.com>
Co-authored-by: Bart Gawrych <gawrych.bartlomiej@intel.com>
* enable default large tensor in np

* revert cmake change

* move test_np_large_array.py to nightly
Replaced by the CMake build system as per apache#16167
leezu and others added 17 commits October 11, 2020 03:57
* Remove duplicate setup and teardown functions

faccd91 introduced automatic pytest hooks for
handling MXNET_MODULE_SEED, adapted from
dmlc/gluon-nlp@66e926a,
but didn't remove the existing seed handling via explicit setup and teardown
functions.

This commit removes the explicit setup and teardown functions in favor of the
automatic pytest version, and thereby also ensures that the seed handling code
is not executed twice. As a side benefit, seed handling now works correctly even
if contributors forget to add the magic setup_module and teardown_module
imports in new test files.

If pytest is run with --capture=no (or -s shorthand), output of the module level
fixtures is shown to the user.
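
A hedged, simplified sketch of what such a module-level seeding hook can look like; the real hook lives in the repo's pytest conftest, and the names and details below are illustrative:

```python
import os
import random

import numpy as np
import pytest

@pytest.fixture(scope="module", autouse=True)
def module_seed():
    # Honor MXNET_MODULE_SEED if set, otherwise draw a fresh seed per module.
    seed = int(os.environ.get("MXNET_MODULE_SEED", random.randint(0, 2**31 - 1)))
    np.random.seed(seed)
    random.seed(seed)
    # Only visible when pytest runs with --capture=no (or -s).
    print("Using module seed {}; set MXNET_MODULE_SEED to reproduce.".format(seed))
    yield
```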

* Fix locale setting
* Add GPU-optimization for split op

* Complete operator

* unit-test: use parametrize

* fix lint

* fix lint

* fix lint
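
On the "use parametrize" point, a hedged sketch of the pattern; the shapes and section counts below are illustrative, not the PR's actual cases:

```python
import pytest
import mxnet as mx

@pytest.mark.parametrize("shape,num_outputs,axis", [
    ((4, 6), 2, 0),
    ((4, 6), 3, 1),
])
def test_split_shapes(shape, num_outputs, axis):
    # One test body covers every (shape, num_outputs, axis) combination above.
    data = mx.nd.random.uniform(shape=shape)
    outs = mx.nd.split(data, num_outputs=num_outputs, axis=axis)
    assert len(outs) == num_outputs
```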
…9321)

* Updated the auto-encoder example. Fixes apache#18712

* add Aditya Trivedi to contributors
…che#19329)

Automated CI checks treating all warnings as errors will be introduced in a separate commit.
* Add examples of running Python unittests to docs

* Remove conda references from test setup example

Co-authored-by: r3stl355 <ulmasov@amazon.com>
* fix

* Update test_np_large_array.py
* Fix the direction of in_channels -> out_channels in the repr function for ConvTranspose classes.

Co-authored-by: g4b1nagy <gabrian.nagy@gmail.com>
* fix

* also include fix for np.diagonal

* tweak branching

* Update test_np_large_array.py

* Update test_np_large_array.py

* add operator index_t()

* use index_t

* update tests

* Update packed_func.h

* Update packed_func.h

* Update packed_func.h

* revert back

Co-authored-by: Zhu <zhaoqzhu@3c22fbbb4e1a.ant.amazon.com>
…pache#19352)

* add pr-awaiting-review shield

* link to code_review page

* fix markdown issue in website pipeline

* use apache/mxnet instead of apache/incubator-mxnet
…, dsplit (apache#19357)

Co-authored-by: Rohit Kumar Srivastava <srivastava.141@buckeyemail.osu.edu>
* fix

* Update test_np_large_array.py

* Update test_np_large_array.py
* Faster pointwise fusion graph pass

* Fix lint

* Fix lint 2

* Fixes

* Fixing slice parameter handling in fusion

* Fixing the slice fix

* Fix the cycle bug

* Added test

* Fix lint

* Fix merging of subgraphs

* Fixes from review
The prior implementation recorded reshape with the concrete shapes used during the particular invocation of At.
The new implementation records reshape with magic numbers to match the symbolic interface.
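
For reference, a tiny hedged illustration of the "magic number" reshape semantics in MXNet 1.x (0 keeps a dimension, -1 infers it), which is what lets a recorded reshape stay shape-agnostic instead of baking in the concrete shapes seen at record time:

```python
import mxnet as mx

x = mx.nd.ones((2, 3, 4))
# 0 keeps the corresponding input dimension, -1 infers the remaining size.
y = x.reshape((0, -1))
assert y.shape == (2, 12)
```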
@mxnet-bot

Hey @samskalicky, thanks for submitting the PR.
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [centos-gpu, edge, windows-gpu, miscellaneous, centos-cpu, clang, website, unix-cpu, windows-cpu, sanity, unix-gpu]


Note:
Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.
