
[Numpy] Add qr backward for wide inputs with nrows < ncols #18757

Merged
merged 1 commit into apache:master from D-Roberts:qr_back_wide_v2 on Jul 20, 2020

Conversation

D-Roberts
Contributor

@D-Roberts D-Roberts commented Jul 19, 2020

As titled. This is a resubmit of #18197. In addition, tests were re-verified for robustness.

MXNET_TEST_COUNT=10000 pytest -v ~/workspace/incubator-mxnet/tests/python/unittest/test_numpy_op.py::test_np_linalg_qr

incubator-mxnet/tests/python/unittest/test_numpy_op.py::test_np_linalg_qr PASSED  [100%]

For a given input, the obtained gradient has the same values as TensorFlow's. The implemented method is similar to the one implemented in TensorFlow.

Here are cross-checked examples:

import mxnet as mx
import mxnet.numpy as np
import numpy as _np
_np.random.seed(42)
data_np = _np.random.uniform(-1, 1, (3, 5)).astype(_np.float32)
data = np.array(data_np, dtype='float32')
data.attach_grad()
with mx.autograd.record():
    ret = np.linalg.qr(data)
mx.autograd.backward(ret)
print(data.grad)
[[ 0.7569422   0.5140486  -0.48962986 -0.48962957 -0.48962957]
 [-2.414882   -1.2380642  -1.6602778  -1.6602782  -1.6602782 ]
 [ 0.2763981  -0.5044659   0.06115001  0.06115055  0.06115055]]

import tensorflow as tf
import numpy as _np
_np.random.seed(42)
data_np = _np.random.uniform(-1, 1, (3, 5)).astype(_np.float32)
data = tf.convert_to_tensor(data_np)
with tf.GradientTape() as g:
    g.watch(data)
    ret = tf.linalg.qr(data)
print(g.gradient(ret, data))
tf.Tensor(
[[ 0.75694233  0.51404876 -0.48962957 -0.48962957 -0.48962957]
 [-2.414882   -1.2380638  -1.6602784  -1.6602781  -1.6602781 ]
 [ 0.276398   -0.50446594  0.0611502   0.06115052  0.06115052]], shape=(3, 5), dtype=float32)

At a high level, the methodology is: partition the wide input A (m rows, n columns, m < n) into two blocks X (the first m columns) and Y (the remaining n - m columns), and split R (from A = QR) into U (m x m, upper triangular) and V. Then X = QU, so X_grad follows from the gradient derivation for the square case (m = n), applied with an adjusted Q_grad; Y_grad is obtained separately from V_grad. A_grad is then the concatenation of X_grad and Y_grad. A minimal sketch of this reduction follows.
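For illustration only, here is a minimal NumPy sketch of the reduction (not the PR's C++ kernel; the helper names qr_backward_square and qr_backward_wide are made up for this example). It assumes the standard square-case formula A_grad = (dQ + Q copyltu(M)) R^-T with M = R dR^T - dQ^T Q, where copyltu copies the lower triangle of M onto the upper triangle:

import numpy as _np

def qr_backward_square(q, r, dq, dr):
    # Square case (m == n): M = R dR^T - dQ^T Q, symmetrized by copying
    # the lower triangle onto the upper (copyltu), then solved against R^T.
    m = r @ dr.T - dq.T @ q
    copyltu = _np.tril(m) + _np.tril(m, -1).T
    # (dq + q @ copyltu) @ inv(r).T, computed via a linear solve
    return _np.linalg.solve(r, (dq + q @ copyltu).T).T

def qr_backward_wide(a, q, r, dq, dr):
    # Wide case (m < n): A = [X | Y], R = [U | V], with X = Q U.
    m = a.shape[0]
    y = a[:, m:]                 # Y block of the input
    u, du = r[:, :m], dr[:, :m]  # U block of R and its incoming gradient
    dv = dr[:, m:]               # incoming gradient of the V block
    dy = q @ dv                                       # gradient of Y
    dx = qr_backward_square(q, u, dq + y @ dv.T, du)  # adjusted Q_grad
    return _np.concatenate([dx, dy], axis=1)

With q, r = _np.linalg.qr(data_np) and dq, dr set to ones (the default head gradients in the examples above), this should reproduce the printed data.grad, up to float32 rounding.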

Changes

  • qr backward wide input
  • tests

@mxnet-bot

Hey @D-Roberts , Thanks for submitting the PR
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [website, miscellaneous, centos-gpu, windows-gpu, unix-cpu, unix-gpu, windows-cpu, edge, sanity, clang, centos-cpu]


Note:
Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

@D-Roberts
Contributor Author

@mxnet-bot run ci [edge, clang]

@mxnet-bot

Jenkins CI successfully triggered : [edge, clang]

@D-Roberts D-Roberts changed the title Add qr backward for wide inputs with nrows < ncols [Numpy] Add qr backward for wide inputs with nrows < ncols Jul 20, 2020
@D-Roberts
Contributor Author

@szha Here is the replacement PR. What needs to be done about the codecov/project failure?

@leezu
Contributor

leezu commented Jul 20, 2020

@D-Roberts it's a bug and you can ignore it: #18421 (comment)

Could you elaborate on how the PR differs from the previous version? Do you mean the test was adapted ("tests were re-verified for robustness")? If not, do you know why the previous version failed CI while your current version passes?

Thank you

@D-Roberts
Contributor Author

D-Roberts commented Jul 20, 2020

Hi @leezu
By 're-verified tests' I really meant that I ran the tests more than 3 times with MXNET_TEST_COUNT=10000 :).

I also made the following changes to the tests (as compared to the previous PR):

  1. I removed the expensive analytical and numerical Jacobian calculations that were included in the initial PR. These were originally included to demonstrate that the analytical method I was implementing was correct and passed central-difference numerical checks. At the time of the initial PR, 3 months ago, MXNet was the only framework implementing this qr backward method for wide inputs. In the meantime, TensorFlow has implemented it, so the method is already verified via central differences (a generic central-difference check is sketched after this list). Note in the example in the description that the results from the MXNet and TF implementations match up to float32 rounding.

  2. I included a larger matrix in the test input for one test case, removed a couple of redundant test cases to reduce test run time, and relaxed atol and rtol for dtype float32 to avoid flaky test runs due to numerical rounding error.
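For reference, a central-difference gradient check for a scalar-valued function works along these lines (a minimal sketch, not the removed test code; central_diff_grad is a hypothetical helper):

import numpy as _np

def central_diff_grad(f, x, eps=1e-4):
    # Estimate df/dx entrywise: perturb one entry at a time and take
    # (f(x + eps) - f(x - eps)) / (2 * eps). f maps an ndarray to a scalar.
    grad = _np.zeros_like(x)
    for idx in _np.ndindex(*x.shape):
        orig = x[idx]
        x[idx] = orig + eps
        f_plus = f(x)
        x[idx] = orig - eps
        f_minus = f(x)
        x[idx] = orig  # restore the entry before moving on
        grad[idx] = (f_plus - f_minus) / (2.0 * eps)
    return grad

# Example: compare against the analytical gradient of a scalar head,
# e.g. central_diff_grad(lambda a: _np.linalg.qr(a)[1].sum(), data_np.copy())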

The error that led to the "stale PR" failing CI came from a calculation in the numerical Jacobian used for the central-difference check. The code broke after updates to NumPy and MXNet numpy when the source array is float32 and the dtype used in the view is float64:

jacobian_num[row, :] = diff.asnumpy().ravel().view(dtype)
[2020-07-17T17:41:26.700Z] E           ValueError: When changing to a larger dtype, its size must be a divisor of the total size in bytes of the last axis of the array.
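This NumPy behavior is easy to reproduce on its own (a standalone sketch, independent of the test harness): viewing a float32 buffer as float64 fails whenever the byte size of the last axis is not a multiple of 8:

import numpy as _np

a = _np.zeros(3, dtype=_np.float32)  # 3 * 4 = 12 bytes, not divisible by 8
try:
    a.view(_np.float64)
except ValueError as e:
    print(e)
# -> When changing to a larger dtype, its size must be a divisor of the
#    total size in bytes of the last axis of the array.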

@szha
Member

szha commented Jul 20, 2020

I'm comfortable merging this. @leezu?

@leezu
Contributor

leezu commented Jul 20, 2020

Thank you @D-Roberts!

@leezu leezu merged commit 1aec483 into apache:master Jul 20, 2020
@D-Roberts D-Roberts deleted the qr_back_wide_v2 branch December 21, 2020 19:00