Floating point exception in mxnet.ndarray.op.random_pdf_dirichlet #18936

leeyeetonn · 2020-08-16T02:24:58Z

Description

mxnet.ndarray.op.random_pdf_dirichlet raises a floating point exception when the given sample's shape contains a zero-size dimension. See the code below for an example.

Error Message

(Paste the complete error message. Please also include stack trace by setting environment variable DMLC_LOG_STACK_TRACE_DEPTH=10 before running your script.)

Floating point exception (core dumped)

To Reproduce


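import mxnet
import numpy as np
# sample has shape (4, 0): a zero-size last dimension, so it holds no elements
sample = mxnet.nd.array(np.random.rand(4,0))
alpha = mxnet.nd.array(np.random.rand(1))
# on mxnet 1.6.0 this aborts with "Floating point exception (core dumped)"
mxnet.ndarray.op.random_pdf_dirichlet(sample=sample, alpha=alpha)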

Steps to reproduce


Run the provided code in a Python interpreter or as a script.

What have you tried to solve it?

Environment

We recommend using our script for collecting the diagnostic information. Run the following command and paste the outputs below:

curl --retry 10 -s https://github.com/dmlc/gluon-nlp/master/tools/diagnose.py | python


Got a 404 when trying to fetch the script.

Some environment information:

OS: Ubuntu 18.04
Python: 3.7.6
pip: 20.0.2
numpy: 1.18.5
mxnet: 1.6.0


szha · 2020-08-16T02:29:08Z

@leeyeetonn thanks a lot for identifying and reporting these issues (#18927, #18933, #18934, and this one). It's very helpful.

I think a general problem with these issues is that an FPE exits the program without a stack trace. I will work on improving the signal handler so that it treats an FPE as a regular runtime error instead.

leeyeetonn · 2020-08-16T02:59:01Z

@szha Thanks for your feedback! I agree. These inputs should not cause FPEs; they should raise runtime exceptions in Python instead. I believe each of them requires some additional input validity checks.

I have a few more cases of FPEs caused by similar kinds of input. If you don't mind, I'd like to report them as individual issues, which makes them easier to keep track of.
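As a rough illustration of the input validity checks mentioned above, here is a minimal Python sketch that rejects empty inputs before they reach the kernel and raises an ordinary Python exception instead of crashing. It is a sketch under stated assumptions: `check_pdf_dirichlet_inputs` is a hypothetical helper, not an MXNet API, and it only inspects shapes. Where MXNet itself should perform such a check (shape inference or the kernel) is not covered here.

```python
def check_pdf_dirichlet_inputs(sample_shape, alpha_shape):
    """Hypothetical pre-flight check (not an MXNet API).

    Raises ValueError for inputs with a zero-size dimension, which is the
    case that triggered the floating point exception in this issue.
    """
    for name, shape in (("sample", sample_shape), ("alpha", alpha_shape)):
        shape = tuple(shape)
        if len(shape) == 0:
            raise ValueError(f"{name} must have at least one dimension")
        if 0 in shape:
            raise ValueError(f"{name} has a zero-size dimension: shape {shape}")


# The shapes from the reproduction script: sample (4, 0), alpha (1,).
try:
    check_pdf_dirichlet_inputs((4, 0), (1,))
except ValueError as err:
    print(err)  # sample has a zero-size dimension: shape (4, 0)
```

Raising `ValueError` is just the choice made for this sketch; returning an empty result for an empty sample would be another reasonable behavior.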

xidulu · 2020-08-20T15:38:21Z

@leeyeetonn
I think this issue should now be resolved by #18956, authored by @szha (if my understanding is correct).

szha · 2020-08-20T16:04:08Z

Yes, an FPE should no longer abort the program. The underlying bug still needs to be fixed, though.

szha · 2020-08-21T06:17:49Z

So here's the problem:

% DMLC_LOG_STACK_TRACE_DEPTH=150 MXNET_ENGINE_TYPE=NaiveEngine lldb python3.7 -- test_18936.py
(lldb) target create "python3.7"
Current executable set to 'python3.7' (x86_64).
(lldb) settings set -- target.run-args  "test_18936.py"
(lldb) run
Process 27100 launched: '/usr/local/bin/python3.7' (x86_64)
Process 27100 stopped
* thread #2, stop reason = exec
    frame #0: 0x0000000100006000 dyld`_dyld_start
dyld`_dyld_start:
->  0x100006000 <+0>: popq   %rdi
    0x100006001 <+1>: pushq  $0x0
    0x100006003 <+3>: movq   %rsp, %rbp
    0x100006006 <+6>: andq   $-0x10, %rsp
(lldb) cont
Process 27100 resuming
[23:14:55] ../src/engine/engine.cc:55: MXNet start using engine: NaiveEngine
[23:14:55] ../src/storage/storage.cc:198: Using Pooled (Naive) StorageManager for CPU
Process 27100 stopped
* thread #2, queue = 'com.apple.main-thread', stop reason = EXC_ARITHMETIC (code=EXC_I386_DIV, subcode=0x0)
    frame #0: 0x0000000116f5b318 libmxnet.dylib`void mxnet::op::mxnet_op::Kernel<mxnet::op::LaunchExWrapper<mxnet::op::PDF_Dirichlet<false> >, mshadow::cpu>::LaunchEx<int, int, float*, float*, float*>(mshadow::Stream<mshadow::cpu>*, unsigned long, int, int, float*, float*, float*) at pdf_op.h:443
   440 	    index_t i = start;
   441
   442 	    // Get aligned
-> 443 	    const index_t align_step = sample_size - (i % sample_size);
   444 	    const index_t first_stride = length > align_step ? align_step : length;
   445 	    OP::Map(i, first_stride, sample_size, args...);
   446 	    i += first_stride;

https://github.com/apache/incubator-mxnet/blob/9bdd4d6347c284770ee5bfe5ae98f1dabc283829/src/operator/random/pdf_op.h#L443

The code needs to guard against a zero right operand of % (sample_size is 0 here because the sample array is empty), and we should add a smoke test for this op to catch such problems, similar to https://github.com/apache/incubator-mxnet/pull/18972/files
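The failing line can be modeled in Python to show where the guard belongs. The sketch below mirrors the stride-splitting logic visible in the lldb listing (align to the next multiple of sample_size, then whole samples, then a possibly partial tail) and adds one possible guard for sample_size == 0. It is a simplified model, not MXNet code: `aligned_strides` is a hypothetical name, it returns (offset, length) chunks instead of calling OP::Map, and the loop and tail handling after line 446 are assumptions, since the listing stops there. In Python, `x % 0` raises ZeroDivisionError; in C++, integer modulo by zero is undefined behavior and on x86 traps as a divide error, which surfaces as SIGFPE (EXC_I386_DIV on macOS).

```python
def aligned_strides(start, length, sample_size):
    """Split the flat range [start, start + length) into chunks that never
    cross a sample boundary (a simplified model of LaunchExWrapper::Map)."""
    # Hypothetical guard: with no elements per sample there is nothing to
    # compute, and `start % sample_size` below would divide by zero.
    if sample_size <= 0 or length <= 0:
        return []

    chunks = []
    i = start
    # First chunk: advance up to the next multiple of sample_size.
    align_step = sample_size - (i % sample_size)
    first_stride = min(length, align_step)
    chunks.append((i, first_stride))
    i += first_stride

    # Middle chunks: whole samples.
    end = start + length - sample_size
    while i < end:
        chunks.append((i, sample_size))
        i += sample_size

    # Last chunk may be partial; skip it if nothing is left.
    last_stride = start + length - i
    if last_stride > 0:
        chunks.append((i, last_stride))
    return chunks


print(aligned_strides(2, 7, 3))  # [(2, 1), (3, 3), (6, 3)]
print(aligned_strides(0, 4, 0))  # [] instead of a division by zero
```

Returning early is only one option; the operator could instead reject or short-circuit empty inputs before the kernel is launched.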

szha · 2020-08-21T06:21:16Z

@xidulu since we are deprecating ndarray in favor of np/npx, do we need to register an alias of this op in np/npx (or is it already registered)?


leeyeetonn added Bug needs triage labels Aug 16, 2020

szha added Operator and removed needs triage labels Aug 16, 2020

szha added the good first issue for c++ developer label Aug 21, 2020

szha mentioned this issue Aug 21, 2020

Floating point exception in mxnet.ndarray.op.random_pdf_poisson #18937

Closed

szha added v1.x Targeting v1.x branch C++ Related to C++ good first issue and removed good first issue for c++ developer labels Aug 21, 2020

r3stl355 pushed a commit to r3stl355/incubator-mxnet that referenced this issue Feb 10, 2021

fix apache#18936 and apache#18937

d9029ca

r3stl355 mentioned this issue Feb 10, 2021

[BUGFIX] fix #18936, #18937 #19878

Merged


szha closed this as completed in #19878 Apr 30, 2021

szha pushed a commit that referenced this issue Apr 30, 2021

[BUGFIX] fix #18936, #18937 (#19878)

6f4ac54

* fix #18938 * fix #18939, #18940 * fix #18936 and #18937 Co-authored-by: r3stl355 <ulmasov@amazon.com>
