
ResourceExhaustedError: OOM when executing NAO-WS/cnn/train_search.sh #6

Open

cwhuang888 opened this issue Dec 14, 2018 · 0 comments

cwhuang888 commented Dec 14, 2018

Hi,

I have cloned the code and run it without modification in a Python 3.6 virtual environment with the following packages installed:

Package          Version
---------------  -------
absl-py          0.6.1
astor            0.7.1
gast             0.2.0
grpcio           1.17.1
Markdown         3.0.1
numpy            1.15.4
Pillow           5.3.0
pip              18.1
pkg-resources    0.0.0
protobuf         3.6.1
PyYAML           3.13
setuptools       39.1.0
six              1.12.0
tensorboard      1.9.0
tensorflow-gpu   1.9.0
termcolor        1.1.0
torch            0.3.1
torchvision      0.2.1
Werkzeug         0.14.1
wheel            0.32.3

However, an out-of-memory exception occurs when executing NAO-WS/cnn/train_search.sh on a GTX 1080 with 8 GB of memory.

Could you point out how to fix this issue?

Error message:

lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py:100: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "

...

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[5,160,80,8,8] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: child_1/layer_6/cell_3/y/stack = Pack[N=5, T=DT_FLOAT, axis=0, _device="/job:localhost/replica:0/task:0/device:GPU:0"](child_1/layer_6/cell_3/y/conv_3x3/stack_1/FusedBatchNorm, child_1/layer_6/cell_3/y/conv_5x5/stack_1/FusedBatchNorm, child_1/layer_6/cell_3/y/avg_pool/average_pooling2d/AvgPool, child_1/layer_6/cell_3/y/max_pool/max_pooling2d/MaxPool, child_1/layer_6/cell_3/y/strided_slice_2)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

     [[Node: child_2/gradients/concat_12/_22493 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_42968_child_2/gradients/concat_12", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
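
For what it's worth, the hint at the end of the log can be acted on like this with the TF 1.x API. This is a minimal, self-contained sketch against tensorflow-gpu 1.9.0 as listed above; the tiny matmul graph is a hypothetical stand-in for the actual child-model training step inside NAO-WS/cnn:

```python
import tensorflow as tf

# Ask TF to report the live tensor allocations if an OOM occurs,
# as the "Hint" lines in the error message suggest.
run_options = tf.RunOptions(report_tensor_allocations_upon_oom=True)

# Hypothetical stand-in graph; in practice the options would be passed
# to the failing sess.run() call in the NAO-WS/cnn training loop.
x = tf.random_normal([4, 4])
y = tf.matmul(x, x)

with tf.Session() as sess:
    print(sess.run(y, options=run_options))
```

With that set, the ResourceExhaustedError above should also print which tensors were resident when the shape[5,160,80,8,8] allocation failed, which would make it easier to tell whether the search batch size simply doesn't fit in 8 GB.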
