
ResourceExhaustedError: OOM when executing NAO-WS/cnn/train_search.sh #6

Open

cwhuang888 opened this issue Dec 14, 2018 · 0 comments

cwhuang888 commented Dec 14, 2018

Hi,

I have cloned the code and run it without modification in a Python 3.6 virtual environment with the following packages installed:

Package          Version
---------------  -------
absl-py          0.6.1
astor            0.7.1
gast             0.2.0
grpcio           1.17.1
Markdown         3.0.1
numpy            1.15.4
Pillow           5.3.0
pip              18.1
pkg-resources    0.0.0
protobuf         3.6.1
PyYAML           3.13
setuptools       39.1.0
six              1.12.0
tensorboard      1.9.0
tensorflow-gpu   1.9.0
termcolor        1.1.0
torch            0.3.1
torchvision      0.2.1
Werkzeug         0.14.1
wheel            0.32.3

However, an out-of-memory exception occurs when executing NAO-WS/cnn/train_search.sh on a GTX 1080 with 8 GB of memory.

Could you point out how to fix this issue?

Error message:

lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py:100: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "

...

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[5,160,80,8,8] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: child_1/layer_6/cell_3/y/stack = Pack[N=5, T=DT_FLOAT, axis=0, _device="/job:localhost/replica:0/task:0/device:GPU:0"](child_1/layer_6/cell_3/y/conv_3x3/stack_1/FusedBatchNorm, child_1/layer_6/cell_3/y/conv_5x5/stack_1/FusedBatchNorm, child_1/layer_6/cell_3/y/avg_pool/average_pooling2d/AvgPool, child_1/layer_6/cell_3/y/max_pool/max_pooling2d/MaxPool, child_1/layer_6/cell_3/y/strided_slice_2)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

     [[Node: child_2/gradients/concat_12/_22493 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_42968_child_2/gradients/concat_12", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
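
For what it's worth, the hint at the end of the log can be acted on like this with the TF 1.x API. This is a minimal, self-contained sketch against tensorflow-gpu 1.9.0 as listed above; the tiny matmul graph is a hypothetical stand-in for the actual child-model training step inside NAO-WS/cnn:

```python
import tensorflow as tf

# Ask TF to report the live tensor allocations if an OOM occurs,
# as the "Hint" lines in the error message suggest.
run_options = tf.RunOptions(report_tensor_allocations_upon_oom=True)

# Hypothetical stand-in graph; in practice the options would be passed
# to the failing sess.run() call in the NAO-WS/cnn training loop.
x = tf.random_normal([4, 4])
y = tf.matmul(x, x)

with tf.Session() as sess:
    print(sess.run(y, options=run_options))
```

With that set, the ResourceExhaustedError above should also print which tensors were resident when the shape[5,160,80,8,8] allocation failed, which would make it easier to tell whether the search batch size simply doesn't fit in 8 GB.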
