Failed to run medaka consensus #195

Closed
jagos01 opened this issue Aug 18, 2020 · 3 comments


jagos01 commented Aug 18, 2020

Hello,
I installed medaka following the GPU instructions on this site. When I run it, I get the following error:

2020-08-18 09:18:24.145726: E tensorflow/stream_executor/dnn.cc:588] OOM when allocating tensor with shape[3073512624] and type uint8 on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
  File "/home/DRDC/medaka/venv/bin/medaka", line 33, in <module>
    sys.exit(load_entry_point('medaka==1.0.3', 'console_scripts', 'medaka')())
  File "/home/DRDC/medaka/venv/lib/python3.6/site-packages/medaka-1.0.3-py3.6-linux-x86_64.egg/medaka/medaka.py", line 643, in main
    args.func(args)
  File "/home/DRDC/medaka/venv/lib/python3.6/site-packages/medaka-1.0.3-py3.6-linux-x86_64.egg/medaka/prediction.py", line 149, in predict
    batch_size=args.batch_size, save_features=args.save_features
  File "/home/DRDC/medaka/venv/lib/python3.6/site-packages/medaka-1.0.3-py3.6-linux-x86_64.egg/medaka/prediction.py", line 49, in run_prediction
    class_probs = model.predict_on_batch(x_data)
  File "/home/DRDC/medaka/venv/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1294, in predict_on_batch
    outputs = self.predict_function(inputs)
  File "/home/DRDC/medaka/venv/lib/python3.6/site-packages/tensorflow/python/keras/backend.py", line 3292, in __call__
    run_metadata=self.run_metadata)
  File "/home/DRDC/medaka/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1458, in __call__
    run_metadata_ptr)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: Failed to call ThenRnnForward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 3, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size]: [1, 256, 128, 1, 10000, 100]
    [[{{node bidirectional_1/CudnnRNN_1}}]]
    [[classify/truediv/_123]]
  (1) Internal: Failed to call ThenRnnForward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 3, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size]: [1, 256, 128, 1, 10000, 100]
    [[{{node bidirectional_1/CudnnRNN_1}}]]
0 successful operations.
0 derived errors ignored.
Failed to run medaka consensus.
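
For scale, the failed allocation in the first log line can be converted to GiB. This is only a rough sketch of the arithmetic: the element count and the uint8 dtype are taken from the log above, while the helper name is made up for illustration.

```python
def bytes_to_gib(n_bytes):
    """Convert a byte count to GiB (2**30 bytes)."""
    return n_bytes / 2**30

oom_elements = 3073512624  # shape[3073512624] from the OOM log line
bytes_per_element = 1      # dtype uint8 is one byte per element

oom_gib = bytes_to_gib(oom_elements * bytes_per_element)
print(f"{oom_gib:.2f} GiB")  # prints "2.86 GiB"
```

So TensorFlow was asking the GPU allocator for roughly 2.9 GiB in a single request when the run failed.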

Environment
Ubuntu 18.04
GPU: GeForce RTX 2080 Ti
cuDNN version: 7.4.2
Nvidia Driver: 440.100
CUDA version: 10.0.130

Output from python -c "import tensorflow; print(tensorflow.__version__)":

/home/DRDC/medaka/venv/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
  from ._conv import register_converters as _register_converters
1.14.0

Any help would be appreciated.
Thanks,
Scott

cjw85 (Member) commented Aug 18, 2020

Hi @jagos01,

Take a look at #70. I think you will need to:

  • Run export TF_FORCE_GPU_ALLOW_GROWTH=true
  • Reduce the --batch_size option (-b in the script medaka_consensus)

We are working on updating medaka to use a newer version of tensorflow, which should handle RTX GPUs better.
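
A minimal Python sketch of the two suggestions in the list above, assuming medaka_consensus is on PATH. The file names, output directory, helper name, and the batch size of 50 are hypothetical placeholders (the failing run in the log used a batch size of 100). The -i, -d, and -o options are assumed to be the reads, draft, and output arguments of medaka_consensus; check medaka_consensus -h for your version.

```python
import os

def medaka_consensus_invocation(reads, draft, outdir, batch_size=50):
    """Build the environment and argument list for a medaka_consensus run.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    env = dict(os.environ)
    # Let TensorFlow grow GPU memory on demand instead of reserving it up front.
    env["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"
    # -b lowers the inference batch size (the value 50 is an arbitrary example).
    argv = [
        "medaka_consensus",
        "-i", reads,
        "-d", draft,
        "-o", outdir,
        "-b", str(batch_size),
    ]
    return env, argv

env, argv = medaka_consensus_invocation("reads.fastq", "draft.fa", "medaka_out")
print(" ".join(argv))
```

To actually launch the run, pass both values to subprocess.run(argv, env=env, check=True). If the out-of-memory error persists, try a smaller batch size.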

jagos01 (Author) commented Aug 18, 2020

Hello @cjw85,
I had already run export TF_FORCE_GPU_ALLOW_GROWTH=true, but I had missed the --batch_size option. Medaka is running fine now.
Thanks,
Scott

jagos01 closed this as completed Aug 18, 2020
cjw85 pushed a commit that referenced this issue Nov 26, 2020
Checkpoints

Closes #195

See merge request research/medaka!461

asan-emirsaleh commented

The solution with the --batch_size option helped me too. Setup:
CUDA 11, cuDNN 8, RTX 2070, TensorFlow 2.3.
Tested with E. coli genomic reads.
