Failed to run medaka consensus #195

jagos01 · 2020-08-18T15:46:27Z

Hello,
I installed medaka following the GPU instructions on this site. When I run it, I get the following error:

2020-08-18 09:18:24.145726: E tensorflow/stream_executor/dnn.cc:588] OOM when allocating tensor with shape[3073512624] and type uint8 on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
  File "/home/DRDC/medaka/venv/bin/medaka", line 33, in <module>
    sys.exit(load_entry_point('medaka==1.0.3', 'console_scripts', 'medaka')())
  File "/home/DRDC/medaka/venv/lib/python3.6/site-packages/medaka-1.0.3-py3.6-linux-x86_64.egg/medaka/medaka.py", line 643, in main
    args.func(args)
  File "/home/DRDC/medaka/venv/lib/python3.6/site-packages/medaka-1.0.3-py3.6-linux-x86_64.egg/medaka/prediction.py", line 149, in predict
    batch_size=args.batch_size, save_features=args.save_features
  File "/home/DRDC/medaka/venv/lib/python3.6/site-packages/medaka-1.0.3-py3.6-linux-x86_64.egg/medaka/prediction.py", line 49, in run_prediction
    class_probs = model.predict_on_batch(x_data)
  File "/home/DRDC/medaka/venv/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1294, in predict_on_batch
    outputs = self.predict_function(inputs)
  File "/home/DRDC/medaka/venv/lib/python3.6/site-packages/tensorflow/python/keras/backend.py", line 3292, in __call__
    run_metadata=self.run_metadata)
  File "/home/DRDC/medaka/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1458, in __call__
    run_metadata_ptr)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: Failed to call ThenRnnForward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 3, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size]: [1, 256, 128, 1, 10000, 100]
     [[{{node bidirectional_1/CudnnRNN_1}}]]
     [[classify/truediv/_123]]
  (1) Internal: Failed to call ThenRnnForward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 3, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size]: [1, 256, 128, 1, 10000, 100]
     [[{{node bidirectional_1/CudnnRNN_1}}]]
0 successful operations.
0 derived errors ignored.
Failed to run medaka consensus.
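
For scale, the size of the allocation that failed can be read straight off the first log line (shape[3073512624], type uint8). The snippet below is only a back-of-the-envelope conversion of that figure to GiB; it is not part of medaka or TensorFlow, and the helper name bytes_to_gib is made up for illustration.

```python
def bytes_to_gib(n_bytes):
    """Convert a byte count to gibibytes (1 GiB = 2**30 bytes)."""
    return n_bytes / 2**30


# Element count copied from the log line: shape[3073512624], type uint8.
requested_elements = 3_073_512_624
bytes_per_element = 1  # uint8 occupies one byte per element

requested_gib = bytes_to_gib(requested_elements * bytes_per_element)
print(f"Single allocation requested: {requested_gib:.2f} GiB")
```

This is the size of one tensor requested by the allocator, not the total GPU memory in use when the error occurred.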

Environment
Ubuntu 18.04
GPU: GeForce RTX 2080 Ti
cuDNN version: 7.4.2
Nvidia Driver: 440.100
CUDA version: 10.0.130

Output from python -c "import tensorflow; print(tensorflow.__version__)":

/home/DRDC/medaka/venv/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
  from ._conv import register_converters as _register_converters
1.14.0

Any help would be appreciated.
Thanks,
Scott


cjw85 · 2020-08-18T15:52:56Z

Hi @jagos01,

Take a look at #70. I think you will need to:

- Run export TF_FORCE_GPU_ALLOW_GROWTH=true
- Reduce the --batch_size option (-b in the script medaka_consensus)
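
The two steps above could be combined roughly as in the sketch below. It is an illustration under assumptions, not medaka's own tooling: the helper names gpu_friendly_env and consensus_command, the file names, and the batch size of 50 are all hypothetical (the failing run in the log used a batch size of 100). Only -b is confirmed in this thread; the -i, -d and -o flags are recalled from the medaka README and should be checked against medaka_consensus -h for your version.

```python
import os
import shlex


def gpu_friendly_env(base_env=None):
    """Return a copy of the environment with TF_FORCE_GPU_ALLOW_GROWTH=true set."""
    env = dict(os.environ if base_env is None else base_env)
    env["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"
    return env


def consensus_command(basecalls, draft, outdir, batch_size):
    """Build a medaka_consensus argument list with an explicit batch size.

    Assumption: -i/-d/-o are the input, draft and output flags; verify locally.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    return [
        "medaka_consensus",
        "-i", basecalls,
        "-d", draft,
        "-o", outdir,
        "-b", str(batch_size),
    ]


if __name__ == "__main__":
    # Hypothetical file names and batch size, for illustration only.
    cmd = consensus_command("basecalls.fastq", "draft.fasta", "medaka_out", 50)
    print("Would run:", " ".join(shlex.quote(part) for part in cmd))
    # To launch it for real, pass the modified environment to the child process:
    #   import subprocess
    #   subprocess.run(cmd, env=gpu_friendly_env(), check=True)
```

If the run still fails with an out-of-memory error, lowering the batch size further is the obvious next thing to try.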

We are working on updating medaka to use a newer version of TensorFlow, which should handle RTX GPUs better.

jagos01 · 2020-08-18T17:30:34Z

Hello @cjw85,
I had already run export TF_FORCE_GPU_ALLOW_GROWTH=true, but had missed the --batch_size option. Medaka is running fine now.
Thanks,
Scott


asan-emirsaleh · 2020-12-21T07:28:35Z

Reducing --batch_size worked for me too. Setup:
CUDA 11, cuDNN 8, RTX 2070, TensorFlow 2.3.
Tested with E. coli genomic reads.

jagos01 closed this as completed Aug 18, 2020

cjw85 pushed a commit that referenced this issue Nov 26, 2020

save model every epoch and best models resolving #195

463fce6

cjw85 pushed a commit that referenced this issue Nov 26, 2020

Merge branch 'checkpoints' into 'dev'

fdaaf98

Checkpoints Closes #195 See merge request research/medaka!461

tavareshugo mentioned this issue Aug 8, 2023

Pipeline revisions - kraken, flye, etc cambiotraining/awd-pathogen-bioinformatics#10

Closed
