OOM for GPU training #85
Comments
Could you try
I tried that without luck. I'm also very curious about the huge GPU memory consumption. In my mind, EfficientDet is lightweight and efficient. However, I can only run efficientdet-d4 on a single Nvidia Titan V GPU by setting
How about
config_proto.graph_options.rewrite_options.auto_mixed_precision = rewriter_config_pb2.RewriterConfig.ON
config_proto.graph_options.rewrite_options.memory_optimization = rewriter_config_pb2.RewriterConfig.RECOMPUTATION_HEURISTICS
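For context, here is a fuller sketch of how those two Grappler options could be plugged into an estimator-based training setup (a minimal sketch assuming a TF1-style session config handed to tf.estimator.RunConfig; main.py's own wiring may differ):

```python
import tensorflow.compat.v1 as tf
from tensorflow.core.protobuf import rewriter_config_pb2

# Enable automatic mixed precision and gradient recomputation in the
# Grappler graph rewriter, trading extra compute for lower peak memory.
config_proto = tf.ConfigProto()
config_proto.graph_options.rewrite_options.auto_mixed_precision = (
    rewriter_config_pb2.RewriterConfig.ON)
config_proto.graph_options.rewrite_options.memory_optimization = (
    rewriter_config_pb2.RewriterConfig.RECOMPUTATION_HEURISTICS)

# Hand the session config to the estimator so every session uses it.
run_config = tf.estimator.RunConfig(session_config=config_proto)
```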
Yes, I guess this implementation might not have been designed with existing GPU setups in mind. I have to set a batch size of 2 when training D4 on my 24 GB device. This is not really acceptable, since I can train a larger model (TridentNet R101) with a batch size of 4/8.
I also tried that with
at the top of
I disabled EMA and could train a bigger model.
What do you mean by "ema"? Could you give us a more detailed guide?
h.moving_average_decay = 0.
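In other words, "EMA" here is the exponential moving average kept over the model weights, controlled by the moving_average_decay hparam. While it is enabled, TF maintains a shadow copy of every trainable variable, which roughly doubles variable memory; setting the decay to 0 skips that. A tiny illustration of the mechanism (plain TF1 API, not code from this repo):

```python
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

w = tf.get_variable('w', shape=[1024, 1024])  # stand-in for model weights

# apply() creates one shadow variable per tracked variable, so variable
# memory roughly doubles for as long as EMA is enabled.
ema = tf.train.ExponentialMovingAverage(decay=0.9998)
ema_op = ema.apply([w])

print(len(tf.global_variables()))  # 2: the weight plus its EMA shadow copy
```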
24 GB GPU, D6 model: with either h.moving_average_decay = 0 or h.moving_average_decay = 0.9998, batch_size = 1.
I could train EfficientDet-D7 with batch size 4 per core on TPUv3, where each core has 16 GB of memory. But it seems like GPU training OOM is a big issue. We need more investigation into why GPU training uses so much memory. Does anyone happen to know good memory profiling tools or instructions? Thanks!
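One profiling option: TF1's built-in profiler can report per-op peak memory from a traced session run. A rough, self-contained sketch (plain TF1 API with a toy graph standing in for the real training loop):

```python
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

# Toy graph standing in for the real model.
x = tf.get_variable('x', shape=[1024, 1024])
loss = tf.reduce_sum(tf.matmul(x, x))
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(train_op, options=run_options, run_metadata=run_metadata)

# Dump per-op timing and memory collected in run_metadata.
tf.profiler.profile(
    tf.get_default_graph(),
    run_meta=run_metadata,
    options=tf.profiler.ProfileOptionBuilder.time_and_memory())
```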
Here is the solution: automl/efficientdet/det_model_fn.py, lines 238 to 269 (commit 75ba619).
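The linked block is where det_model_fn.py computes and applies the gradients. For readers who can't see the embedded snippet, here is a hedged sketch of the kind of change being discussed (the "AggregationMethod flag" mentioned below) on a toy graph, not the exact code at those lines:

```python
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

# Toy stand-ins for the real detection loss and step counter.
w = tf.get_variable('w', shape=[512, 512])
loss = tf.reduce_sum(tf.square(w))
global_step = tf.train.get_or_create_global_step()

optimizer = tf.train.MomentumOptimizer(learning_rate=0.08, momentum=0.9)
# EXPERIMENTAL_ACCUMULATE_N sums gradient contributions in place instead of
# materializing every addend at once, which can lower peak memory in backprop.
grads_and_vars = optimizer.compute_gradients(
    loss, aggregation_method=tf.AggregationMethod.EXPERIMENTAL_ACCUMULATE_N)
train_op = optimizer.apply_gradients(grads_and_vars, global_step=global_step)
```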
I'm sorry, how do I solve this? I added the change above. I'm using a batch size of 16 with efficientdet-d0, with the mixed-precision and memory-optimization flags from above, on a 2080 Ti with 11 GB of memory.
Decrease batch_size.
With a batch size of 16 and without the AggregationMethod flag, it trains, albeit with a warning that "maybe things would be faster if we had more RAM". If I add the flag, the same thing happens; is that the expected result?
Adding the above line in main.py resolved the error, but training the larger models is much, much slower than training the D0 model.
Hi all, is there any progress on this problem? I am using a GeForce GTX 1080 Ti and training the models on my own dataset. I can train D4 with batch size = 1, and get OOM on D5. I have tried all the approaches mentioned here, but none of them worked. @Samjith888 If adding the lines works but training is very slow, are you sure you are using the GPU, or did the system perhaps fall back to the CPU? I also tried your approach, but still get OOM.
You are right, it's using the CPU instead of the GPU.
Decreasing fpn_cell_repeats could solve your problem, but it also decreases performance.
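A minimal sketch of that override in Python, assuming the repo's hparams_config helpers (get_efficientdet_config and Config.override, which the --hparams flag string feeds into); in practice you can put the same key=value pairs directly into --hparams:

```python
# Assumes automl/efficientdet's hparams_config module is on the path.
import hparams_config

config = hparams_config.get_efficientdet_config('efficientdet-d4')
# Fewer BiFPN cell repeats shrinks the activation footprint at some accuracy
# cost; moving_average_decay=0 also drops the EMA shadow copies discussed above.
config.override('fpn_cell_repeats=3,moving_average_decay=0')
```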
@LucasSloan Are you getting acceptable results with your RTX card and this configuration?
Just for the record: I'm trying to finetune on PASCAL VOC 2012 as described in the README on an RTX 2080 Ti, and with the changes above it still exceeds the memory limit of 10.7 GB on D5. It is about the same limitation as without the changes.
Same here: I can't train D4 on Colab even with train_batch_size=1 and moving_average_decay=0.
#!/bin/bash
MODEL=efficientdet-d1
# train
CUDA_VISIBLE_DEVICES="1" python main.py \
  --training_file_pattern="tfrecord/image_train*" \
  --validation_file_pattern="tfrecord/image_val*" \
  --mode='train_and_eval' \
  --model_name=$MODEL \
  --model_dir=$MODEL \
  --val_json_file='dataset/coco/annotations/image_val.json' \
  --hparams="use_bfloat16=false,num_classes=4" --use_tpu=False \
  --train_batch_size=8
No matter whether train_batch_size is 16 or 8, OOM always occurs, but for efficientdet-d0 everything is fine. My device is an RTX 2080 with 11 GB, and I am using tensorflow-gpu 2.1. I'm surprised that efficientdet-d1 occupies so much memory. Is that normal?