is there some error with dynamic export for TensorRT? #8866

Closed
hdnh2006 opened this issue Aug 4, 2022 · 10 comments · Fixed by #8869
Labels
bug Something isn't working

Comments

@hdnh2006
Contributor

hdnh2006 commented Aug 4, 2022

Search before asking

  • I have searched the YOLOv5 issues and found no similar bug report.

YOLOv5 Component

Export

Bug

Hi, I was trying the new dynamic export feature for TensorRT. I exported yolov5s.pt to TensorRT using the following command:

python export.py --include engine --device 0 --workspace 8 --dynamic --batch-size 8

According to this command, the maximum batch size is set to 8, so I tried it with a couple of IP cameras I use for testing and got the following:

python detect.py --weights yolov5s.engine --source streams1.txt

Traceback (most recent call last):
  File "detect.py", line 257, in <module>
    main(opt)
  File "detect.py", line 252, in main
    run(**vars(opt))
  File "/home/henry/.virtualenvs/yolov5/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "detect.py", line 138, in run
    p, im0, frame = path[i], im0s[i].copy(), dataset.count
IndexError: list index out of range
terminate called without an active exception
terminate called recursively
Aborted

I should mention that the regular yolov5s.pt model works perfectly, so it is not a problem with my IP cameras:

python detect.py --weights yolov5s.pt --source streams1.txt 
detect: weights=['yolov5s.pt'], source=streams1.txt, data=data/coco128.yaml, imgsz=[640, 640], conf_thres=0.25, iou_thres=0.45, max_det=1000, device=, view_img=False, save_txt=False, save_conf=False, save_crop=False, nosave=False, classes=None, agnostic_nms=False, augment=False, visualize=False, update=False, project=runs/detect, name=exp, exist_ok=False, line_thickness=3, hide_labels=False, hide_conf=False, half=False, dnn=False
YOLOv5 🚀 v6.1-359-g628c05ca Python-3.8.10 torch-1.11.0+cu102 CUDA:0 (NVIDIA GeForce RTX 2060 SUPER, 7974MiB)

Fusing layers... 
YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients
[hevc @ 0x685bf80] PPS id out of range: 0
[hevc @ 0x685bf80] PPS id out of range: 0
[hevc @ 0x7638a00] Could not find ref with POC 30
1/2: rtsp://XXXXXXXXXXX@XXXXXXXXXXXXX/Streaming/Channels/601...  Success (inf frames 1920x1080 at 40.00 FPS)
[hevc @ 0x765d2c0] PPS id out of range: 0
[hevc @ 0x765d2c0] PPS id out of range: 0
[hevc @ 0x707bc40] Could not find ref with POC 31
2/2: rtsp://XXXXXXXXXXX@XXXXXXXXXXXXX/Streaming/Channels/701...  Success (inf frames 1920x1080 at 40.00 FPS)

0: 384x640 3 motorcycles, 1: 384x640 Done. (0.345s)
0: 384x640 3 motorcycles, 1: 384x640 Done. (0.006s)
0: 384x640 3 motorcycles, 1: 384x640 Done. (0.006s)
0: 384x640 3 motorcycles, 1: 384x640 Done. (0.006s)
0: 384x640 3 motorcycles, 1: 384x640 Done. (0.006s)
0: 384x640 3 motorcycles, 1: 384x640 Done. (0.006s)
0: 384x640 3 motorcycles, 1: 384x640 Done. (0.006s)
0: 384x640 3 motorcycles, 1: 384x640 Done. (0.006s)
0: 384x640 3 motorcycles, 1: 384x640 Done. (0.006s)
0: 384x640 3 motorcycles, 1: 384x640 Done. (0.006s)

Environment

No response

Minimal Reproducible Example

No response

Additional

No response

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!
@hdnh2006 hdnh2006 added the bug Something isn't working label Aug 4, 2022
@glenn-jocher
Member

glenn-jocher commented Aug 4, 2022

@hdnh2006 I'm not able to test with multiple streams, but I did test with detect.py normally (batch size 1) and PyTorch Hub (batch size 2) and it works correctly, i.e.

!python export.py --weights yolov5s.pt --include engine --imgsz 640 --device 0 --dynamic --batch 8  # export
!python detect.py --weights yolov5s.engine --imgsz 640 --device 0  # inference

import torch

# Model
model = torch.hub.load('ultralytics/yolov5', 'custom', 'yolov5s.engine')

# Images
dir = 'https://ultralytics.com/images/'
imgs = [dir + f for f in ('zidane.jpg', 'bus.jpg')]  # batch of images

# Inference
results = model(imgs)
results.print()  # or .show(), .save()

Do you have a streams.txt with public addresses we could use to debug?
[Screenshot: Screen Shot 2022-08-04 at 6 22 50 PM]

@hdnh2006
Contributor Author

hdnh2006 commented Aug 4, 2022

Thanks Glenn for your quick reply. I can share my "streams.txt" file with you in private.

@glenn-jocher
Member

glenn-jocher commented Aug 4, 2022

@hdnh2006 maybe there's an easier way. Can you reproduce the issue by running your TRT model with val.py at different batch sizes?
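For example, something like:

python val.py --weights yolov5s.engine --batch-size 1
python val.py --weights yolov5s.engine --batch-size 2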

@glenn-jocher
Member

glenn-jocher commented Aug 4, 2022

@hdnh2006 more info here. When I export at --batch 8 and run val.py with --batch 1 and --batch 2, both work, but both seem locked into batch-size 8 inference. This is probably why 3 streams are not working: they are at batch size 3 instead of 8.

I think this is being forced in DetectMultiBackend, I'll take a look.
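For reference, this is roughly how a dynamic engine is driven with the standard TensorRT Python API (an illustrative sketch only, not the actual DetectMultiBackend code; the binding indices and engine filename are assumptions):

import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
runtime = trt.Runtime(logger)
with open('yolov5s.engine', 'rb') as f:
    engine = runtime.deserialize_cuda_engine(f.read())  # deserialize the exported engine
context = engine.create_execution_context()

# A --dynamic engine reports -1 for the batch dimension of its input binding.
# The context has to be told the real input shape before every inference,
# otherwise it stays at the optimization profile's maximum (batch 8 here).
print(engine.get_binding_shape(0))                         # e.g. (-1, 3, 640, 640) for the 'images' input
context.set_binding_shape(0, (2, 3, 640, 640))             # 2 images/streams this time
print(context.get_binding_shape(engine.num_bindings - 1))  # output binding (assumed last) now sized for batch 2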

[Screenshot: Screen Shot 2022-08-04 at 7 29 53 PM]

@hdnh2006
Contributor Author

hdnh2006 commented Aug 4, 2022

Yes, you are right, I am getting the same logs:


(yolotest) henry@henrymlearning:~/Projects/yolo/yolov5$ python val.py --weights yolov5s.engine --batch-size 2
val: data=data/coco128.yaml, weights=['yolov5s.engine'], batch_size=2, imgsz=640, conf_thres=0.001, iou_thres=0.6, task=val, device=, workers=8, single_cls=False, augment=False, verbose=False, save_txt=False, save_hybrid=False, save_conf=False, save_json=False, project=runs/val, name=exp, exist_ok=False, half=False, dnn=False
YOLOv5 🚀 v6.1-359-g628c05ca Python-3.8.10 torch-1.12.0+cu102 CUDA:0 (NVIDIA GeForce RTX 2060 SUPER, 7974MiB)

Loading yolov5s.engine for TensorRT inference...
[08/04/2022-19:39:21] [TRT] [I] [MemUsageChange] Init CUDA: CPU +311, GPU +0, now: CPU 416, GPU 331 (MiB)
[08/04/2022-19:39:21] [TRT] [I] Loaded engine size: 34 MiB
[08/04/2022-19:39:21] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +35, now: CPU 0, GPU 35 (MiB)
[08/04/2022-19:39:21] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +264, now: CPU 0, GPU 299 (MiB)
val: Scanning '/home/henry/Projects/yolo/datasets/coco128/labels/train2017.cache' images and labels... 128 found, 0 missing, 2 empty, 0 corrupt: 100%|██████████| 128/128 [00:00<?, ?it/s]                 
               Class     Images     Labels          P          R     mAP@.5 mAP@.5:.95:  19%|█▉        | 3/16 [00:01<00:07,  1.71it/s]                                                                     WARNING: NMS time limit 0.540s exceeded
               Class     Images     Labels          P          R     mAP@.5 mAP@.5:.95: 100%|██████████| 16/16 [00:03<00:00,  4.35it/s]                                                                    
                 all        128        929      0.665       0.65      0.696      0.464
Speed: 0.2ms pre-process, 3.7ms inference, 13.6ms NMS per image at shape (8, 3, 640, 640)
Results saved to runs/val/exp

@glenn-jocher
Member

@hdnh2006 it looks like there is some bug in TRT dynamic handling in DetectMultiBackend. I've added a TODO to resolve this. In the meantime I would export at a fixed batch size, i.e. --batch 3 and avoid dynamic.
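For example, re-exporting at a fixed batch size would look something like:

python export.py --weights yolov5s.pt --include engine --device 0 --batch-size 3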

@glenn-jocher
Member

Even PyTorch Hub is actually outputting 8 results when passed 2 images. The last 6 outputs appear uninitialized. I wonder if this is how TRT is handling dynamic outputs, by simply populating the first x outputs.

[Screenshot: Screen Shot 2022-08-04 at 7 48 53 PM]

@glenn-jocher
Member

glenn-jocher commented Aug 4, 2022

@democat3457 can you take a look at this bug report? It's related to your PR #8526. It seems that --dynamic TRT models are always outputting their max batch size, i.e.

im = 'data/images/zidane.jpg'
model = torch.hub.load('ultralytics/yolov5', 'custom', 'yolov5s.engine')  # exported with --dynamic --batch 8
results = model(im)
print(len(results))  # returns 8, but only the first image has meaningful output; results 2-7 appear to be uninitialized data

I can always update the output in DetectMultiBackend to remove excess results, but this seems a bit hackish. Is this the way TRT dynamic is supposed to operate or is there a problem?
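For clarity, the trim I have in mind is something along these lines inside DetectMultiBackend's TRT forward path (a sketch only; im is the input batch and y the raw engine output, names assumed):

n = im.shape[0]  # number of images actually passed in
y = y[:n]        # drop the padded rows returned when the engine runs at its max batch size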

@glenn-jocher glenn-jocher linked a pull request Aug 4, 2022 that will close this issue
@glenn-jocher
Member

@hdnh2006 @democat3457 possible fix in #8869, but a little more hack-ish than I'd like.

@glenn-jocher
Member

@hdnh2006 good news 😃! Your original issue may now be fixed ✅ in PR #8869 with the help of @democat3457. To receive this update:

  • Git – git pull from within your yolov5/ directory or git clone https://github.com/ultralytics/yolov5 again
  • PyTorch Hub – Force-reload with model = torch.hub.load('ultralytics/yolov5', 'yolov5s', force_reload=True)
  • Notebooks – View the updated notebooks (Colab, Kaggle)
  • Docker – sudo docker pull ultralytics/yolov5:latest to update your image

Thank you for spotting this issue and informing us of the problem. Please let us know if this update resolves the issue for you, and feel free to inform us of any other issues you discover or feature requests that come to mind. Happy trainings with YOLOv5 🚀!

@glenn-jocher glenn-jocher removed the TODO label Aug 4, 2022