Fix cuda graph capture #15005

Merged · tianleiwu merged 5 commits into main from tlwu/fix_cuda_graph on Jun 15, 2023
Conversation

tianleiwu (Contributor) commented Mar 11, 2023

Description

Fix two issues related to CUDA graph capture: #14942 and #15002.

Issue 1: Previously, graph capture started at the second run. However, memory pattern optimization also allocates memory during the second run, and cudaMalloc is not allowed during graph capture. In this PR, graph capture starts only after two runs to avoid the issue.
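To illustrate the constraint, here is a minimal, hypothetical CUDA sketch (the `RunOnce` helper stands in for one inference run; this is not ORT's actual code): a cudaMalloc issued between cudaStreamBeginCapture and cudaStreamEndCapture invalidates the capture, so capture must begin only after the allocation-heavy warm-up runs.

```cpp
#include <cuda_runtime.h>

__global__ void Step(float* p) { p[threadIdx.x] += 1.0f; }

// Hypothetical stand-in for one inference run (a real run launches many kernels).
void RunOnce(float* buf, cudaStream_t stream) {
  Step<<<1, 32, 0, stream>>>(buf);
}

int main() {
  cudaStream_t stream;
  cudaStreamCreate(&stream);
  float* buf = nullptr;
  cudaMalloc(&buf, 32 * sizeof(float));  // allocating during run 1 is fine

  RunOnce(buf, stream);            // run 1: memory pattern is being planned
  RunOnce(buf, stream);            // run 2: pattern optimization may cudaMalloc
  cudaStreamSynchronize(stream);   // allocations have settled by now

  cudaGraph_t graph;
  cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);
  RunOnce(buf, stream);            // run 3: safe to capture; a cudaMalloc here
                                   // would fail the capture
  cudaStreamEndCapture(stream, &graph);

  cudaGraphDestroy(graph);
  cudaFree(buf);
  cudaStreamDestroy(stream);
  return 0;
}
```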

Issue 2: #13495 introduced multi-stream support, but stream cleanup calls cudaStreamSynchronize, which is not allowed during CUDA graph capture. In this PR, we move stream cleanup to after CUDA graph capture.
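A similar sketch of the ordering fix (the `CleanupStream` helper is hypothetical, not the actual ORT cleanup path): synchronization calls such as cudaStreamSynchronize are rejected while a global-mode capture is in progress, so cleanup must be deferred until after cudaStreamEndCapture.

```cpp
#include <cuda_runtime.h>

__global__ void Work(int* p) { *p += 1; }

// Hypothetical stand-in for ORT's per-run stream cleanup.
void CleanupStream(cudaStream_t s) {
  cudaStreamSynchronize(s);  // illegal while a global-mode capture is active
  cudaStreamDestroy(s);
}

int main() {
  cudaStream_t main_stream, helper_stream;
  cudaStreamCreate(&main_stream);
  cudaStreamCreate(&helper_stream);
  int* p = nullptr;
  cudaMalloc(&p, sizeof(int));

  cudaGraph_t graph;
  cudaStreamBeginCapture(main_stream, cudaStreamCaptureModeGlobal);
  Work<<<1, 1, 0, main_stream>>>(p);
  // CleanupStream(helper_stream);  // WRONG here: the synchronize would
  //                                // invalidate the capture in progress
  cudaStreamEndCapture(main_stream, &graph);

  CleanupStream(helper_stream);     // OK: moved after capture, as this PR does

  cudaGraphDestroy(graph);
  cudaFree(p);
  cudaStreamDestroy(main_stream);
  return 0;
}
```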

Also update the SqueezeNet test model with a dynamic axis so that we can test with a larger batch size, and add a test that reproduces the bug (when changing the minimum number of runs from 2 back to 1).

tianleiwu marked this pull request as draft on March 11, 2023 02:22
hariharans29 (Member) commented Mar 14, 2023

> However, memory pattern optimization will allocate memory from the second run

If memory pattern optimization will always allocate memory on the second run, why doesn't this reproduce universally for all models? (I seem to recall we had a C++ test for a simple model, and this issue never happened there.) Is it possible that, while memory pattern optimization always kicks in (based on the session option), it only triggers an arena extension for some models, based on the peak memory usage identified by the planner in the first run and the arena's state at that point in time?

tianleiwu (Contributor, Author) commented Mar 14, 2023

> If memory pattern optimization will always allocate memory on the second run, why doesn't this reproduce universally for all models? […]

I think the default arena settings are why it doesn't reproduce there. Let me change the arena settings in the test case; then it should be able to reproduce.

tianleiwu requested a review from a team as a code owner on June 13, 2023 05:29
tianleiwu marked this pull request as draft on June 13, 2023 05:31
tianleiwu marked this pull request as ready for review on June 13, 2023 16:09
tianleiwu (Contributor, Author) commented

> Is it possible that, while memory pattern optimization always kicks in (based on the session option), it only triggers an arena extension for some models, based on the peak memory usage identified by the planner in the first run and the arena's state at that point in time?

It is due to the arena's default initial buffer of 1 MB and the kNextPowerOfTwo extend strategy, which can leave extra memory that covers the small allocations. I updated the arena settings to use kSameAsRequested and a larger batch size, so that 1 MB is not enough. Now the new test can reproduce the bug.
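For reference, a sketch of how such a session could be configured through the ONNX Runtime C++ API, assuming the CUDA EP provider-option keys `enable_cuda_graph` and `arena_extend_strategy`; the model filename is a placeholder for the updated SqueezeNet test model:

```cpp
#include <onnxruntime_cxx_api.h>

int main() {
  const OrtApi& api = Ort::GetApi();

  // Configure the CUDA execution provider via V2 options.
  OrtCUDAProviderOptionsV2* cuda_options = nullptr;
  Ort::ThrowOnError(api.CreateCUDAProviderOptions(&cuda_options));
  const char* keys[] = {"enable_cuda_graph", "arena_extend_strategy"};
  // kSameAsRequested: the arena grows by exactly the requested size, so a
  // batch needing more than the 1 MB initial buffer forces an extension.
  const char* values[] = {"1", "kSameAsRequested"};
  Ort::ThrowOnError(api.UpdateCUDAProviderOptions(cuda_options, keys, values, 2));

  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "cuda_graph_test");
  Ort::SessionOptions session_options;
  session_options.AppendExecutionProvider_CUDA_V2(*cuda_options);
  api.ReleaseCUDAProviderOptions(cuda_options);

  // "squeezenet_dynamic.onnx" is a placeholder for the test model with a
  // dynamic batch dimension.
  Ort::Session session(env, "squeezenet_dynamic.onnx", session_options);
  // ... bind large-batch inputs/outputs on GPU and call session.Run() ...
  return 0;
}
```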

hariharans29 (Member) commented

Thanks for catching this.

hariharans29 (Member) left a review comment

LGTM for the core changes. Didn't review the multi-stream specific changes.

tianleiwu merged commit 9be1332 into main on Jun 15, 2023
tianleiwu deleted the tlwu/fix_cuda_graph branch on June 15, 2023 01:10