Do GPU and CPU block each other? #1537
When you author a Python handler file (which we call the backend), it is spawned as a process by the Java part of the codebase (which we call the frontend). The frontend and backend communicate via sockets. I believe your question is about whether we pipeline preprocessing in case an inference is slow; I'm not sure we do, but maybe @lxning, @HamidShojanazeri, or @maaquib know. The way we scale is by increasing the number of workers. If you're looking for source you can browse, you can learn more here: https://github.com/pytorch/serve/blob/master/docs/internals.md
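To make the frontend/backend split concrete, here is a minimal, hypothetical sketch of the idea that the Java frontend and the Python backend worker talk over a socket. Both ends live in one process via `socketpair`, and the message format here is made up for illustration; TorchServe's real wire protocol and framing are internal and not reproduced here.

```python
import socket

# Illustrative only: "frontend" stands in for the Java side,
# "backend" for the spawned Python worker. The real protocol
# (message framing, codec) is TorchServe-internal.
frontend, backend = socket.socketpair()

# Frontend sends an inference request.
frontend.sendall(b"infer:frame-0")

# Backend reads the request, "runs the handler", and replies.
request = backend.recv(1024)
payload = request.split(b":", 1)[1]
backend.sendall(b"prediction for " + payload)

# Frontend receives the prediction.
response = frontend.recv(1024)
frontend.close()
backend.close()
```

The key point is that each worker is a separate OS process, so a slow forward pass in one worker does not block the frontend from dispatching requests to other workers.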
@msaroufim Imagine that you have a video payload and you want to run inference on each frame. The decoding can be performed on the CPU, and as soon as each frame (or batch of frames) is decoded, we feed it to the GPU for a forward pass. This way we overlap CPU preprocessing with GPU model execution, which can substantially reduce latency and allows processing videos of arbitrary length. How (and whether) can we do this with TorchServe today?
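The overlap described above is a classic producer/consumer pipeline. Here is a minimal, hypothetical sketch (not TorchServe code): a decode thread feeds a bounded queue while an inference thread drains it, so CPU decoding of frame N+1 can proceed while the GPU works on frame N. `decode_frame` and `run_inference` are stand-ins for the real CPU and GPU work.

```python
import queue
import threading

def decode_frame(i):
    # Stand-in for CPU-side video decoding of frame i.
    return f"frame-{i}"

def run_inference(frame):
    # Stand-in for the GPU forward pass on one decoded frame.
    return f"pred({frame})"

def pipeline(num_frames, max_queue=4):
    # Bounded queue applies backpressure: the decoder stalls
    # if it gets more than max_queue frames ahead of the GPU.
    q = queue.Queue(maxsize=max_queue)
    results = []

    def producer():
        for i in range(num_frames):
            q.put(decode_frame(i))
        q.put(None)  # sentinel: no more frames

    def consumer():
        while True:
            frame = q.get()
            if frame is None:
                break
            results.append(run_inference(frame))

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because the decoder only ever buffers `max_queue` frames, memory stays bounded regardless of video length, which is what makes arbitrary-length videos feasible.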
@alar0330 In my experience, a GPU process does not block the CPU workers. I did not find the code to prove it, but I ran several tests to convince myself.
I guess your question is a bit different, though. If you want to process video, you have to decode it on the client side and use TorchServe only for images.
@alar0330 If I'm not mistaken, you are sending the whole video as one request? If it's not streaming, then I think this should be doable in a custom handler. Does something like this help?
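A custom-handler approach might look like the following hypothetical sketch. Real TorchServe handlers subclass `ts.torch_handler.base_handler.BaseHandler`; that dependency is omitted here so the sketch runs standalone, and `decode` and `model_forward` are placeholder stand-ins for a real video decoder and the model's forward pass.

```python
class VideoHandler:
    """Sketch of a handler that receives a whole video in one
    request, decodes it frame by frame on the CPU, and feeds
    batches of frames to the model."""

    def __init__(self, batch_size=2):
        self.batch_size = batch_size

    def decode(self, video_bytes):
        # Stand-in decoder: yields one "frame" per input byte.
        for b in video_bytes:
            yield b

    def model_forward(self, batch):
        # Stand-in for the GPU forward pass over a batch of frames.
        return [f * 2 for f in batch]

    def handle(self, video_bytes):
        preds, batch = [], []
        for frame in self.decode(video_bytes):
            batch.append(frame)
            if len(batch) == self.batch_size:
                preds.extend(self.model_forward(batch))
                batch = []
        if batch:  # flush the final partial batch
            preds.extend(self.model_forward(batch))
        return preds
```

Since `decode` is a generator, decoding is interleaved with inference rather than materializing every frame up front; combining this with a background decode thread would give the overlap discussed earlier in the thread.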
📚 Documentation
I did not find documentation about how GPU and CPU work is split across processes/threads.
During GPU inference, can we continue CPU pre/post-processing of other objects asynchronously?
If yes, which mechanism do you use to send data between processes (queue/file/socket)? Maybe you can provide a link to the code.
If no, do you have a plan to support it?