
Do GPU and CPU block each other? #1537

Open
BraginIvan opened this issue Mar 28, 2022 · 6 comments
Labels
documentation Improvements or additions to documentation

Comments

@BraginIvan

BraginIvan commented Mar 28, 2022

📚 Documentation

I did not find any documentation about how GPU and CPU work is split across processes/threads.
During GPU inference, can CPU pre-/post-processing of other requests continue asynchronously?

If yes, how is data sent between processes (queue/file/socket)? Maybe you can provide a link to the code.
If no, do you have plans to support it?

@msaroufim
Member

When you author a Python handler file, which we call the backend, it's spawned as a process by the Java part of the codebase, which we call the frontend. The frontend and backend communicate via sockets.

I believe your question is about whether we pipeline preprocessing while an inference is in flight. I'm not sure we do, but maybe @lxning @HamidShojanazeri or @maaquib know.

The way we scale is by increasing the number of workers in config.properties. Each worker is a separate process running the same handler code, so it's embarrassingly parallel: one worker can be doing preprocessing while another is doing inference.

If you're looking for source code to browse, you can learn more here: https://github.com/pytorch/serve/blob/master/docs/internals.md
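For example, a config.properties along these lines (the addresses and worker count are illustrative, not a recommendation) gives you several worker processes per model, each running the same handler independently:

    # config.properties (illustrative values)
    inference_address=http://0.0.0.0:8080
    management_address=http://0.0.0.0:8081
    # each worker is a separate Python process running the same handler
    default_workers_per_model=4

The worker count can also be adjusted per model at runtime through the management API.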

msaroufim added the documentation label Mar 28, 2022
@alar0330

alar0330 commented Apr 6, 2022

@msaroufim Imagine that you have a video payload and you want to run inference on each frame. The decoding can be performed on the CPU, and as soon as each frame (or batch of frames) is decoded, we feed it to the GPU for a forward pass. This way we overlap CPU preprocessing with GPU model execution, which can substantially reduce latency and allows processing of videos of arbitrary length.

How (and if) can we do it with TorchServe today?

@BraginIvan
Author

@alar0330 Based on my experience, the GPU worker does not block the CPU workers. I did not find the code to prove it, but I ran several tests to convince myself.
Note that the GPU worker still needs its own CPU process and keeps one core busy; if you have several cores, the other cores can continue CPU-bound preprocessing tasks simultaneously.

@BraginIvan
Author

I guess your question is a bit different. If you want to process video, you have to decode it on the client side and use TorchServe only for images.
If you send the whole video, then you will need to override the handle method, and the CPU and GPU will work synchronously. But I'm not sure.
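For illustration only, a client-side sketch along these lines decodes the video locally and posts each frame to TorchServe's inference endpoint (the model name my_model and the default port 8080 are assumptions):

# Hypothetical client: decode the video on the client side, send frames to TorchServe.
# Assumes a model registered as "my_model" and the default inference port 8080.
import cv2
import requests

def infer_video(path, url="http://localhost:8080/predictions/my_model"):
    cap = cv2.VideoCapture(path)              # CPU-side decoding on the client
    results = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        _, buf = cv2.imencode(".jpg", frame)  # re-encode the frame as JPEG bytes
        resp = requests.post(url, data=buf.tobytes())
        results.append(resp.json())           # assumes the handler returns JSON
    cap.release()
    return results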

@msaroufim
Member

I'm sorry for the delay @alar0330, but it sounds like you're asking for pipelined execution when doing heavyweight preprocessing. As of today I don't believe we support this, but we could do something like it once I finish #1546.

@HamidShojanazeri
Collaborator

HamidShojanazeri commented May 18, 2022


@alar0330 If I'm not mistaken, you are sending the whole video as one request? If it's not streaming, then I think it should be doable in a custom handler. Does something like this help?

# Sketch of a custom handler; load_model, decode, process, and metadata are placeholders.
class CustomHandler:

    def initialize(self, context):
        self.model = load_model(context)

    def frame_process(self, video, i):
        # CPU-side decoding/transform of a single frame
        processed_frame = process(video, i)
        return processed_frame

    def preprocess(self, request):
        video = decode(request)
        return video

    def inference(self, video):
        inferences = []
        number_of_frames = metadata(video)
        for i in range(number_of_frames):  # or we could make a buffer here
            frame = self.frame_process(video, i)  # or spawn multiple processes to process the video frames; not sure if there is any perf hit here
            output = self.model(frame)
            inferences.append(output)
        return inferences
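As a follow-up sketch (plain Python threading, not a TorchServe API; decode_frame and model are placeholders), the buffer mentioned in the comment above could be a bounded queue fed by a background thread, so CPU decoding and the GPU forward pass overlap:

# Hypothetical pipelining inside a custom handler's inference():
# a background thread decodes frames on the CPU while the main thread
# runs the GPU forward pass, so the two stages overlap.
import queue
import threading

def pipelined_inference(video, model, decode_frame, num_frames, buffer_size=8):
    frames = queue.Queue(maxsize=buffer_size)   # bounded buffer between CPU and GPU stages
    SENTINEL = object()

    def producer():
        for i in range(num_frames):
            frames.put(decode_frame(video, i))  # CPU-bound decoding
        frames.put(SENTINEL)                    # signal end of stream

    threading.Thread(target=producer, daemon=True).start()

    outputs = []
    while True:
        frame = frames.get()
        if frame is SENTINEL:
            break
        outputs.append(model(frame))            # GPU-bound forward pass
    return outputs

Whether the decoding thread actually runs in parallel depends on the decoder releasing the GIL (native OpenCV/FFmpeg calls generally do); otherwise a process pool would be needed.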
