
[Core] Add MultiprocessingGPUExecutor #4539

Merged · 10 commits merged into vllm-project:main on May 14, 2024

Conversation

njhill
Member

@njhill njhill commented May 1, 2024

This introduces a Python multiprocessing-based executor that can be used as an alternative to Ray for single-node inferencing.

With the changes in this PR, Ray will continue to be used for parallel workers if it's installed; otherwise vanilla Python multiprocessing is used. This can also be overridden with --no-worker-use-ray --distributed-executor-backend=mp.

By default, worker processes are started using spawn. This can be changed to fork by setting the env var VLLM_WORKER_MULTIPROC_METHOD=fork. fork mode has the benefit of faster startup.
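The spawn/fork trade-off described above can be illustrated with plain Python multiprocessing (a minimal sketch, independent of vLLM internals; `worker` is a hypothetical function):

```python
import multiprocessing as mp

def worker(q):
    # Runs in the child process and reports back via a queue.
    q.put("ready")

if __name__ == "__main__":
    # "spawn" starts a fresh interpreter: slower, but avoids inheriting
    # parent state (e.g. an already-initialized CUDA context).
    # "fork" copies the parent process: faster startup, which is why
    # VLLM_WORKER_MULTIPROC_METHOD=fork can speed up engine start.
    for method in ("spawn", "fork"):
        ctx = mp.get_context(method)
        q = ctx.Queue()
        p = ctx.Process(target=worker, args=(q,))
        p.start()
        print(method, q.get())
        p.join()
```

Note that fork is only available on POSIX systems; on Windows, spawn is the sole option.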

The existing distributed tests have been updated to run with/without Ray.

Worker processes are shut down when the LLMEngine is garbage collected.
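Tying worker shutdown to engine garbage collection can be sketched with the stdlib's `weakref.finalize` (illustrative only — `Engine` and `stop_workers` are hypothetical names, not the PR's actual code):

```python
import weakref

shutdown_log = []

def stop_workers(log):
    # Invoked when the engine object is collected (or at interpreter exit),
    # without the finalizer itself keeping the engine alive.
    log.append("workers stopped")

class Engine:
    def __init__(self):
        self._finalizer = weakref.finalize(self, stop_workers, shutdown_log)

engine = Engine()
del engine               # in CPython, refcount hits zero and the finalizer fires
print(shutdown_log)      # → ['workers stopped']
```

As the later discussion in this thread notes, collection timing is only deterministic under CPython's reference counting; relying on GC for cleanup is fragile in general.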

This replaces the original PRs #3466, #2898, and #4345. It was originally co-authored by @sahilsuneja1.

@vrdn-23

vrdn-23 commented May 1, 2024

This is really great work @njhill. Thanks for all the effort!
Will this change also enable ray to become an optional dependency?

@youkaichao
Member

I think --no-worker-use-ray is bad. I suggest something like --distributed-executor-backend, which can be either ray or mp, and we might have more in the future.

@njhill
Member Author

njhill commented May 1, 2024

I think --no-worker-use-ray is bad. I suggest something like --distributed-executor-backend, which can be either ray or mp, and we might have more in the future.

@youkaichao that sounds very reasonable, but maybe it could be a separate PR? This is not actually a newly introduced arg: there is already a boolean --worker-use-ray arg, and --no-worker-use-ray is just the argparse-standard way of specifying a "false" value for it.
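For context on the argparse convention being referenced: the stdlib's `BooleanOptionalAction` (Python 3.9+) automatically generates a paired `--no-` form for a boolean flag (a sketch of the general mechanism; whether vLLM's parser uses this exact action is an assumption):

```python
import argparse

parser = argparse.ArgumentParser()
# BooleanOptionalAction registers both --worker-use-ray and
# --no-worker-use-ray for the same destination.
parser.add_argument(
    "--worker-use-ray",
    action=argparse.BooleanOptionalAction,
    default=True,
)

print(parser.parse_args([]).worker_use_ray)                       # True
print(parser.parse_args(["--no-worker-use-ray"]).worker_use_ray)  # False
```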

@njhill
Member Author

njhill commented May 1, 2024

This is really great work @njhill. Thanks for all the effort! Will this change also enable ray to become an optional dependency?

Yes, although Ray is already optional if you are only using a single GPU. I do have some changes to make it an optional "extra" from a Python packaging point of view, but was thinking of a follow-on PR to avoid making this one bigger.

@youkaichao
Member

If it is possible, I suggest adding --distributed-executor-backend in this PR, and routing --worker-use-ray to --distributed-executor-backend=ray. This PR's executor can then be enabled with --distributed-executor-backend=multiprocessing.

@njhill
Member Author

njhill commented May 2, 2024

@youkaichao any idea why the ray distributed CI test might be failing now due to a gloo timeout? I think it's something to do with a second engine using TP being created in the same pytest process after the first one is shut down (the test now runs with mp executor followed by ray executor). This wasn't a problem with an earlier version of this PR, but I know you've made changes in this area. I will dig in more but just wanted to check if it's anything obvious to you.

@youkaichao
Member

Try merging the main branch in? I'm not sure, but the latest commit I merged into main passes the CI tests.

@njhill
Member Author

njhill commented May 2, 2024

@youkaichao fyi the problem is still there after pulling in your latest fix commit, I'll try to narrow it down tomorrow.

@youkaichao
Member

My suspicion is improper cleanup. You could try having one test for mp and another for ray; then they won't interfere with each other.

@youkaichao
Member

@njhill maybe we should cancel the tests for this PR until you figure it out locally? Otherwise the CI will be blocked.

@rkooo567 rkooo567 self-assigned this May 2, 2024
njhill added 2 commits May 2, 2024 09:51
@vrdn-23

vrdn-23 commented May 7, 2024

Sorry to keep asking, but is there any update on this?
@njhill @rkooo567

@njhill
Member Author

njhill commented May 14, 2024

My suspicion is improper cleanup. You could try having one test for mp and another for ray; then they won't interfere with each other.

@youkaichao I've updated this now to run in separate tests. Do you think it would be worth opening a separate issue to address the distributed cleanup issue? Currently it seems you can't create an LLM with tensor parallel, delete it, and then create another one in the same process.

If it is possible, I suggest adding --distributed-executor-backend in this PR, and routing --worker-use-ray to --distributed-executor-backend=ray. This PR's executor can then be enabled with --distributed-executor-backend=multiprocessing.

I've now made this update as requested; it can be enabled with --distributed-executor-backend=mp.

@youkaichao @rkooo567 hopefully this is now ready to merge? 🙏 The failing tests look unrelated (same failures on the main branch).

We could discuss in a follow-on whether it makes sense to change the default from ray to mp.

@rkooo567
Collaborator

I will finish reviewing it by today!

@youkaichao
Member

Do you think it would be worth opening a separate issue to address the distributed cleanup issue? Currently it seems you can't create an LLM with tensor parallel, delete it, and then create another one in the same process.

I have a comment on this: #4508 (comment)

TL;DR: Python garbage collection is unreliable. If we want to address the distributed cleanup issue, we need a user-interface change, such as a context manager to explicitly control the cleanup.
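The context-manager approach suggested above could look roughly like this (a hypothetical sketch: `Engine`, `engine_session`, and `shutdown` are made-up names, not vLLM API):

```python
from contextlib import contextmanager

teardown_log = []

class Engine:
    """Hypothetical stand-in for an LLM engine with distributed workers."""
    def shutdown(self):
        teardown_log.append("workers stopped")

@contextmanager
def engine_session():
    engine = Engine()
    try:
        yield engine
    finally:
        # Runs deterministically on scope exit, unlike __del__/GC-based cleanup.
        engine.shutdown()

with engine_session() as engine:
    pass  # run tensor-parallel inference here

print(teardown_log)  # → ['workers stopped']
```

Because teardown happens at the end of the `with` block rather than whenever the collector runs, creating a second engine in the same process afterwards becomes well-defined.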

Member

@youkaichao youkaichao left a comment


LGTM, thanks for the efforts!

@njhill
Member Author

njhill commented May 14, 2024

Thanks again @youkaichao @rkooo567 @zhuohan123! And thanks for your patience @vrdn-23!

@njhill njhill merged commit 676a999 into vllm-project:main May 14, 2024
52 of 55 checks passed
@njhill njhill deleted the multiproc-gpu-executor branch May 14, 2024 17:39
@kerthcet
Contributor

Hi all, thanks for the efforts. I just have one question: is there any performance difference between Python multiprocessing and Ray? Thanks in advance.

@njhill
Member Author

njhill commented Jun 12, 2024

@kerthcet we found multiprocessing to be faster, but since #4894 the difference probably isn't very significant, especially for larger models.

Temirulan pushed a commit to Temirulan/vllm-whisper that referenced this pull request Sep 6, 2024
Co-authored-by: SAHIL SUNEJA <suneja@us.ibm.com>