
[Bugfix] Add custom Triton cache manager to resolve MoE MP issue #6140

Merged · 10 commits into main · Jul 15, 2024

Conversation

tdoublep (Member) commented Jul 4, 2024

Fixes #6103

We have been using this fix via our fork (see here) for a while and it seems stable.

Note: this will only resolve the problem if you are using vLLM from the Docker image. A better approach might be to bundle the custom cache manager code inside the vllm package, so that it ships via pip install too, and the user could still set an env variable to enable it.

Update: I've now implemented it by including the custom cache manager inside vLLM and setting the necessary env variable via code.

cc @jeejeelee
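
For context, a minimal sketch of the mechanism this relies on: Triton's runtime reads the TRITON_CACHE_MANAGER environment variable roughly as a "module.path:ClassName" string and imports that class in place of its default FileCacheManager. The vllm module path below is illustrative, not quoted from this PR's diff.

```python
import importlib
import os

# Point Triton at a custom cache manager class ("module.path:ClassName").
# The exact vLLM module path here is an assumption for illustration.
os.environ.setdefault(
    "TRITON_CACHE_MANAGER",
    "vllm.triton_utils.custom_cache_manager:CustomCacheManager")

# Triton's runtime does roughly the following when it needs a cache manager:
module_path, cls_name = os.environ["TRITON_CACHE_MANAGER"].split(":")
cache_manager_cls = getattr(importlib.import_module(module_path), cls_name)
```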

tdoublep and others added 2 commits July 4, 2024 09:18
Co-authored-by: Chih-Chieh-Yang <chih.chieh.yang@ibm.com>
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
jeejeelee (Contributor):

Thanks.

IMHO, this issue should be addressed by bundling the custom cache manager code inside the vllm package.

cc @simon-mo @youkaichao @Yard1

njhill (Member) commented Jul 4, 2024

Thanks @tdoublep, I had mentioned this to @youkaichao previously but kept forgetting to open a PR.

It's not immediately obvious why this seems to affect only the mp case and not ray.

I agree that it would be better for this to be incorporated into the library if possible.

I wonder if we could open a PR or issue in the Triton repo for this (if one doesn't already exist).

```python
        else:
            raise RuntimeError("Could not create or locate cache dir")

        print(f"Triton cache dir: {self.cache_dir=}")
```
Review comment (Member): This should probably be a debug log instead; it produces a lot of output.

tdoublep (Member, Author): Have just removed it for now.

tdoublep changed the title from [Bugfix] Install custom cache manager in Dockerfile to resolve Triton MoE MP issue to [Bugfix] Add custom Triton cache manager to resolve MoE MP issue on Jul 4, 2024
tdoublep (Member, Author) commented Jul 4, 2024

@njhill @jeejeelee I have re-implemented it as part of the vllm library.

One thing I'm not sure about is whether setting the env variable from fused_moe code is sufficient, or whether there are other parts of the code where this fix would be needed. Maybe it's OK for now.

tdoublep (Member, Author) commented Jul 4, 2024

After reading the conversation here, it sounds like we would also need to set this env variable accordingly when using the Triton punica kernel (e.g., once we merge that PR).

jeejeelee (Contributor):

> After reading the conversation here, it sounds like we would also need to set this env variable accordingly when using the Triton punica kernel (e.g., once we merge that PR).

Even if we don't consider #5036, prefix_prefill and triton_flash_attention still need this fix.

```python
def maybe_set_triton_cache_manager(module: str) -> None:
    cache_manager = os.environ.get("TRITON_CACHE_MANAGER", None)
    if cache_manager != module:
        os.environ["TRITON_CACHE_MANAGER"] = module
```
Review comment (Contributor): If the user manually sets this env var, can we modify it? Additionally, I suggest adding a log message for clarification.

tdoublep (Member, Author) replied Jul 4, 2024: Have changed it so that we only set it if the user has not; also added a log message.
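
A minimal sketch of that revised behavior, assuming the helper name and signature from the snippet above: only override the env var when the user has not already chosen a cache manager, and log the change.

```python
import logging
import os

logger = logging.getLogger(__name__)


def maybe_set_triton_cache_manager(module: str) -> None:
    """Point Triton at `module` unless the user already set a cache manager."""
    if os.environ.get("TRITON_CACHE_MANAGER") is None:
        logger.info("Setting Triton cache manager to: %s", module)
        os.environ["TRITON_CACHE_MANAGER"] = module
```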

tdoublep (Member, Author) commented Jul 4, 2024

> Even if we don't consider #5036, prefix_prefill and triton_flash_attention still need this fix.

@jeejeelee OK, in that case I guess it makes sense to call maybe_set_triton_cache_manager in one single place rather than in each individual place where we use Triton. Perhaps we can do it if we detect tp>1 with multiprocessing as the distributed backend?

tdoublep (Member, Author) commented Jul 4, 2024

@njhill @jeejeelee I've moved the call to maybe_set_triton_cache_manager to the MultiprocessingGPUExecutor. I guess this is safer and should cover all cases we need.
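
Roughly, that placement might look like the sketch below; the executor attributes and the import path are assumptions for illustration, not quoted from the diff.

```python
# Sketch only: module path and attribute names are assumed, not confirmed.
from vllm.triton_utils.custom_cache_manager import (
    maybe_set_triton_cache_manager)


class MultiprocessingGPUExecutor:
    def _init_executor(self) -> None:
        # tp>1 with the multiprocessing backend is exactly the case where
        # Triton cache-dir collisions occur, so install the override once
        # here instead of next to every Triton kernel (fused_moe,
        # prefix_prefill, triton_flash_attention, ...).
        world_size = self.parallel_config.tensor_parallel_size
        if world_size > 1:
            maybe_set_triton_cache_manager(
                "vllm.triton_utils.custom_cache_manager:CustomCacheManager")
```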

tdoublep (Member, Author) commented Jul 5, 2024

CI test failures look like network blips (Read timed out).

```python
        os.environ["TRITON_CACHE_MANAGER"] = manager


class CustomCacheManager(FileCacheManager):
```
Review comment (Collaborator): Document why we need this?

tdoublep (Member, Author): Added some docstrings.

tdoublep (Member, Author) commented Jul 5, 2024

CI failure looks unrelated:

FAILED distributed/test_multimodal_broadcast.py::test_models[5-128-half-2] - huggingface_hub.utils._errors.LocalEntryNotFoundError: An error happened while trying to locate the file on the Hub and we cannot find the requested files in the local cache. Please check your connection and try again or make sure your Internet connection is on

youkaichao (Member):

> It's not immediately obvious why this seems to affect only the mp case and not ray.

I'm also wondering this. cc anyscale folks @cadedaniel @Yard1 for visibility.

Comment on lines +22 to +25:

```python
    """Re-implements Triton's cache manager, ensuring that a
    unique cache directory is created for each process. This is
    needed to avoid collisions when running with tp>1 and
    using multi-processing as the distributed backend.
```
Review comment (Collaborator): If Triton 3.0.0 could solve this problem, it'd be better to note here that this custom cache manager can be removed when we upgrade Triton.

tdoublep (Member, Author): The fix for the issue is not yet in v3.0.0, but I guess it will be in whatever version comes after that (see my summary here). I will add a comment to that effect.

tdoublep (Member, Author): Done.
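
To make the quoted docstring concrete, here is a minimal sketch of the per-process cache directory idea: subclass Triton's FileCacheManager and suffix the cache directory with the pid, so concurrent tp workers never race on the same files. The override/dump handling of the real FileCacheManager is omitted, and the default directory is an assumption mirroring Triton's usual ~/.triton/cache convention.

```python
import os

from triton.runtime.cache import FileCacheManager


class CustomCacheManager(FileCacheManager):
    """Sketch: like FileCacheManager, but with a pid-suffixed cache dir
    so each worker process compiles into its own directory."""

    def __init__(self, key, override=False, dump=False):
        self.key = key
        self.lock_path = None
        # Base location (assumed default, matching Triton's convention).
        base = os.getenv(
            "TRITON_CACHE_DIR",
            os.path.join(os.path.expanduser("~"), ".triton", "cache"))
        # Per-process suffix: the whole point of the customization.
        self.cache_dir = os.path.join(f"{base}_{os.getpid()}", self.key)
        self.lock_path = os.path.join(self.cache_dir, "lock")
        os.makedirs(self.cache_dir, exist_ok=True)
```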

LSC527 mentioned this pull request Jul 11, 2024
tdoublep (Member, Author):

All comments have been addressed. Is there anything else you would like to see? @comaniac @njhill @jeejeelee @simon-mo

I think it would be good to get this one in since there are quite a few people struggling with this issue.

comaniac (Collaborator) left a review:

LGTM. cc @Yard1 to take a final pass.

simon-mo (Collaborator):

Merging to unblock release.

simon-mo merged commit eaec4b9 into vllm-project:main on Jul 15, 2024. 71 checks passed.
dtrifiro pushed a commit to opendatahub-io/vllm that referenced this pull request Jul 17, 2024
xjpang pushed a commit to xjpang/vllm that referenced this pull request Jul 24, 2024
dtrifiro added a commit to dtrifiro/vllm that referenced this pull request Jul 24, 2024
dtrifiro added a commit to opendatahub-io/vllm that referenced this pull request Jul 25, 2024
dtrifiro added a commit to opendatahub-io/vllm that referenced this pull request Aug 6, 2024
dtrifiro added a commit to opendatahub-io/vllm that referenced this pull request Sep 13, 2024

Successfully merging this pull request may close these issues:

[Bug]: fused_moe_kernel compile bug

6 participants