feat: Support weight-stripped engine and REFIT_IDENTICAL flag #3167

zewenli98 · 2024-09-19T15:42:14Z

Description

Added support for weight-stripped engines in the Python runtime
Added the REFIT_IDENTICAL flag

Fixes #3146

Type of change

New feature (non-breaking change which adds functionality)

Checklist:

My code follows the style guidelines of this project (You can use the linters)
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas and hacks
I have made corresponding changes to the documentation
I have added tests to verify my fix or my feature
New and existing unit tests pass locally with my changes
I have added the relevant labels to my PR so that relevant reviewers are notified

zewenli98 · 2024-09-19T15:46:32Z

py/torch_tensorrt/dynamo/runtime/_TorchTensorRTModule.py

        name: str = "",
        settings: CompilationSettings = CompilationSettings(),  # Assumes engine was built with default compilation settings if object not passed
        weight_name_map: Optional[dict[Any, Any]] = None,
+        graph_module: torch.fx.GraphModule = None,


@narendasan I tried to do refitting for the C++ runtime the same way as for the Python runtime, but it didn't work. Any suggestions? Should I do it in C++ or Python?

Doesn't refit already work on both APIs?

Also why do we need the graph module in this module?

In this PR I moved the refitting part into TRTModule, so it only works for the Python runtime.

The graph module is used for refitting.

cehongwang · 2024-09-19T18:59:49Z

py/torch_tensorrt/dynamo/conversion/_TRTInterpreter.py

@@ -619,27 +609,32 @@ def run(
            builder_config, self.compilation_settings.timing_cache_path
        )

-        serialized_engine = self.builder.build_serialized_network(
+        # if strip_engine_weights is true, the serialized engine need to be refitted before using
+        maybe_unrefitted_serialized_engine = self.builder.build_serialized_network(


Why is this called a "maybe unrefitted" engine?

Please see the design in the comment below. If compilation_settings.strip_engine_weights is true, the engine needs to be refitted; otherwise it doesn't. Hence the "maybe".

cehongwang · 2024-09-19T19:06:20Z

py/torch_tensorrt/dynamo/runtime/_PythonTorchTensorRTModule.py

+            ), "weight-stripped engines must be refittable, please set make_refittable=True"
+
+            # Refit the weights
+            refitter = trt.Refitter(self.engine, TRT_LOGGER)


Can you use this function?

TensorRT/py/torch_tensorrt/dynamo/_refit.py

Line 138 in fa02fd3

def _refit_single_trt_engine_with_gm(

The function requires input_list which is not provided in the caller.

narendasan · 2024-09-19T19:17:19Z

py/torch_tensorrt/dynamo/runtime/_PythonTorchTensorRTModule.py

@@ -121,6 +124,52 @@ def setup_engine(self) -> None:
        self.engine = runtime.deserialize_cuda_engine(self.serialized_engine)
        self.context = self.engine.create_execution_context()

+        if self.settings.strip_engine_weights:


We likely shouldn't be doing the refit in these modules.

I think there are three workflows for weight stripping:

1. A user just wants a weight-stripped engine. They should use convert_exported_program_to_trt_engine with the strip_weights setting. The choice of make_refittable can be used to decide between kREFIT and kREFIT_IDENTICAL (though that might not be entirely clear, so we might want to think about that setting).

2. We want to use weight stripping to keep the cache lighter. Here the choice is opaque to the user. The user's choice of make_refittable controls whether we use kREFIT or kREFIT_IDENTICAL, but once the engine is loaded or pulled from the cache, we immediately refit it (before passing the engine to the TRTModule), the same as we do today.

3. The user wants a compiled program with stripped weights (I'm not sure why, or whether this is a real use case). This is basically the same as lazy engine loading. We would require users to run refit_engine_weights before executing.
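
One possible reading of how the two settings could map to builder flags in the first two workflows is sketched below. This is only an illustration, not the PR's code: select_refit_flags is a hypothetical helper, the flags are plain strings named after the TensorRT builder flags rather than trt.BuilderFlag members, and treating make_refittable=False as "use REFIT_IDENTICAL" is an assumption about the intent, which the comment above itself says may not be entirely clear.

```python
from typing import List


def select_refit_flags(strip_engine_weights: bool, make_refittable: bool) -> List[str]:
    """Hypothetical helper: return the names of the builder flags to enable.

    Assumption (not confirmed by the PR): a weight-stripped engine is always
    built with some refit flag, and make_refittable only selects which one.
    """
    flags: List[str] = []
    if strip_engine_weights:
        # Ask the builder to leave refittable weights out of the plan.
        flags.append("STRIP_PLAN")
        # Fully refittable if requested, otherwise refit with identical weights only.
        flags.append("REFIT" if make_refittable else "REFIT_IDENTICAL")
    elif make_refittable:
        # Weights stay in the engine, but it can still be refit later.
        flags.append("REFIT")
    return flags


if __name__ == "__main__":
    for strip in (True, False):
        for refittable in (True, False):
            print(strip, refittable, select_refit_flags(strip, refittable))
```

Under this reading, the user-facing make_refittable setting never disables refitting for a stripped engine; it only narrows which weights may be supplied at refit time.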

Got it. The original design is described in the comment below. I'll move the refitting part back to TRTInterpreter.run().

The choice of make_refittable can be used to decide between kREFIT and kREFIT_IDENTICAL

Do you mean we use make_refittable to control both kREFIT and kREFIT_IDENTICAL?

narendasan

@zewenli98 do you have a design for this feature?

zewenli98 · 2024-09-20T04:46:25Z

@narendasan OK, the initial overall design was as follows:

In TRTInterpreter.run():

if compilation_settings.strip_engine_weights is True:
    if engine_cache not hit:
        1. build a weight-stripped engine
        2. save the weight-stripped engine if engine_cache is set
        3. return the weight-stripped engine (not yet refit)
    else:
        load and return the weight-stripped engine (not yet refit)
else:
    if engine_cache not hit:
        1. build a weight-included engine
        2. save the weight-included engine if engine_cache is set
        3. return the weight-included engine (don't need to refit)
    else:
        load and return the weight-included engine (not yet refit)

Then, in TRTModule, refit if necessary before inference.
I didn't put the refitting part into TRTInterpreter.run() because I wanted to avoid repeated de/serialization of TRT engines: (1) deserializing in TRTInterpreter.run() for refitting and then serializing again, and (2) deserializing again in TRTModule.
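
The control flow above can be sketched in runnable form as follows. This is a minimal illustration under stated assumptions, not the actual TRTInterpreter code: Engine, interpreter_run, and the dict-based cache are hypothetical stand-ins, and the returned needs_refit flag simply mirrors the "(not yet refit)" annotations in the pseudocode.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Engine:
    """Hypothetical stand-in for a serialized TRT engine."""
    graph_hash: str
    has_weights: bool


# Hypothetical cache type: keyed by (graph hash, whether weights are stripped).
Cache = Dict[Tuple[str, bool], Engine]


def interpreter_run(
    graph_hash: str,
    strip_engine_weights: bool,
    engine_cache: Optional[Cache] = None,
) -> Tuple[Engine, bool]:
    """Return (engine, needs_refit) following the pseudocode above."""
    key = (graph_hash, strip_engine_weights)
    cache_hit = engine_cache is not None and key in engine_cache
    if cache_hit:
        # Load the cached engine; per the pseudocode it is not yet refit.
        engine = engine_cache[key]
    else:
        # Build a weight-stripped or weight-included engine.
        engine = Engine(graph_hash, has_weights=not strip_engine_weights)
        if engine_cache is not None:
            engine_cache[key] = engine
    # Only a freshly built weight-included engine can skip the refit.
    needs_refit = strip_engine_weights or cache_hit
    return engine, needs_refit


if __name__ == "__main__":
    cache: Cache = {}
    print(interpreter_run("g0", True, cache))
    print(interpreter_run("g0", True, cache))
    print(interpreter_run("g1", False, cache))
```

In this sketch the refit decision is returned to the caller instead of being performed inside the function, which corresponds to deferring the refit to the TRTModule as described above.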

support weight-stripped engine and REFIT_IDENTICAL flag

77b1cfc

zewenli98 requested review from narendasan, peri044 and cehongwang September 19, 2024 15:42

facebook-github-bot added the cla signed label Sep 19, 2024

zewenli98 self-assigned this Sep 19, 2024

github-actions bot added the component: tests, component: conversion, component: api [Python], component: runtime, and component: dynamo labels Sep 19, 2024

github-actions bot requested a review from apbose September 19, 2024 15:42
