Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Client-side input shape/element validation #742

Draft
wants to merge 14 commits into
base: main
Choose a base branch
from

Conversation

yinggeh
Copy link
Contributor

@yinggeh yinggeh commented Jul 9, 2024

What does the PR do?

Add client input size check to make sure input shape byte size matches input data byte size.

Checklist

  • PR title reflects the change and is of format <commit_type>: <Title>
  • Changes are described in the pull request.
  • Related issues are referenced.
  • Populated github labels field
  • Added test plan and verified test passes.
  • Verified that the PR passes existing CI.
  • Verified copyright is correct on all changed files.
  • Added succinct git squash message before merging ref.
  • All template sections are filled out.
  • Optional: Additional screenshots for behavior/output changes with before/after.

Commit Type:

Check the conventional commit type
box here and add the label to the github PR.

  • feat

Related PRs:

triton-inference-server/server#7427

Where should the reviewer start?

src/c++/library/common.cc
src/python/library/tritonclient/grpc/_infer_input.py
src/python/library/tritonclient/http/_infer_input.py

Test plan:

n/a

  • CI Pipeline ID:
    17202351

Caveats:

Shared memory byte size checks for string inputs is not implemented.

Background

Stop malformed input request at client side before sending to the server.

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

Relates to triton-inference-server/server#7171

@yinggeh yinggeh changed the title feat: Client Input Byte Size Checks feat: Client input byte size checks Jul 9, 2024
src/c++/library/common.cc Outdated Show resolved Hide resolved
@@ -275,6 +277,10 @@ if(TRITON_ENABLE_CC_HTTP OR TRITON_ENABLE_PERF_ANALYZER)
http-client-library EXCLUDE_FROM_ALL OBJECT
${REQUEST_SRCS} ${REQUEST_HDRS}
)
add_dependencies(
http-client-library
proto-library
Copy link
Contributor

@rmccorm4 rmccorm4 Jul 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't proto-library used for protobuf<->grpc? Why is it needed for HTTP client?

edit: guessing the requirement is here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any concerns with introducing the new protobuf dependency to the HTTP client, or any alternatives? CC @GuanLuo @tanmayv25

src/c++/library/common.cc Outdated Show resolved Hide resolved
@yinggeh
Copy link
Contributor Author

yinggeh commented Jul 26, 2024

There is a known issue with TensorRT (Jira DLIS-6805 ) which causes TRT tests to fail again at client (CI job 102924904). There is no way to know the platform of inference model at the client side. Should we wait until @pskiran1 finish his change first?
CC @tanmayv25 @GuanLuo @rmccorm4

@pskiran1
Copy link

There is a known issue with TensorRT (Jira DLIS-6805 ) which causes TRT tests to fail again at client (CI job 102924904). There is no way to know the platform of inference model at the client side. Should we wait until @pskiran1 finish his change first? CC @tanmayv25 @GuanLuo @rmccorm4

@yinggeh, I just merged DLIS-6805 changes, could you please try with the latest code?

 into yinggeh-DLIS-6657-client-input-byte-size-check
@@ -232,6 +236,26 @@ InferInput::SetBinaryData(const bool binary_data)
return Error::Success;
}

Error
InferInput::ValidateData() const
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving TRT reformat conversation to a thread 🧵

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yingge:

There is a known issue with TensorRT (Jira DLIS-6805 ) which causes TRT tests to fail again at client (CI job 102924904). There is no way to know the platform of inference model at the client side. Should we wait until @pskiran1 finish his change first?
CC @tanmayv25 @GuanLuo @rmccorm4

Sai:

@yinggeh, I just merged DLIS-6805 changes, could you please try with the latest code?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just merged DLIS-6805 changes, could you please try with the latest code?

Sai's changes will allow the check to work on core side, but probably not on client side, right? @yinggeh

There is no way to know the platform of inference model at the client side.

You can query the platform/backend through the model config APIs on client side, which would work when inferring on a TensorRT model directly. You can probably even query is_non_linear_format_io from the model config if needed.

For an ensemble model containing one of these TRT models with non-linear inputs, you may need to follow the ensemble definition to find out if it's calling a TRT model with its inputs, which can be a pain. It may be simpler to skip the check on ensemble models and let the core check handle it (but it feels like we're starting to introduce a lot of special checks and cases with this feature).

For a BLS model, I think it's fine and will work as any other python model, then it will trigger the core check internally if the BLS is calling the TRT model.

CC @tanmayv25

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another alternative is to introduce a new flag in client Input/Output tensors to skip the byte size check on the client side.
We can document when we expect the user to provide this option (using non-linear format).
This way the user can be aware of what they are doing.
Pro:

  1. Generic API change allows for all the flexibility
  2. Powerful expression for the client-side code.

Cons:

  1. Adding a flag to skip these checks seems to be counter-intuitive and makes us question even the requirement of such checks in the first place.
    a. This can be alleviated by an additional check to some degree by validating the skip_byte_size check flag is set for the correct scenario.
  2. Breaks backwards compatibility, as the user now has to set a new flag to use models with non-linear tensors.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For an ensemble model containing one of these TRT models with non-linear inputs, you may need to follow the ensemble definition to find out if it's calling a TRT model with its inputs, which can be a pain.

@rmccorm4 Can you elaborate on this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate on this?

If you have an ensemble with ENSEMBLE_INPUT0 where the first step is a TRT model with non-linear IO INPUT0 and a mapping of ENSEMBLE_INPUT0 -> INPUT0, do we require an ensemble config to mention that the ENSEMBLE_INPUT0 is non-linear IO too? Or is it inferred internally?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a flag to skip these checks seems to be counter-intuitive and makes us question even the requirement of such checks in the first place

+1 that I think this is counter-intuitive to the goal

This can be alleviated by an additional check to some degree by validating the skip_byte_size check flag is set for the correct scenario.

If we are able to internally determine "the correct scenario" programatically, isn't this the same as being able to skip internally without user specification?

@yinggeh yinggeh force-pushed the yinggeh-DLIS-6657-client-input-byte-size-check branch from 82b8d57 to 2a5c507 Compare July 31, 2024 02:23
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
@yinggeh yinggeh requested a review from rmccorm4 July 31, 2024 02:25
@yinggeh yinggeh force-pushed the yinggeh-DLIS-6657-client-input-byte-size-check branch from 2b34379 to a584741 Compare August 2, 2024 21:48
@rmccorm4 rmccorm4 changed the title feat: Client input byte size checks feat: Client-side input shape/element validation Aug 5, 2024
cnt += self._raw_content != None
cnt += "shared_memory_region" in self._input.parameters
if cnt != 1:
return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we return an error when more that one fields are specified in the inputs?

Copy link
Contributor Author

@yinggeh yinggeh Aug 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error was handled by the server.

data_num_elements = num_elements(self._data_shape)
if expected_num_elements != data_num_elements:
raise_error(
"input '{}' got unexpected elements count {}, expected {}".format(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also include the respective shapes in the error message as well?

something like:

input 'XYZ' got unexpected elements count 8 (shape: 8,1), expected 16 (shape: 16,1)

I think you are trying to keep supporting the case where a user might just want to call set_shape() with a different shape but same underlying data.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are shapes (8,2), (4,4) also valid?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Development

Successfully merging this pull request may close these issues.

5 participants