
add ONNX support for SaT models #129

Merged: 4 commits into main on Sep 24, 2024

Conversation

markus583 (Collaborator)

This adds ONNX support for sat, sat-sm, and sat-lora models, and includes documentation and testing.

TODOs:

  • Use correct timings in the README. Current ones for ONNX are incorrect because I can't access a device with onnxruntime-gpu support. Can you @bminixhofer?
  • Is it fine to use model_optimized.onnx?
  • [WIP] ONNX weights upload to HF Hub for easy loading.
  • Why do we need to reverse the input_names in extract.py? It works but it is quite weird.
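One way to make the README timings reproducible (once a machine with working onnxruntime-gpu is available) is a small warm-up-then-median harness. The sketch below is illustrative, not the project's benchmarking code: `segment_batch` is a hypothetical stand-in for the actual SaT forward call (swap in the PyTorch or ONNX backend), and the batch contents are dummies.

```python
import statistics
import time

def benchmark(fn, batches, warmup=3, repeats=10):
    """Median wall-clock seconds per batch for callable `fn`.

    Warm-up iterations absorb one-time costs (session creation,
    kernel compilation, caches) so they don't skew the median.
    """
    for _ in range(warmup):
        for batch in batches:
            fn(batch)
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        for batch in batches:
            fn(batch)
        timings.append((time.perf_counter() - start) / len(batches))
    return statistics.median(timings)

# Hypothetical stand-in for the SaT forward pass.
segment_batch = lambda texts: [t.split(". ") for t in texts]

batches = [["First sentence. Second sentence."] * 8] * 4
per_batch = benchmark(segment_batch, batches)
```

Running the same harness twice, once with the PyTorch backend and once with the ONNX backend as `fn`, would give directly comparable per-batch numbers for the README table.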

@markus583 markus583 added the enhancement New feature or request label Sep 10, 2024
markus583 (Collaborator, Author)

I had to remove the exact timings: onnxruntime (ORT) on GPU doesn't work on any of my systems, even after extensive trial and error. If you're fine with it @bminixhofer, we can merge!

bminixhofer (Collaborator)

The naming of the inputs in the ONNX model was swapped (the attention_mask input was named input_ids and vice versa). It still worked because the arguments in the call in extract.py were swapped again, but I removed both swaps now (one of them by changing the argument order in the export call).
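Why two compensating swaps "worked" can be shown with a pure-Python stand-in (the names and values below are illustrative, not the actual export code): ONNX export binds `input_names` to the traced inputs positionally, so if the name list and the argument order disagree, each name lands on the wrong tensor; a second swap at the call site then cancels the error, leaving a latent bug for any caller that feeds inputs by name.

```python
def export(inputs, input_names):
    """Stand-in for an ONNX export: names are bound positionally."""
    return dict(zip(input_names, inputs))

input_ids = [101, 2023, 102]
attention_mask = [1, 1, 1]

# Buggy export: name list reversed relative to the argument order,
# so each name ends up attached to the wrong tensor.
buggy = export([input_ids, attention_mask],
               input_names=["attention_mask", "input_ids"])
assert buggy["input_ids"] == attention_mask

# The old call site swapped again to compensate, masking the bug.
compensated = {"input_ids": buggy["attention_mask"],
               "attention_mask": buggy["input_ids"]}
assert compensated["input_ids"] == input_ids

# The fix removes both swaps: names and arguments agree.
fixed = export([input_ids, attention_mask],
               input_names=["input_ids", "attention_mask"])
assert fixed["input_ids"] == input_ids
```

With the fix, a consumer can feed the exported model by input name without any call-site reordering.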

Also added timings for onnxruntime on GPU; it is indeed ~50% faster!

LGTM now, @markus583 maybe take a final look then we can merge and release.

@markus583 markus583 merged commit 00d2d6c into main Sep 24, 2024
5 checks passed