
Nightly Release of Torch-MLIR python wheels for linux_x86_64 builds #1

Merged
merged 2 commits into main from sambhav/release_wheels
Feb 8, 2024

Conversation

sjain-stanford
Member

@sjain-stanford sjain-stanford commented Feb 8, 2024

Reuses the build_linux_packages.sh script to build the project and generate python wheels (a sketch of the corresponding workflow step follows the list below). However, I chose not to port the original GitHub Actions workflows for snapshot releases, which did a lot more than I initially needed. Here's roughly what's different in this trimmed-down version:

- HTML patching for hosting pip wheels is no longer used; instead this goes with the expand_assets trick, similar to openxla/stablehlo (see the pip install command later in this thread)
- only linux x86_64 prebuilt wheels are released (not macOS, Windows, or AArch64 builds); this may be extended later by interested parties
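For reference, here's a minimal sketch of what the build step looks like in this kind of workflow; the step name is illustrative, and the actual workflow likely passes extra configuration (python versions, package selection) via environment variables:

```yaml
- name: Build torch-mlir python wheels
  run: |
    # Drives the wheel build and drops the results under
    # build_tools/python_deploy/wheelhouse/ (consumed by the dist step below).
    ./build_tools/python_deploy/build_linux_packages.sh
```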

I used a standalone fork for the initial bringup; it took a good 30-ish iterations on its main branch to get the GitHub workflow right, plus some experimentation with auth tokens.

Here's what the release page looks like:

Here's the workflow:

@stellaraccident and I took a call to avoid using Personal Access Tokens (PATs) here for release permissions, and instead elevated the permissions of GITHUB_TOKEN to include the contents scope (required for an action to create a release). We think this is reasonable considering this repo is locked down in terms of the number of contributors with write access, and that the elevation is scope-specific rather than write-all.
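For reference, a minimal sketch of what that elevation looks like in the workflow file; whether it sits at the workflow or job level is an assumption here:

```yaml
# Grant the default GITHUB_TOKEN write access to repository contents,
# the scope needed to create releases and upload release assets.
permissions:
  contents: write
```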

- name: Make assets available in dist
  run: |
    mkdir dist
    cp build_tools/python_deploy/wheelhouse/torch*.whl dist/
Member Author


This includes both torch and torch_mlir wheels, sticking to the convention of previous snapshot releases.
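For context, here is one way a later step could publish the `dist/` wheels to the rolling `dev-wheels` release using the GitHub CLI; this is a hedged sketch rather than the exact step in the merged workflow, and it assumes the `dev-wheels` release/tag already exists (the tag name comes from the install URL later in this thread):

```yaml
- name: Upload wheels to the dev-wheels release
  env:
    # gh picks up the elevated (contents: write) GITHUB_TOKEN via GH_TOKEN.
    GH_TOKEN: ${{ github.token }}
  run: |
    # --clobber overwrites assets with the same filename on repeated nightly runs.
    gh release upload dev-wheels dist/*.whl --clobber
```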

minor

fix py_version

fix

test hendrik/ccache, and split py3.10 and py3.11 into separate jobs

include torch wheels too, add release body message

remove ccache entirely

try GITHUB_TOKEN

elevate permissions for GITHUB_TOKEN

trial

fix

fix

fix

fixes
@sjain-stanford sjain-stanford merged commit 3c6d32e into main Feb 8, 2024
@sjain-stanford sjain-stanford deleted the sambhav/release_wheels branch February 8, 2024 16:08
@stellaraccident
Collaborator

Thanks. As discussed on Discord, let's land and iterate on these. It can be tricky to bring things up from scratch. I'll do a detailed review of the whole thing.

@sjain-stanford
Member Author

The first set of wheels are up: https://github.com/llvm/torch-mlir-release/releases 🎉
I'll update the README with instructions, but this should work (once the repo is made public):

pip install torch-mlir -f https://github.com/llvm/torch-mlir-release/releases/expanded_assets/dev-wheels
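(For the curious: pip's `-f`/`--find-links` accepts a URL to a plain page of links, and GitHub's `releases/expanded_assets/<tag>` endpoint serves exactly that for a release's assets, which is what makes the expand_assets trick above work without any HTML index patching.)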

sjain-stanford added a commit to cruise-automation/mlir-tcp that referenced this pull request Feb 12, 2024
…r dynamo exported models (#37)

This PR includes support for calling into the TorchDynamo export + FX
import APIs in torch-mlir to generate the `Torch` dialect, followed by
conversions to the TCP dialect. This is bundled together with a fully
functional hermetic python sandbox in the TCP bazel workspace, with access to
torch-mlir's and core mlir's python APIs and the ability to run python
lit/integration-style tests.

This depended on a bunch of foundational work:
1. Add a hermetic python interpreter to `mlir-tcp` bazel: Needed for
mlir/torch-mlir pybind integrations, python lit+integration tests.
2. Register python bindings for torch-mlir/core mlir: Upstreaming
torch-mlir's bazel python build had several issues (it's heavily
dependent on Cruise's internal bazel workspace), so I decided to instead
install from pre-built python wheels (for torch and torch-mlir). This
keeps the setup in TCP lean and lets us leverage/reuse torch-mlir's
CMake python build for any python integrations.
3. Automate releases of torch-mlir python binaries
(llvm/torch-mlir-release#1): This was
[discontinued](https://discourse.llvm.org/t/rfc-discontinuing-pytorch-1-binary-releases/76371)
as a part of the CI revamp.
4. Support for running python lit tests: Updated lit configs with python
interpreter integration.
5. Add `fx_importer/basic_test.py` (as a python lit test) and mlir lit
tests for TorchToTcp conversions.


Other minor workflow updates:
- Use `--test_output=errors` for better log readability in CI failures
- Add setup-build/action.yml to consolidate workspace setup
- Include disk space cleanup actions (due to OOM issues with default GHA
runners)
- Bump actions versions

P.S. Our stablehlo build fails at the moment due to disk space issues
(unrelated to this change, verified it passes locally). This is just an
outcome of using default GitHub hosted runners for CI with very limited
disk space resources. Adding some cleanup steps helped with llvm build
but still fails stablehlo. We might want to look into self-hosted
runners with larger memory/compute going forward. For now the non-TCP
workflows are not merge-gating (optional) so this is not a blocker to
this PR landing.