Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix and add test for build_image.sh: Make it invariant to arguments order #2226

Merged
merged 7 commits into from
Apr 14, 2023

Conversation

fabridamicelli
Copy link
Contributor

Description

Current parsing of arguments in docker/build_image.sh is order dependant which makes it pretty easy to make mistakes.
I found it out by trying to build an image with gpu and a custom tag like so:

PY_VERSION=3.9
IMAGE_TAG="orga/repo:custom-tag"
./build_image.sh -py "${PY_VERSION}" -t "${IMAGE_TAG}" -g     # <--- notice the -g at the end

This produces an image with right gpu nvidia base image, but with the following wrong tag: pytorch/torchserve:latest-gpu.
The correct tag should be orga/repo:custom-tag.
This happens because the arguments parsing overrides the DOCKER_TAG like so:

        -g|--gpu)
          MACHINE=gpu
          DOCKER_TAG="pytorch/torchserve:latest-gpu"
          BASE_IMAGE="nvidia/cuda:11.7.0-cudnn8-runtime-ubuntu20.04"
          CUDA_VERSION="cu117"
          shift

Therefore using is as last argument (-g) (in particular, after -t) will lead to ignoring the custom tag (-t).

This PR:

  • shellcheck the whole script (unused variables, Make string replacement safer)
  • fixes the order-dependant arguments problem
  • adds a CI workflow testing builds explicitely with arguments in different orders

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • Docker CI extension

Feature/Issue validation/testing

The fix happens by checking if a custom tag was passed and letting that override the DOCKER_TAG value like so:

if [ "$USE_CUSTOM_TAG" = true ]
then 
  DOCKER_TAG=${CUSTOM_TAG}
fi

So now running

PY_VERSION=3.9
IMAGE_TAG="orga/repo:custom-tag"
./build_image.sh -py "${PY_VERSION}" -t "${IMAGE_TAG}" -g

is equivalent to this:

./build_image.sh -py "${PY_VERSION}" -g -t "${IMAGE_TAG}"

and to this:

./build_image.sh -g -t "${IMAGE_TAG}" -py "${PY_VERSION}" 

and they all build an image with tag orga/repo:custom-tag and the same digest.

Checklist:

  • Did you have fun?
  • Have you added tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?

@codecov
Copy link

codecov bot commented Apr 11, 2023

Codecov Report

Merging #2226 (8b02d90) into master (9edd461) will not change coverage.
The diff coverage is n/a.

❗ Current head 8b02d90 differs from pull request most recent head 1db211a. Consider uploading reports for the commit 1db211a to get more accurate results

@@           Coverage Diff           @@
##           master    #2226   +/-   ##
=======================================
  Coverage   71.41%   71.41%           
=======================================
  Files          73       73           
  Lines        3348     3348           
  Branches       57       57           
=======================================
  Hits         2391     2391           
  Misses        954      954           
  Partials        3        3           

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@msaroufim msaroufim requested a review from agunapal April 11, 2023 20:05
Copy link
Collaborator

@agunapal agunapal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fabridamicelli Thanks for the PR!

Could you please consolidate the workflow file here with the the other one you created.
You could call it Docker CI.

Ideally, we would want a single script to be run which will call a bunch of tests.

@fabridamicelli
Copy link
Contributor Author

fabridamicelli commented Apr 12, 2023

@agunapal
Thank you for the feedback!

Could you please consolidate the workflow file here with the the other one you created. You could call it Docker CI.

That makes sense. I put all together under docker-ci.yaml workflow now so that it makes more sense.
Notice that I added the test build-image-tagging step first in the job in order to take advantage of all the docker caches in the following steps.

PS: I believe we should have a battery of usage examples (like the one here) that we consider to be fundamental and we want to guarantee that every new released image leads to a container that successfully runs those out of the box.
If you agree with that, can you (maybe @msaroufim as well?) point to a handful of such examples that are more important in your opinion? We could then step by step integrate them in Docker CI to build a bit more trust around the testing of the images.

@agunapal
Copy link
Collaborator

@fabridamicelli
Thanks for taking this up!

Please feel free to split the below items over 1 or multiple PRs as you see fit.

Here is the benchmark config we run.
If you check the yaml files, it will point you to the exact example, config etc.

You don't need to exactly follow this. I would say one example with ResNet, one with HF Transformers, One with Accelerated Transformers, one with ONNX or TensorRT would be a great start. I would also recommend to design in a way which makes it easy for you to extend the tests with number of workers, batch_size etc.

@fabridamicelli
Copy link
Contributor Author

@agunapal

@fabridamicelli Thanks for taking this up!

Please feel free to split the below items over 1 or multiple PRs as you see fit.

Here is the benchmark config we run. If you check the yaml files, it will point you to the exact example, config etc.

You don't need to exactly follow this. I would say one example with ResNet, one with HF Transformers, One with Accelerated Transformers, one with ONNX or TensorRT would be a great start. I would also recommend to design in a way which makes it easy for you to extend the tests with number of workers, batch_size etc.

Great, thank you for the hints. I will try in following PRs to start putting something simple together that is extensible so that we can add step by step more examples to the test battery.

Copy link
Collaborator

@namannandan namannandan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
Just one minor nitpick about organizing the test files.

IMGS_FILE="test_images.json"
docker images --no-trunc --format "{{json .}}" | jq '{"repo": .Repository, "tag": .Tag, "digest": .ID}' | jq -s > "${IMGS_FILE}"

python <<EOF
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Would it be better to put this python code in a separate file and invoke it here or maybe implement the entire test in python?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @namannandan . Hi @fabridamicelli this is perhaps another thing to consider for future PRs on adding more tests. It would be great if these can be implemented using pytest. You can take a look at the this directory to see examples of how this is being done for the non-docker examples. We don't have an example with docker.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for reviewing and for the feedback !

@namannandan
I can understand the point about the combination of python/bash code in one script. My logic behind having everything in one script is to kind of have an atomic component with (1 test -> 1 file) – and the mixture will anyways exist if we call things from python with subprocess.run. Still we could refactor it as you suggest and have it all in python.

@agunapal
Regarding pytest: I take the point and I will check the other existing tests to follow the standards a bit closer, so we can step by step refactor things and iterate a bit on the design of the docker tests such that it is easy to extend with more tests and usage examples. I still have the feeling that it makes sense to keep docker tests under docker/ for docker things to be all in one place..

@agunapal agunapal merged commit 1af13ae into pytorch:master Apr 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants