
Remove tarball.WithCompressedCaching flag to resolve OOM Killed error #1722

Merged: 3 commits, Oct 19, 2021


@Phylu (Contributor) commented Aug 13, 2021

Description

Large images cannot be built because the kaniko container is killed with an out-of-memory (OOM) error. Removing the tarball compression drastically reduces the memory required to push large image layers. Fixes #1680

Removing the tarball compression may increase the build time for smaller images. Therefore, rather than removing the compression unconditionally, the implementation adds a command line option to disable it.

Submitter Checklist

These are the criteria that every PR should meet. Please check them off as you review them:

See the contribution guide for more details.

Reviewer Notes

  • The code flow looks good.
  • Unit tests and/or integration tests added.

Release Notes

- A new command line flag `--compressed-caching` configures whether kaniko uses tarball compression when copying image layers. Setting it to `false` prevents out-of-memory errors when building large images.

@google-cla bot added the `cla: no` label (CLA not signed by all commit authors) Aug 13, 2021
@Phylu (Contributor, Author) commented Aug 13, 2021

@googlebot I signed it!

@google-cla bot added the `cla: yes` label (CLA signed by all commit authors) and removed the `cla: no` label Aug 13, 2021
@Phylu (Contributor, Author) commented Sep 17, 2021

I have improved my PR by adding a command line flag. Setting `--compressed-caching=false` now disables the compression, which lets the build succeed even with those large images.

I really hope that one of the maintainers can pick this up.


This change may increase the build time for smaller images. Therefore, a command line option to toggle the compression, or smarter automatic behaviour, may be useful.
@tejal29 (Member) commented Oct 19, 2021

Rebased onto the latest master! Will merge and pick it up for the release.

@tejal29 merged commit 46e0134 into GoogleContainerTools:master Oct 19, 2021
@Phylu deleted the 1680-oom-killed branch Oct 19, 2021 09:05
stefannica added a commit to stefannica/extensions that referenced this pull request Nov 15, 2021
If the k8s node where the MLFlow builder step runs has little memory,
the builder step fails when it has to build larger images. For example,
building the trainer image for the Keras CIFAR10 codeset example resulted
in an OOM failure on a node where only 8 GB of memory were available.

This is a known kaniko issue [1] and there's a fix available [2] with
more recent (>=1.7.0) kaniko versions: disabling the compressed
caching via the `--compressed-caching` command line argument.

This commit models a workflow input parameter mapped to this
new command line argument. To avoid OOM errors with bigger
images, the user may set it in the workflow like so:

```
  - name: builder
    image: ghcr.io/stefannica/mlflow-builder:latest
    inputs:
      - name: mlflow-codeset
        codeset:
          name: '{{ inputs.mlflow-codeset }}'
          path: /project
      - name: compressed_caching
        # Disable compressed caching to avoid running into OOM errors on cluster nodes with lower memory
        value: false
```

[1] GoogleContainerTools/kaniko#909
[2] GoogleContainerTools/kaniko#1722
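On the builder side, the workflow input has to be turned into the kaniko argument. A minimal sketch, assuming a hypothetical `executorArgs` helper that receives the `compressed_caching` input as a string (for example `"false"`) and appends `--compressed-caching=<bool>` only when the input is set; the MLFlow builder's real implementation is not shown here and may differ. The base argument in the usage example is illustrative.

```go
package main

import (
	"fmt"
	"strconv"
)

// executorArgs builds the kaniko executor arguments for a build step.
// compressedCaching is the raw workflow input value (hypothetical: assumed
// to arrive as a string such as "false"). An empty value keeps kaniko's
// default behaviour by not passing the flag at all.
func executorArgs(base []string, compressedCaching string) ([]string, error) {
	args := append([]string(nil), base...) // copy so the caller's slice is untouched
	if compressedCaching == "" {
		return args, nil
	}
	enabled, err := strconv.ParseBool(compressedCaching)
	if err != nil {
		return nil, fmt.Errorf("invalid compressed_caching value %q: %w", compressedCaching, err)
	}
	// Attach the value with "=": Go-style boolean flags do not accept
	// the value as a separate argument.
	args = append(args, "--compressed-caching="+strconv.FormatBool(enabled))
	return args, nil
}

func main() {
	args, err := executorArgs([]string{"--context=dir:///project"}, "false")
	if err != nil {
		panic(err)
	}
	fmt.Println(args) // [--context=dir:///project --compressed-caching=false]
}
```

Leaving the flag out when the input is unset means older kaniko versions (before 1.7.0, which lack the flag) keep working unless the user explicitly opts in.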
Labels: `cla: yes` (CLA signed by all commit authors), `kokoro:run`

Linked issue (closed by merging this pull request): Kaniko gets OOMKilled while building vast images

3 participants