Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add history metadata for container builds #871

Merged
merged 2 commits into from
Nov 9, 2018

Conversation

davidcassany
Copy link
Collaborator

This commit adds the history section in contianerconfig. With it
'author', 'created_by' and 'comment' can be customized. In addition
'created' is always included with the image creation date time.
'created_by' entry is set to 'KIWI version' by default if nothing
is provided.

Fixes #852

Copy link
Collaborator

@Vogtinator Vogtinator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found a typo.

Looks good from the XML, but I wonder whether it makes sense to allow specifying multiple history entries - I think kiwi does everything in one transaction.

I'm not opposed to it though, it might even be useful in the future.

Can you build kiwi with this PR on OBS so that I can have a try?

kiwi/container/oci.py Outdated Show resolved Hide resolved
@davidcassany
Copy link
Collaborator Author

@schaefi, @Vogtinator while implementing it I noticed we should agree on the behavior of the history records. With the current implementation a type section like the following one:

<type image="docker">
 <containerconfig
   name="opensuse"
   tag="tumbleweed"
   maintainer="David Cassany &lt;dcassany@suse.com&gt;">
   <history author="David" created_by="A kiwi build">This is some comment for history</history>
 </containerconfig>
</type>

Produces a docker history opensuse:tumbleweed output like the following one:

IMAGE               CREATED              CREATED BY          SIZE                COMMENT
9e0d44e62083        About a minute ago   A kiwi build        0B                  This is some comment for history
<missing>           About a minute ago   KIWI 9.16.36        113MB               umoci repack

I noticed that each umoci config and umoci repack creates a new history record into the image. Thus for any KIWI build we will have at least a couple of history entries. In both cases umoci can customize those entries with the flags:

--history.comment
--history.author
--history.created
--history.created_by

With the suggested implementation --history.created is always used for all commands with the build time date, this is not customizable in any case.

umoci repack is also not customizable and comment and created_by history values are set by default to umoci repack and to KIWI __version__.

The last umoci config call is customizable, in that case created_by, author and comment can be defined like in the above <containerconfig> example. In that case if no history is provided created_by is set to the default value: KIWI __version__. comment and author have no default value.

My doubts here are related to default values and what happens if multiple umoci config or umoci repack calls are executed (each additional tag represents at least an umoci config call).

Regarding default values:

  • We could assume that default the author is the same author value provided inside <description><author>
  • In the same direction we could assume that the default comment is the same value provided inside <description><specification>

I am not a fan of those options, but would make the description file less verbose and also fill history records with the user provided data.

Regarding multiple umoci calls causing new history entries:

  • We could use the same history flags for every single call, in that case there would be duplicated entries in history.
  • We could use the history flags only for the last umoci config call, in that case there would be at a least an entry with an empty comment and umoci config as the CREATED BY value.
  • We could use the history flags for the last umoci config call and use some defaults for the rest.

All those options have very little impact on the current suggested implementation, so I am opened to agree on a different behavior.

@Vogtinator
Copy link
Collaborator

Isn't it possible to just use a single umoci call for everything?

@davidcassany
Copy link
Collaborator Author

Isn't it possible to just use a single umoci call for everything?

No, at least umoci-repack and umoci-config calls are required. After commenting it with @cyphar an option could be to add a --no-history flag in umoci-config or some other similar approach.

In any case we need to agree about the expectations for history records using current umoci.

@schaefi
Copy link
Collaborator

schaefi commented Nov 7, 2018

I think a good solution would be the ability to have a --no-history flag for umoci repack/config commands. I think that's what @cyphar suggested to add. However the current umoci doesn't provide this feature. Given it can be added I vote for that solution plus an updated version check in the check_docker_tool_chain_installed() runtime check.

Having a history entry for the umoci repack looks weird to me as we need that to process the initial creation of the container archive only. If we have to add a comment it imho should express something like kiwi: initial archive packing. Having a history entry for the umoci config could express kiwi: initial container setup.

That information wouldn't be that meaningful and I think if the container is used as a base container all that history entries gets inherited which creates a weird history.

So my first vote would be to have an option to skip the history entry such that we can create an image which only provides meaningful history entries

Thoughts ?

@davidcassany
Copy link
Collaborator Author

Having a history entry for the umoci repack looks weird to me as we need that to process the initial creation of the container archive only.

umoci repack includes history entries because each layer requires at least one history entry and umoci repack is actually appending a new layer to the image. Up to my understanding container configuration (umoci config) is not tight to any layer, is image wide, thus we could even reorganize our calls order and call umoci-repack as the last step, then probably it looks more natural to include history records there.

@Vogtinator
Copy link
Collaborator

No, at least umoci-repack and umoci-config calls are required. After commenting it with @cyphar an option could be to add a --no-history flag in umoci-config or some other similar approach.

Yup.

So my first vote would be to have an option to skip the history entry such that we can create an image which only provides meaningful history entries

As long as there's a fallback to having a mostly useless entry in case umoci doesn't support the option, I agree.

@cyphar
Copy link

cyphar commented Nov 7, 2018

The patch for --no-history is in openSUSE/umoci#270.

Having a history entry for the umoci repack looks weird to me as we need that to process the initial creation of the container archive only.

This is a peculiarity of OCI images. In order for other tools to understand your image history, you have to have an image history associated with each layer (it's really ugly how it works internally, it's just a dumb list with a boolean saying whether a particular entry is associated with a layer or not -- so layers without history entries break the history entries of later layers).

In theory you could order them in any order you like, but obviously we'd need to have a UX to do this and thus another umoci patch. I wouldn't particularly mind (one of the ideas I pitched to @davidcassany was this), but I think that --no-history is a simpler option.

@schaefi
Copy link
Collaborator

schaefi commented Nov 7, 2018

Up to my understanding container configuration (umoci config) is not tight to any layer, is image wide, >thus we could even reorganize our calls order and call umoci-repack as the last step, then probably it >looks more natural to include history records there.

I agree, having the repack last would be a good call order change

@schaefi
Copy link
Collaborator

schaefi commented Nov 7, 2018

The patch for --no-history is in openSUSE/umoci#270.

great you are fast :) So once merged this results in a package build somewhere ?

@cyphar
Copy link

cyphar commented Nov 7, 2018

great you are fast :) So once merged this results in a package build somewhere ?

No, but I can do a release and so on quite fast. There is a bug in tests, I will fix it up. :P

@davidcassany
Copy link
Collaborator Author

Given the discussion and @cyphar collaboration I suggest the following approach.

1- Make a new PR with a little refactor on how we call umoci, so we call umoci-repack as the last step. (no history flags used)
2- Update this PR (or new one) so history flags are used for the umoci-repack call only and the schema is updated to include the history data.
3- Finally another PR to check weather --no-history flag is available and used it if possible for any umoci-config call.

I think this should be simple and safe.

@Vogtinator, @schaefi does it make sense to you?

@Vogtinator
Copy link
Collaborator

@Vogtinator, @schaefi does it make sense to you?

Yup. I'm not sure why step 1 is necessary though.

@davidcassany
Copy link
Collaborator Author

Yup. I'm not sure why step 1 is necessary though.

not necessary but desirable I'd say. Also for dervied images we have one extra umoci-config call that is not strictly needed that leads to an additional history record.

@schaefi
Copy link
Collaborator

schaefi commented Nov 7, 2018

workflow sounds good to me, I think we don't need a new PR and just update this one here

@davidcassany davidcassany force-pushed the add_container_history branch 2 times, most recently from f067549 to 769b97e Compare November 8, 2018 12:25
@davidcassany
Copy link
Collaborator Author

davidcassany commented Nov 8, 2018

@schaefi, finally I have not reordered our umoci-repack call to be the last step, apparently umoci-repack restores the configuration that was in place at the time of umoci-unpack, thus any umoci-config in between gets lost. I just done a very little refactor to avoid the extra umoci-config call we had for derived images.

So now with a type section like:

<type image="docker">
  <containerconfig name="opensuse" tag="tumbleweed">
    <history>Include tumbleweed base image layer</history>
  </containerconfig>
</type>

we get a history record like:

IMAGE               CREATED             CREATED BY          SIZE                COMMENT
fa044e36fc1f        About an hour ago   KIWI 9.16.36        0B                  Include tumbleweed base image layer
<missing>           About an hour ago   umoci config        113MB

And for derived images, with type section like:

<type image="docker" derived_from="file:///mybaseimage.tar">
  <containerconfig name="opensuse/derived" tag="derived">
    <history author="David" created_by="A kiwi build">Adding a new layer</history>
  </containerconfig>
</type>

we get a history record like:

IMAGE               CREATED             CREATED BY          SIZE                COMMENT
bfc980576a05        About an hour ago   A kiwi build        0B                  Adding a new layer
<missing>           About an hour ago   umoci config        9.03MB
<missing>           About an hour ago   KIWI 9.16.36        0B                  Include tumbleweed base image layer
<missing>           About an hour ago   umoci config        113MB

This will be the result of any build using an umoci that does not have the --no-history flag. Using the --no-history flag will get rid of all umoci default entries.

I'll make the implementation of --no-history flag in a follow up request.

This commit adds the history section in contianerconfig. With it
'author', 'created_by' and 'comment' can be customized. In addition
'created' is always included with the image creation date time.
'created_by' entry is set to 'KIWI __version__' by default if nothing
is provided.

Fixes #852
@cyphar
Copy link

cyphar commented Nov 8, 2018

@davidcassany

apparently umoci-repack restores the configuration that was in place at the time of umoci-unpack, thus any umoci-config in between gets lost.

Yes, this is the case. It's actually not that the configuration is being restored, it's that umoci unpack stores the exact descriptor that the extracted image referred to so that it know what image it is going to be based on when it creates a new layer. If you modify the image tag, it doesn't affect this descriptor (which cryptographically points at the manifest for the image and thus the whole Merkle tree) and so you end up creating a modified version of the old image (complete with old config). There isn't really a robust way to work around this in umoci without making umoci potentially generate broken images (imagine if the tag was changed to a completely different image and then you generate a diff layer).

<missing>           About an hour ago   umoci config        113MB

Are you sure that output is right? Why is a config change being listed as being 113MB (looks like there's a mismatch in the history when it is parsed)? What are you using to get the history information (please don't tell me Docker broke something again)?

@davidcassany
Copy link
Collaborator Author

@davidcassany

apparently umoci-repack restores the configuration that was in place at the time of umoci-unpack, thus any umoci-config in between gets lost.

There isn't really a robust way to work around this in umoci without making umoci potentially generate broken images

Yes I understand, anyway this is not an issue.

<missing>           About an hour ago   umoci config        113MB

Are you sure that output is right?

It is right, at least the copy & paste I did :)

Why is a config change being listed as being 113MB (looks like there's a mismatch in the history when it is parsed)? What are you using to get the history information (please don't tell me Docker broke something again)?

This is the command sequence:

in kiwi buid:
  umoci repack
  umoci config
  umoci gc
  skopeo copy

loading the image:
  docker load
  docker history

I have no checked the history records for the OCI image neither the history records before being loaded, thus I guess it could be also skopeo that is messing around with the history.

kiwi/container/oci.py Outdated Show resolved Hide resolved
})
'entry_subcommand': ['/bin/bash'],
'history': {'created_by': 'KIWI {0}'.format(__version__)}
}, True)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had added the base_image as real Boolean because this test was really confusing if not. I know the result is the same, but having a string pointing to a directory here felt really strange.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes understood, actually I also liked what @cyphar suggested

bool(base_image)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants