Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dev container metadata in image labels #18

Closed
Chuxel opened this issue Mar 21, 2022 · 32 comments
Closed

Dev container metadata in image labels #18

Chuxel opened this issue Mar 21, 2022 · 32 comments
Assignees
Labels
active important proposal Still under discussion, collecting feedback

Comments

@Chuxel
Copy link
Member

Chuxel commented Mar 21, 2022

Unlike some developer centric formats, a non-goal for devcontainer.json is to become yet another multi-container orchestrator format. Instead, its goal is to enable development in containers regardless of how they are orchestrated. With this in mind, allowing image labels be used to set devcontainer.json properties could have significant advantages. In particular, there is a problem with images and devcontainer.json files needing to be paired but are deployed and managed separately, which can lead to disconnects. This is important in scenarios like container runtime settings required to enable debugging (e.g. cap_add: SYS_PTRACE), but applies broadly to any settings the images was purpose built to support.

In particular:

  1. When the dev container CLI is used to build the image, these labels can added automatically.
  2. Labels can be added to images using a Dockerfile, command line, an orchestrator format, or another image build system like Buildpacks.

This second point also makes it easier to use the metadata when some other mechanism is used to build the container images - whether an orchestrator or another non-Docker CLI like pack for Buildpacks.

The json-based nature of devcontainer.json would main this fairly straight forward. To simplify just reusing json blobs in automation scenarios, we can also support a whole structure being added:

LABEL dev.containers.metadata='{ "userEnvProbe": "loginInteractiveShell", "customizations": { "vscode": { "extensions": ["ms-python.python", "ms-toolsai.jupyter" ] } } }'

For hand editing, common and less complex properties could be referenced directly.

LABEL dev.containers.metadata.userEnvProbe="loginInteractiveShell"
LABEL dev.containers.metadata.customizations.vscode.extensions= '["ms-python.python","ms-toolsai.jupyter"]'

More complex any type properties could then be encoded json.

LABEL dev.containers.metadata.customizations.vscode.settings="{\"some.setting\": \"some-value\"}"

In the common case, these labels would be automatically added by the dev container CLI to the image when it is pre-built. (devcontainer build ...), but manual entry would enable these additions to be embedded in orchestrator formats as well.

services:
  app:
    build:
      context: .
      dockerfile: Dockerfile
      labels:
        - "dev.containers.metadata.userEnvProbe=loginInteractiveShell"
        - "dev.containers.metadata.customizations.vscode.extensions=[\"ms-python.python\",\"ms-toolsai.jupyter\"]"

Any tool that supports the dev container spec would then look for these image labels - regardless of whether they are on a pre-built image or one built by an orchestrator. The reference implementation would then illustrate how to make this happen.

We should be able make this work with any general, lifecycle, or tool specific property.

Dev container features

The dev container CLI enables pre-building images that include the proposed dev container features concept. However, there's a couple things here to consider. First, we can include an image label that indicates whether the build step is already done for the image. This avoids accidentally re-firing of the same feature multiple times based on reused configuration.

LABEL dev.containers.features.built='["docker-in-docker","github-cli"]'

Second, we can render out the devcontainer.json properties that tie to the feature in the resulting image. e.g., extensions would include those that were indicated by the feature. Properties like capAdd we've talked about as higher level properties, so these could be handled the same way.

Processing then is always: build (optional), read labels with metadata, run. You can use devcontainer.json as you use it today, but the labels could also come from somewhere else.

Beyond providing processing consistency, this avoids a possible issue where the referenced feature in devcontainer.json updates, but the pre-built image does not. Problems here could be extremely difficult to track down when they happen otherwise.

Multi-container

Finally - this should help with the [one-to-many problem at least tactically or when there is not a specific integration to an orchestrator format. For pre-building, each image can have a separate devcontainer.json or Dockerfile with the labels - with the dev container CLI enabling features. These can then be referenced in an orchestrator directly. Even when the dev container CLI is not used, at least a subset of properties could be included in the orchestrator format.

Net-net, the resulting image should have labels on it that explains how it should be set up. Orchestrator formats can influence processing where they support adding properties. In all cases, the dev container CLI will inspect the image and make the needed adjustments.

@Chuxel Chuxel added the proposal Still under discussion, collecting feedback label Mar 21, 2022
@jkeech
Copy link
Contributor

jkeech commented Mar 21, 2022

Embedding devcontainer metadata in the generated images as labels seems like a really valuable feature 👍. A couple quick comments/questions:

  1. @Chuxel, what's your thought on using separate labels per devcontainer property vs a single label that contains the entire set of partial devcontainer.json contents that the image contains. Are there any technical limitations that would be in play (such as max label size or max number of labels allowed)?
  2. I would propose dropping the .microsoft portion of the label namespace and just go with something like com.devcontainer since we aim to make this more of an open/neutral specification.

@Chuxel
Copy link
Member Author

Chuxel commented Mar 21, 2022

  1. @Chuxel, what's your thought on using separate labels per devcontainer property vs a single label that contains the entire set of partial devcontainer.json contents that the image contains. Are there any technical limitations that would be in play (such as max label size or max number of labels allowed)?

@jkeech There may be a maximum (I need to check), but it's large. From what I've seen, things like Buildpacks add pretty giant json structures as labels - far bigger than what a devcontainer.json would include.

The reason for considering separate properties is simplify the process of hand editing and additions into a Dockerfile when you just have one or two. Format wise, json is quote heavy, which means lots of escaping. We could support both I suppose. e.g. com.microsoft.devcontainer could be the full json blob.

  1. I would propose dropping the .microsoft portion of the label namespace and just go with something like com.devcontainer since we aim to make this more of an open/neutral specification.

Yeah, we need to use a domain we own (or more accurately, the spec owns), but agreed that the .microsoft bit we'd ideally get rid the .microsoft part as we open the spec. I talked to @bamurtaugh about one option to consider. I just didn't put something else down since I didn't want to set false expectations - we can update the summary once we have something.

@joshspicer
Copy link
Member

joshspicer commented Mar 21, 2022

If the limit is not too high, it seems cool to me to store the entire partial devcontainer.json as a label, and be able to "reverse engineer" the devcontainer components of an image without too much manual copy/pasting.

Doing both also makes sense to me for flexibility.

@Chuxel
Copy link
Member Author

Chuxel commented May 27, 2022

This repository has a concrete example of where this could be useful. It uses Buildpacks to generate dev container images w/config. https://github.com/chuxel/devpacks

The image build is performed in a way that is not easily to update (since creating a Buildpack is not easy), and does not have an out-of-box way to generate secondary outputs like devcontainer.json. Allowing information to be on labels on the image results in a completely independent workflow being able to participate in devcontainer creation. It also covers #2 and illustrates setting container capabilities for Go (specifically ptrace).

To @jkeech's point, I just used the label dev.containers.metadata for a merged json structure. There's a little CLI to bridge gaps right now - but it's pretty simple, so I think it's got a great value for effort ratio. 😄

@chrmarti
Copy link
Contributor

Goal

Record feature metadata in prebuilt images, such that, the image and the built-in features can be used with a devcontainer.json (image-, Dockerfile- or Docker Compose-based) that does not repeat the features or feature metadata. Other tools should be able to record the same metadata without necessarily using features themselves.

Feature metadata relevant for using the feature when it is already part of the image: mounts, init, privileged, capAdd, securityOpt, entrypoint and customizations.

Proposal

We investigated adding a devcontainer.json to the image as a label on the image. This devcontainer.json would contain the merged feature metadata (same merge logic as when starting a dev container, e.g., privileged is true when it is true for one or more features). This approach has the following difficulties:

  • We could use runArgs for the image- or Dockerfile-based devcontainer.json to record init, privileged, capAdd and securityOpt feature metadata, but runArgs does not apply when the user's devcontainer.json uses Docker Compose.
    • For example: "privileged": true in the feature metadata can be recorded in the devcontainer.json as "runArgs": [ "--privileged" ].
  • In general, we don't know how to merge customizations. (E.g., there is currently no VS Code-specific code in the CLI required to handle that.)
    • For example: A tool "foo" might have an array of values as customization and two features might have different values for it:
      • { "customizations": { "foo": { "bar": ["baz"] } } }
      • { "customizations": { "foo": { "bar": ["xyz"] } } }
    • These could be merged as
      • additions: { "customizations": { "foo": { "bar": ["baz", "xyz"] } } }
      • or replacement: { "customizations": { "foo": { "bar": ["xyz"] } } }.
  • Having one devcontainer.json on the image and one in the workspace folder requires a general merge logic for devcontainer.jsons.
    • For example: Both devcontainer.jsons might have an array of mount points (mounts) and the correct way to merge can depend on what the user intends to do.
    • For example: Docker Compose supports merging configs and has long-standing issues related to that: docker/compose#3729.
    • Proposal for automatic merging of all devcontainer.json properties: devcontainers/spec#22
    • Proposal for user-controlled merging of devcontainer.json properties: devcontainers/spec#23

To avoid these hurdles we propose to add metadata to the image with the following structure using one entry per feature:

[
	{
		"id": string,
		"init"?: boolean,
		"privileged"?: boolean,
		"capAdd"?: string[],
		"securityOpt"?: string[],
		"entrypoint"?: string,
		"mounts"?: [],
		"customizations"?: {
			...
		}
	},
	...
]

To simplify adding this metadata for other tools, we also support having a single top-level object with the same properties omitting the id property.

We already have the merge logic for merging the metadata from multiple features originating from a devcontainer.json and can use the same logic for also merging this new metadata from the container image.

Notes

  • There is no size limit documented for labels, but the daemon returns an error when the request header is >500kb.
    • The 500kb limit is shared, so we cannot use a second label in the same build to avoid it.
    • If/when this becomes an issue we could embed the metadata as a file in the image (e.g., with a label indicating it).

@Chuxel
Copy link
Member Author

Chuxel commented Jul 20, 2022

@chrmarti Beyond recoding feature metadata, the goal was also to enable alternate build systems like Buildpacks or Nixpacks or even Docker Compose itself without the ability to output a devcontainer.json file to contribute dev container metadata. This is also important and should not be required to use features under the hood (and in some cases can't use them). Are you thinking the feature metadata is in addition to the main proposal? Or would you express the metadata in this situation like a feature even though no such feature exists? If the idea would be to express the metadata as a feature, we should probably clarify how this should be expressed - in particular how we would think about the "id" in this situation.

We could use runArgs for the image- or Dockerfile-based devcontainer.json to record init, privileged, capAdd and securityOpt feature metadata, but runArgs does not apply when the user's devcontainer.json uses Docker Compose.
For example: "privileged": true in the feature metadata can be recorded in the devcontainer.json as "runArgs": [ "--privileged" ].

@chrmarti FYI - Raised #2 a while back for this bit. Agree 100% we should do it as a part of this.

@jkeech
Copy link
Contributor

jkeech commented Jul 20, 2022

to contribute dev container metadata

This is an important part. Even when using the devcontainer build command, you could have the following devcontainer.json:

{
  "build": {
    "dockerfile": "Dockerfile"
  },
 "features": {
    "node": "latest"
  },
  "postStartCommand": "npm install",
  "customizations": {
    "vscode": {
      "extensions": [ "my-publisher.myextension" ]
    }
  },

The resulting image metadata should have all of the feature contributions that need to get applied at runtime as well as the lifecycle hooks like postStartCommand and customizations in the devcontainer.json. That way the end user can just reference the resulting image and get everything working as expected in the same way is directly running the original devcontainer.json.

Instead of cherry-picking which properties from devcontainer.json and from features would need to get applied at runtime and carried forward in the image metadata, it might be better to include the full contents. That way we don't have to continually update the list of properties that apply after the image is built as new properties are added to the spec.

@Chuxel
Copy link
Member Author

Chuxel commented Jul 21, 2022

Yeah, postCreateCommand was used in the Buildpacks prototype I did. It is partly why I raised #60. It also assumed
#2 was done for the reasons mentioned above since one of the buildpacks was for Go and needed ptrace.

@chrmarti
Copy link
Contributor

The difficulty with adding an entire devcontainer.json is that the correct merge logic for a property depends on its semantic. E.g., when merging string properties of the same name I would generally guess that the last one wins, but with postStartCommand we actually want to collect them and run them all. Also some properties from the devcontainer.json (e.g., the Dockerfile path) might not make sense as metadata on an image.

To simplify use of the metadata by other tools we support a single object and an array of the same type of object (updating id to the optional feature here and adding the lifecycle properties):

[
	{
		"feature"?: string,
		"init"?: boolean,
		"privileged"?: boolean,
		"capAdd"?: string[],
		"securityOpt"?: string[],
		"entrypoint"?: string,
		"mounts"?: [],
		"onCreateCommand"?: string | string[],
		"updateContentCommand"?: string | string[],
		"postCreateCommand"?: string | string[],
		"postStartCommand"?: string | string[],
		"postAttachCommand"?: string | string[],
		"customizations"?: {
			...
		}
	},
	...
]

E.g.:

[
	{
		"postStartCommand": "npm install",
		"customizations": {
			"vscode": {
				"extensions": [
					"my-publisher.myextension"
				]
			}
		}
	},
	{
		"feature": "devcontainers/features/node@1.2",
		"customizations": {
			"vscode": {
				"extensions": [
					"dbaeumer.vscode-eslint"
				]
			}
		}
	},
	// ...
]

@Chuxel
Copy link
Member Author

Chuxel commented Jul 21, 2022

@chrmarti @alexdima as we just chatted, this could work if we also did #2 so that we can encourage people not to use things like "runArgs" in the pre-build scenario given we know how common these are. We can update all of our samples as well. (In some ways it's tech-debt to back-port what we already did for features - the reasons for the properties is the same.) Totally good with the array idea and merging later.

For the list of properties available in the label though, looking at devcontainer.json, I'd propose we support all General and Lifecycle properties. In particular, these will matter if you pre-build your image that is not in the list above:

  • remoteUser - This one is in every devcontainer.json we've published and has major impacts.
  • userEnvProbe - Has a m major effect on what ends up in the container.
  • remoteEnv - Used to update the PATH since it can reference containerEnv
  • overrideCommand - Some images will not work without this being set.
  • portsAttributes / otherPortsAttributes - Users that name their ports or set HTTPS as the protocol should be respected

These are a bit debatable, but I think customers would expect if you built from a devcontainer.json, they would be applied:

  • forwardPorts
  • shutdownAction
  • updateRemoteUserUID

These are not applicable to Dev Container Features (#60), but that's because they are inputs to constructing the image. Here we are dealing with a fully constructed image that either needs to container dev container metadata, or was created as a build from a devcontainer.json file.

Or put another way, it may be easier to just say we don't support Scenario Specific Properties. These are really the ones that would differ between orchestrator rather than being about the contents of the image itself. @jkeech made a pretty good point there as I think about it.

@chrmarti
Copy link
Contributor

We have to decide on how to merge each property. Going by the above list:

Property Type/Format Merge Logic
feature devcontainers/features/node@1.2 Not merged.
init boolean true if at least one is true, false otherwise.
privileged boolean true if at least one is true, false otherwise.
capAdd string[] Union of all capAdd arrays without duplicates.
securityOpt string[] Union of all securityOpt arrays without duplicates.
entrypoint string Collected list of all entrypoints.
mounts (string | { type, src, dst })[] Collected list of all mountpoints.
onCreateCommand string | string[] Collected list of all onCreateCommands.
updateContentCommand string | string[] Collected list of all updateContentCommands.
postCreateCommand string | string[] Collected list of all postCreateCommands.
postStartCommand string | string[] Collected list of all postStartCommands.
postAttachCommand string | string[] Collected list of all postAttachCommands.
customizations Object of tool-specific customizations. Merging is left to the tools.
remoteUser string Last value wins.
userEnvProbe string (enum) Last value wins.
remoteEnv Object of strings. Per variable, last value wins.
containerEnv Object of strings. Per variable, last value wins.
overrideCommand boolean Last value wins.
portsAttributes Map of ports to attributes. Per port (not port attribute), last value wins.
otherPortsAttributes Port attributes. Last value wins (not per attribute).
forwardPorts (number | string)[] Union of all ports without duplicates. Last one wins (when mapping changes).
shutdownAction string (enum) Last value wins.
updateRemoteUserUID boolean Last value wins.

Does it make sense, e.g., for overrideCommand to have a different logic than init?

@Chuxel
Copy link
Member Author

Chuxel commented Jul 22, 2022

@chrmarti The nice thing is there's really three states for booleans, true, false, and unset. With that in mind, I think it's safe to say that "last set value wins" for all booleans. For things that are features, that would also mirror how they are applied, correct? Or are some of these unions? The only reason you'd explicitly set one of these is if you wanted to override the default behavior because it has to be different, which I could see doing in devcontainer.json. Unset is just ignored in that evaluation. An alternative would be to just have devcontainer.json able to override but that results in less predictable behaviors.

Agree with the commands and entrypoint being exceptions though. In retrospect, we likely should have supported an array of commands rather than just one since that's often how it gets used (separated by &&).

I'd also propose that the default behavior for objects under customizations is straight forward unless the tool overrides it so we can handle things that are not in the schema. Specifically, string/boolean are last one in wins if set, arrays are union and objects you go into individual properties under them and apply these same rules. I think we can do otherPortsAttributes and portsAttribures the same way given how they are set up.

Makes it easy to document as well - we can describe default rules and then the few exceptions.

@jkeech
Copy link
Contributor

jkeech commented Jul 22, 2022

After some offline discussion, I think it's fine to start with only embedding allow-listed properties from the devcontainer.json and features into the image label. If this ends up being a maintenance burden as new properties are added to the spec, we can always choose to expand the scope to be the full devcontainer and feature metadata, which wouldn't be a breaking change if we choose to do that in the future.

@Chuxel
Copy link
Member Author

Chuxel commented Aug 1, 2022

Another +1 thought on the idea of using an array to persist the metadata here is that we could also add support for the same label existing on an individual container (in addition to the image). That would allow orchestrators to drop in dev container metadata as well in addition to placing it on images. Certain properties wouldn't work like this since they affect how the container is configured, but anything that would fit into the "attach" scenario (#20) could be added this way as well. That would be super powerful.

@Chuxel
Copy link
Member Author

Chuxel commented Aug 18, 2022

@chrmarti Per #82, I'd also suggest hostRequirements also be on this list given it's also not scenario specific.

@chrmarti
Copy link
Contributor

@chrmarti The nice thing is there's really three states for booleans, true, false, and unset. With that in mind, I think it's safe to say that "last set value wins" for all booleans. For things that are features, that would also mirror how they are applied, correct? Or are some of these unions? The only reason you'd explicitly set one of these is if you wanted to override the default behavior because it has to be different, which I could see doing in devcontainer.json. Unset is just ignored in that evaluation. An alternative would be to just have devcontainer.json able to override but that results in less predictable behaviors.

Sounds good.

I'd also propose that the default behavior for objects under customizations is straight forward unless the tool overrides it so we can handle things that are not in the schema. Specifically, string/boolean are last one in wins if set, arrays are union and objects you go into individual properties under them and apply these same rules. I think we can do otherPortsAttributes and portsAttribures the same way given how they are set up.

When using an array to collect the metadata, there is no need to merge customizations. One question is how the CLI will pass these back to products (e.g., with the read-configuration command), but that is not part of the spec and I think it will be easier to just return the array of customizations (otherwise we will need a CLI option to turn off auto-merging of customizations?).

@Chuxel
Copy link
Member Author

Chuxel commented Aug 31, 2022

When using an array to collect the metadata, there is no need to merge customizations.

@chrmarti Yeah agreed. I was more mentioning how the merge would happen once the labels were processed.

One question is how the CLI will pass these back to products (e.g., with the read-configuration command), but that is not part of the spec and I think it will be easier to just return the array of customizations (otherwise we will need a CLI option to turn off auto-merging of customizations?).

I think read-configuration in the CLI should be capable of returning a fully processed config as a good reference implementation. This will ensure the same logic is applied correctly by implementors and in addition to demonstrate the spec - since the behavior here is part of the specification. It's a pain already with features to deal with the separation that exists in that command. That's why I raised devcontainers/cli#113. You can see the hack I had to do for the demos here: https://github.com/devcontainers/cli/blob/main/example-usage/tool-vscode-server/server/init-vscode-server.sh#L28-L65 If it now has to parse additional config, it gets even worse.

The command has an argument to pass all of the different parts separately, so I think that can include image config, but even in these simple examples, you can see the strong need for returning a normalized, processed config.

@jkeech
Copy link
Contributor

jkeech commented Aug 31, 2022

One question is how the CLI will pass these back to products (e.g., with the read-configuration command), but that is not part of the spec and I think it will be easier to just return the array of customizations (otherwise we will need a CLI option to turn off auto-merging of customizations?).

I think read-configuration in the CLI should be capable of returning a fully processed config as a good reference implementation. This will ensure the same logic is applied correctly by implementors and in addition to demonstrate the spec - since the behavior here is part of the specification.

👍 I think the merging semantics should be part of the spec and the reference implementation in the CLI. If it's not, then different implementors could have different behavior due to it being undefined, leading to different results at runtime.

@chrmarti
Copy link
Contributor

chrmarti commented Sep 1, 2022

@chrmarti The nice thing is there's really three states for booleans, true, false, and unset. With that in mind, I think it's safe to say that "last set value wins" for all booleans. For things that are features, that would also mirror how they are applied, correct? Or are some of these unions? The only reason you'd explicitly set one of these is if you wanted to override the default behavior because it has to be different, which I could see doing in devcontainer.json. Unset is just ignored in that evaluation. An alternative would be to just have devcontainer.json able to override but that results in less predictable behaviors.

Sounds good.

Actually, this needs clarification: We need to make sure features work independently of their order, e.g., if a feature sets "privileged" true, another feature should not be able to override with "privileged": false because that would potentially break the first feature (e.g., with go debugging would no longer work). To retain the "last set value wins" semantic in the image metadata we can remove any "privileged": false (and similar flags) from the feature metadata at the time we add the metadata to the image.

@chrmarti
Copy link
Contributor

chrmarti commented Sep 1, 2022

One question is how the CLI will pass these back to products (e.g., with the read-configuration command), but that is not part of the spec and I think it will be easier to just return the array of customizations (otherwise we will need a CLI option to turn off auto-merging of customizations?).

I think read-configuration in the CLI should be capable of returning a fully processed config as a good reference implementation. This will ensure the same logic is applied correctly by implementors and in addition to demonstrate the spec - since the behavior here is part of the specification.

👍 I think the merging semantics should be part of the spec and the reference implementation in the CLI. If it's not, then different implementors could have different behavior due to it being undefined, leading to different results at runtime.

I agree. I was thinking specifically about the customizations property where the correct merging strategy also depends on the semantics of the properties (like with built-in properties) and these are not known to the CLI. E.g. with:

  "customizations": {
    "codespaces": {
      "repositories": {
        "chrmarti/example": {
          "permissions": {
            "contents": "read",
            "workflows": "write"
          }
        }
      }
    }
  }

and

  "customizations": {
    "codespaces": {
      "repositories": {
        "chrmarti/example": {
          "permissions": {
            "contents": "write",
            "workflows": "read"
          }
        }
      }
    }
  }

you might want to merge the permissions both as write because that will satisfy the requirements of both configurations.

@Chuxel
Copy link
Member Author

Chuxel commented Sep 1, 2022

@chrmarti Yeah I think whatever logic we put in place should be emitted by the CLI. I think documenting a default behavior and then mentioning that tools can override this behavior is a good starting point. To your point, the default on booleans should be "or". For the CLI, generally this indicates that we may need a way for implementors to contribute their own merge semantics where needed so they can alter default behaviors in the CLI. The CLI can also emit all details like it does now for features if an implementing client wants to do something on its own. But, for merged view, we could create a function interface that for a "customizations" name and a given property takes in two values to perform a merge. If no function for a given customization is available, it drops to the default logic. This would be something that could be contributed in addition to a schema update, and we could document that fact

That keeps things simple for the majority, but allows for exceptions, rather than forcing complexity on everyone unequivocally. I can see conceptually that there could be customizations that are also used by more than one implementor, so this also keeps logic consistent for those scenarios.

@chrmarti
Copy link
Contributor

chrmarti commented Sep 6, 2022

To your point, the default on booleans should be "or".

For image metadata and/or feature metadata? The previous suggestion seemed to be that the "last one set wins" in the image metadata and "true if at least one is true, false otherwise" in the feature metadata.

But, for merged view, we could create a function interface that for a "customizations" name and a given property takes in two values to perform a merge.

The CLI is currently the "API". If we were to make it extensible we would either have to accept other (small) CLIs as extensions or start having a JavaScript (or so) extension API. We have carefully avoided any need for either so far as that would introduce more complexity. I don't see a pressing need to change that and suggest we first see how far we get without.

@chrmarti
Copy link
Contributor

chrmarti commented Sep 6, 2022

On merging booleans: Maybe overrideCommand and updateRemoteUserUID only make sense in the devcontainer.json and not in the feature metadata and therefore "last wins" is an appropriate merge strategy because that allows for overriding later. For init and privileged the or strategy seems to make more sense because that is about including the required behavior or permission for a feature.

@chrmarti
Copy link
Contributor

chrmarti commented Sep 6, 2022

Updated proposal (I have removed the previously included discussion of alternatives for clarity):

Goal

Record feature metadata in prebuilt images, such that, the image and the built-in features can be used with a devcontainer.json (image-, Dockerfile- or Docker Compose-based) that does not repeat the features or feature metadata. Other tools should be able to record the same metadata without necessarily using features themselves.

Feature metadata relevant for using the feature when it is already part of the image: mounts, init, privileged, capAdd, securityOpt, entrypoint and customizations.

Proposal

We propose to add metadata to the image with the following structure using one entry per feature and devcontainer.json (see table below for the full list):

[
	{
		"id"?: string,
		"init"?: boolean,
		"privileged"?: boolean,
		"capAdd"?: string[],
		"securityOpt"?: string[],
		"entrypoint"?: string,
		"mounts"?: [],
		...
		"customizations"?: {
			...
		}
	},
	...
]

To simplify adding this metadata for other tools, we also support having a single top-level object with the same properties.

The metadata is added to the image as a devcontainer.metadata label with a JSON string value representing the above array or single object.

Merge Logic

To apply the metadata together with a user's devcontainer.json at runtime the following merge logic by property is used:

Property Type/Format Merge Logic
id E.g., ghcr.io/devcontainers/features/node:1 Not merged.
init boolean true if at least one is true, false otherwise.
privileged boolean true if at least one is true, false otherwise.
capAdd string[] Union of all capAdd arrays without duplicates.
securityOpt string[] Union of all securityOpt arrays without duplicates.
entrypoint string Collected list of all entrypoints.
mounts (string | { type, src, dst })[] Collected list of all mountpoints.
onCreateCommand string | string[] Collected list of all onCreateCommands.
updateContentCommand string | string[] Collected list of all updateContentCommands.
postCreateCommand string | string[] Collected list of all postCreateCommands.
postStartCommand string | string[] Collected list of all postStartCommands.
postAttachCommand string | string[] Collected list of all postAttachCommands.
customizations Object of tool-specific customizations. Merging is left to the tools.
remoteUser string Last value wins.
userEnvProbe string (enum) Last value wins.
remoteEnv Object of strings. Per variable, last value wins.
containerEnv Object of strings. Per variable, last value wins.
overrideCommand boolean Last value wins.
portsAttributes Map of ports to attributes. Per port (not per port attribute), last value wins.
otherPortsAttributes Port attributes. Last value wins (not per port attribute).
forwardPorts (number | string)[] Union of all ports without duplicates. Last one wins (when mapping changes).
shutdownAction string (enum) Last value wins.
updateRemoteUserUID boolean Last value wins.
hostRequirements cpus, memory, storage Max value wins.

Notes

  • There is no size limit documented for labels, but the daemon returns an error when the request header is >500kb.
    • The 500kb limit is shared, so we cannot use a second label in the same build to avoid it.
    • If/when this becomes an issue we could embed the metadata as a file in the image (e.g., with a label indicating it).

@edgonmsft
Copy link
Contributor

Sounds good to me, only the following comments:

  • Mountpoints should probably be filtered for duplicates.
  • Would userEnvProbe cause issues if some declare it and others don't?

@chrmarti
Copy link
Contributor

chrmarti commented Sep 8, 2022

It might be best to keep userEnvProbe at its default loginInteractiveShell (unless the user explicitly requests otherwise) to ensure tools depending on the shell startup scripts work. So maybe this is another case where we don't want features to have the same flexibility as the devcontainer.json.

@Chuxel
Copy link
Member Author

Chuxel commented Sep 8, 2022

@edgonmsft @chrmarti Yeah, any value that is not set should remain at its default across the board. The same is true for booleans, so there's really three states: true, false, unset. This mirrors how devcontainer.json itself is processed.

@chrmarti
Copy link
Contributor

chrmarti commented Sep 8, 2022

@Chuxel Do you agree that features should not be able to change userEnvProbe? Changing it seems to have too broad an impact to allow for individual features to control it.

@Chuxel
Copy link
Member Author

Chuxel commented Sep 8, 2022

@chrmarti Image labels should be able to do it for sure given you can do this from devcontainer.json, but Features is an interesting question given the default is loginInteractiveShell which is the most aggressive and what I'd typically expect Features would want (since they may need to alter rc files). In the past when the default was none I would have said that they absolutely need to support it - but things are better here now.

Since the user may extend an image that already has the value set, we'll still need to support processing more than one value. However, given that, adding it into Features later would be pretty trivial I'd assume - so omitting for now is probably the right thing to do until we get feedback otherwise.

Unrelated: One other thing we may want to document is how variables will be applied - I'm assuming these would generally just appear as the variables and processed at the end when the container is coming up (rather than when the label is applied). Is that how you were thinking about it as well?

@chrmarti
Copy link
Contributor

chrmarti commented Sep 9, 2022

Unrelated: One other thing we may want to document is how variables will be applied - I'm assuming these would generally just appear as the variables and processed at the end when the container is coming up (rather than when the label is applied). Is that how you were thinking about it as well?

I agree, that seems to make the most sense.

@chrmarti
Copy link
Contributor

Posted the proposal with the discussed clarifications in PR #95 for review.

@eitsupi
Copy link

eitsupi commented Nov 3, 2022

Hello. What is the reason why the label is devcontainer.metadata instead of dev.containers.metadata?

dev.containers.metadata seems more natural as to the OCI spec Annotations.
https://github.com/opencontainers/image-spec/blob/ff0257194e20f7fd17ca21c67a2e725f21c4d4f2/annotations.md

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
active important proposal Still under discussion, collecting feedback
Projects
None yet
Development

No branches or pull requests

8 participants