
cgroup v2 error - "sed: write error" #308

Closed
gene-miller opened this issue Apr 15, 2021 · 7 comments · Fixed by moby/moby#42331

Comments

@gene-miller

The 'dind' script assumes that the dockerd process will start with PID 1.
This is often not the case when building from the dind image, and on systems that use cgroup v2 it causes a "sed: write error".

Could I suggest checking the process ID of dockerd in the dind script rather than assuming it is 1? Something like:

# cgroup v2: enable nesting
if [ -f /sys/fs/cgroup/cgroup.controllers ]; then
	# move the dockerd process from the root group to the /init group,
	# otherwise writing subtree_control fails with EBUSY.
	mkdir -p /sys/fs/cgroup/init
	DOCKERDPID=$(/usr/bin/pgrep dockerd)
	echo "$DOCKERDPID" > /sys/fs/cgroup/init/cgroup.procs
	# enable controllers
	sed -e 's/ / +/g' -e 's/^/+/' < /sys/fs/cgroup/cgroup.controllers > /sys/fs/cgroup/cgroup.subtree_control
fi
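
For context, one way to confirm whether a host is actually running cgroup v2 (assuming the unified hierarchy is mounted at /sys/fs/cgroup, as in the snippet above) is to check the filesystem type:

# prints "cgroup2fs" on a cgroup v2 (unified hierarchy) host,
# and "tmpfs" on a legacy cgroup v1 host
stat -fc %T /sys/fs/cgroup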
@tianon
Member

tianon commented Apr 15, 2021

Ah, looks like this code was introduced in moby/moby#41065 (this script comes from https://github.com/moby/moby/blob/6110ba3d7c8b83794a8d2e915410c11e7460e4b5/hack/dind).

Given that this script is used as an entrypoint, it will normally start before dockerd (it is intended to wrap the dockerd invocation), so instead of trying to look for DOCKERDPID or hard-coding 1, I think we should probably just be using $$ -- but I have to admit I don't understand the details of cgroup v2 sufficiently to say for certain. It's probably worth opening this as an issue or PR on https://github.com/moby/moby to discuss further with the maintainers there.
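
A rough sketch of what that $$-based variant could look like, keeping the structure of the existing hack/dind snippet (illustration only, not the eventual upstream patch):

# cgroup v2: enable nesting
if [ -f /sys/fs/cgroup/cgroup.controllers ]; then
	# move the entrypoint shell itself (whatever PID it has) into /init;
	# note that any other processes still left in the root group would
	# also need to be moved, or the subtree_control write below can
	# still fail with EBUSY
	mkdir -p /sys/fs/cgroup/init
	echo "$$" > /sys/fs/cgroup/init/cgroup.procs
	# enable controllers
	sed -e 's/ / +/g' -e 's/^/+/' < /sys/fs/cgroup/cgroup.controllers > /sys/fs/cgroup/cgroup.subtree_control
fi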

@AkihiroSuda
Contributor

This has to be done with PID 1.

How do you get a non-1 PID?

@tianon
Member

tianon commented Apr 19, 2021

I can reproduce by just using --init on my Docker-in-Docker container on a cgroupsv2 host. 😬

$ docker pull docker:dind
dind: Pulling from library/docker
Digest: sha256:e0cef8e03463c7dde0613bb68a3fa211f4e3a12823b38f03f92bf330abaef3a9
Status: Image is up to date for docker:dind
docker.io/library/docker:dind
$ docker run -it --rm --init --privileged -e DOCKER_TLS_CERTDIR= docker:dind
sed: write error

@AkihiroSuda
Contributor

PR: moby/moby#42331

@tianon
Member

tianon commented Apr 30, 2021

Should be fixed via docker-library/official-images#10087! Thank you @AkihiroSuda 😄

(30d7b9b)
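
For anyone still on an older image, the general shape of the fix is to stop assuming any particular PID and instead move every process currently in the (namespaced) root group into /init before enabling controllers. A sketch of that approach (not necessarily the exact code merged in moby/moby#42331):

# cgroup v2: enable nesting
if [ -f /sys/fs/cgroup/cgroup.controllers ]; then
	# move every process out of the root group (not just PID 1); otherwise
	# writing subtree_control below fails with EBUSY
	mkdir -p /sys/fs/cgroup/init
	xargs -rn1 < /sys/fs/cgroup/cgroup.procs > /sys/fs/cgroup/init/cgroup.procs || :
	# enable controllers
	sed -e 's/ / +/g' -e 's/^/+/' < /sys/fs/cgroup/cgroup.controllers > /sys/fs/cgroup/cgroup.subtree_control
fi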

tianon closed this as completed Apr 30, 2021
petermetz added a commit to hyperledger/cacti that referenced this issue Jun 21, 2023
Upgrade the base images to docker:24.0.2-dind
which contain the fix for the cgroup v2 problems.

The images built locally from this commit are pushed to ghcr.io as

ghcr.io/hyperledger/cactus-fabric-all-in-one:2023-06-16-d436ef26e-issue2464-dind-v24
and
ghcr.io/hyperledger/cactus-fabric2-all-in-one:2023-06-16-d436ef26e-issue2464-dind-v24

Additional context:

The root cause analysis can be found at [1] and [2],
which state that the solution is to upgrade the
dind image to version 20.10.16 or newer.

[1] docker-library/docker#308
[2] testcontainers/dind-drone-plugin#18

Fixes #2464

===================================
P.S.:
I'm also sneaking in a hot-fix for the CI failures that are slowing down
everyone else's work with false-negative checks, wasting time and resources:
the root package.json codegen, precodegen and postcodegen scripts are now
safe from race conditions (or at least that's the theory for now).

Signed-off-by: Peter Somogyvari <peter.somogyvari@accenture.com>
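
As a quick sanity check, the Docker version bundled in a given dind image can be printed without starting the daemon (the tag below is the one from the commit message above; bypassing the entrypoint avoids needing --privileged):

# anything >= 20.10.16 contains the cgroup v2 entrypoint fix discussed above
docker run --rm --entrypoint docker docker:24.0.2-dind --version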
@rousku

rousku commented May 16, 2024

I get a similar error with the following script. It takes some time for the issue to appear.

#!/bin/bash

# repeatedly start a dind container, wait until its daemon answers, then tear it down
while true; do
    DIND_CONTAINER_ID=$(docker run -t --privileged -d docker:26.1.2-dind)
    echo "$DIND_CONTAINER_ID"
    while ! docker exec "$DIND_CONTAINER_ID" docker info | grep "Server Version: 26.1.2"; do
        sleep 1
    done
    docker stop "$DIND_CONTAINER_ID"
    docker rm "$DIND_CONTAINER_ID"
done
.
.
.
fc7502cfe87e48cfd464d4a5713f2efaf5e2b4341d5a13d5381c324ff80ec8df
ERROR: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
errors pretty printing info
ERROR: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
errors pretty printing info
 Server Version: 26.1.2
fc7502cfe87e48cfd464d4a5713f2efaf5e2b4341d5a13d5381c324ff80ec8df
fc7502cfe87e48cfd464d4a5713f2efaf5e2b4341d5a13d5381c324ff80ec8df
4940ab34359e57a661bc80bfa6aa9afa2de2cf9fc2b2609a0ec595044e3c314e
ERROR: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
errors pretty printing info
ERROR: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
errors pretty printing info
ERROR: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
errors pretty printing info
Error response from daemon: container 4940ab34359e57a661bc80bfa6aa9afa2de2cf9fc2b2609a0ec595044e3c314e is not running
Error response from daemon: container 4940ab34359e57a661bc80bfa6aa9afa2de2cf9fc2b2609a0ec595044e3c314e is not running
Error response from daemon: container 4940ab34359e57a661bc80bfa6aa9afa2de2cf9fc2b2609a0ec595044e3c314e is not running
Error response from daemon: container 4940ab34359e57a661bc80bfa6aa9afa2de2cf9fc2b2609a0ec595044e3c314e is not running
Error response from daemon: container 4940ab34359e57a661bc80bfa6aa9afa2de2cf9fc2b2609a0ec595044e3c314e is not running
.
.
.
$ docker logs 4940ab34359e57a661
Certificate request self-signature ok
subject=CN = docker:dind server
/certs/server/cert.pem: OK
Certificate request self-signature ok
subject=CN = docker:dind client
/certs/client/cert.pem: OK
cat: can't open '/proc/net/ip6_tables_names': No such file or directory
cat: can't open '/proc/net/arp_tables_names': No such file or directory
iptables v1.8.10 (nf_tables)
sed: write error

@tianon
Member

tianon commented May 16, 2024

Yeah, I see this pretty regularly over in https://github.com/tianon/dockerfiles/actions?query=is%3Afailure (a non-trivial number of those failures are this exact error -- as far as I can tell, some kind of race condition between the shell and the kernel 😭).

sed: couldn't flush stdout: Device or resource busy
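
Until the race is pinned down, one workaround (purely a hypothetical sketch, not something the official image does) is to retry the subtree_control write a few times instead of letting the first transient failure kill the entrypoint:

# retry enabling controllers, since the write can transiently fail
# (EBUSY) while processes are still being moved between groups
for attempt in 1 2 3 4 5; do
	if sed -e 's/ / +/g' -e 's/^/+/' < /sys/fs/cgroup/cgroup.controllers > /sys/fs/cgroup/cgroup.subtree_control 2>/dev/null; then
		break
	fi
	sleep 1
done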
