Skip to content
This repository has been archived by the owner on Nov 9, 2020. It is now read-only.

Volume unmount failed after docker restart #1724

Closed
ashahi1 opened this issue Aug 8, 2017 · 6 comments
Closed

Volume unmount failed after docker restart #1724

ashahi1 opened this issue Aug 8, 2017 · 6 comments
Assignees

Comments

@ashahi1
Copy link
Contributor

ashahi1 commented Aug 8, 2017

Setup: Datastore name had spaces e.g. DS name: 'shared Vmfs - (10)'

Test steps:

1. Created a volume.
2. Ran a container with volume mounted.
3. While container is running, killed the docker.
4. Unmounting failed after docker restart.

Logs have been attached at the end.


Test steps and their output are as follows:

  1. Created a volume on a datastore "shared Vmfs - (10)"
root@sc-rdops-vm02-dhcp-52-237:~# docker volume create -d vsphere --name 'TestXXXX@shared   Vmfs   - (10)'
TestXXXX@shared   Vmfs   - (10)
root@sc-rdops-vm02-dhcp-52-237:~#
root@sc-rdops-vm02-dhcp-52-237:~# docker volume ls
DRIVER              VOLUME NAME
vsphere:latest      RefcountTest_volume_957663@shared   Vmfs   - (10)
vsphere:latest      TestXXXX@shared   Vmfs   - (10)
  1. Ran a container with volume mounted
root@sc-rdops-vm02-dhcp-52-237:~# docker run -d -v 'TestXXXX@shared   Vmfs   - (10)':/vol1 busybox tail -F /dev/null
f7bc85e60014a14be0f2de553eda6f09ac530285becb632f9ebe1de5278a122f
  1. While container is running, killed the docker (It restarts automatically)
root@sc-rdops-vm02-dhcp-52-237:~# pkill -9 dockerd
root@sc-rdops-vm02-dhcp-52-237:~#
  1. Unmounting failed after docker restart
2017-08-08 15:56:14.514381213 -0700 PDT [WARNING] Received signal signal=terminated
2017-08-08 15:56:16.63074616 -0700 PDT [INFO] No config file found. Using defaults.
2017-08-08 15:56:16.632034196 -0700 PDT [INFO] Starting plugin driver=vsphere log_level=info config="/etc/docker-volume-vsphere.conf"
2017-08-08 15:56:16.632187238 -0700 PDT [INFO] Getting volume data from unix:///var/run/docker.sock
2017-08-08 15:56:18.632695576 -0700 PDT [INFO] Can't connect to unix:///var/run/docker.sock due to (An error occurred trying to connect: context deadline exceeded), skipping discovery
2017-08-08 15:56:18.632764519 -0700 PDT [INFO] Refcounting failed: (An error occurred trying to connect: context deadline exceeded).
2017-08-08 15:56:18.632788627 -0700 PDT [INFO] Docker VMDK plugin started port=1019 mock_esx=false version="vSphere Volume Driver v0.5"
2017-08-08 15:56:18.632879206 -0700 PDT [INFO] Scheduling again after 2 seconds
2017-08-08 15:56:18.633147514 -0700 PDT [INFO] Going into ServeUnix - Listening on Unix socket address="/run/docker/plugins/vsphere.sock"
2017-08-08 15:56:20.633177747 -0700 PDT [INFO] Getting volume data from unix:///var/run/docker.sock
2017-08-08 15:56:22.633733809 -0700 PDT [INFO] Can't connect to unix:///var/run/docker.sock due to (An error occurred trying to connect: context deadline exceeded), skipping discovery
2017-08-08 15:56:22.633786469 -0700 PDT [INFO] Refcounting failed: (An error occurred trying to connect: context deadline exceeded). Attempts left: 20
2017-08-08 15:56:22.633795139 -0700 PDT [INFO] Scheduling again after 4 seconds
2017-08-08 15:56:26.633985126 -0700 PDT [INFO] Getting volume data from unix:///var/run/docker.sock
2017-08-08 15:56:28.634387757 -0700 PDT [INFO] Can't connect to unix:///var/run/docker.sock due to (An error occurred trying to connect: context deadline exceeded), skipping discovery
2017-08-08 15:56:28.634436682 -0700 PDT [INFO] Refcounting failed: (An error occurred trying to connect: context deadline exceeded). Attempts left: 19
2017-08-08 15:56:28.634445627 -0700 PDT [INFO] Scheduling again after 8 seconds
2017-08-08 15:56:36.634655096 -0700 PDT [INFO] Getting volume data from unix:///var/run/docker.sock
2017-08-08 15:56:36.668870752 -0700 PDT [INFO] Found 0 running or paused containers
2017-08-08 15:56:36.669123622 -0700 PDT [INFO] Initiating recovery unmount. name="RefcountTest_volume_957663@shared\\040\\040\\040Vmfs\\040\\040\\040-\\040(10)" refcnt=0 mounted=true dev="/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0"
2017-08-08 15:56:36.669155516 -0700 PDT [ERROR] Failed to unmount volume. Now trying to detach... mountpoint="/mnt/vmdk/RefcountTest_volume_957663@shared\\040\\040\\040Vmfs\\040\\040\\040-\\040(10)" error="Unmount device at /mnt/vmdk/RefcountTest_volume_957663@shared\\040\\040\\040Vmfs\\040\\040\\040-\\040(10) failed: no such file or directory"
2017-08-08 15:56:36.806468513 -0700 PDT [WARNING] Failed to unmount - manual recovery may be needed
2017-08-08 15:56:36.806743488 -0700 PDT [INFO] Initiating recovery unmount. name="TestXXXX@shared\\040\\040\\040Vmfs\\040\\040\\040-\\040(10)" refcnt=0 mounted=true dev="/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:1:0"
2017-08-08 15:56:36.806818766 -0700 PDT [ERROR] Failed to unmount volume. Now trying to detach... mountpoint="/mnt/vmdk/TestXXXX@shared\\040\\040\\040Vmfs\\040\\040\\040-\\040(10)" error="Unmount device at /mnt/vmdk/TestXXXX@shared\\040\\040\\040Vmfs\\040\\040\\040-\\040(10) failed: no such file or directory"
2017-08-08 15:56:36.926483704 -0700 PDT [WARNING] Failed to unmount - manual recovery may be needed
2017-08-08 15:56:36.926551407 -0700 PDT [INFO] Discovered 2 volumes in use.
2017-08-08 15:56:36.926563152 -0700 PDT [INFO] Volume name=RefcountTest_volume_957663@shared\040\040\040Vmfs\040\040\040-\040(10) count=0 mounted=true device='/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0'
2017-08-08 15:56:36.92657014 -0700 PDT [INFO] Volume name=TestXXXX@shared\040\040\040Vmfs\040\040\040-\040(10) count=0 mounted=true device='/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:1:0'

Logs:
docker.txt
docker-volume-vsphere.txt
vmdk_ops.1.txt
vmdk_ops.txt

@pshahzeb
Copy link
Contributor

pshahzeb commented Aug 9, 2017

This is essentially due to the logic of reading volume name from /proc/mounts after restart.
The datastore name with spaces would be something like TestXXXX@shared\040\040\040Vmfs\040\040\040-\040(10) in the entry in /proc/mounts.
Can happen with volume name as well. We have volume name with spaces tests and restart tests, but not one that does both.

@govint
Copy link
Contributor

govint commented Aug 10, 2017

@ashahi1 @venilnoronha @pshahzeb - does this need to get verified with the Windows plugin also? Are we running the e2e tests with Windows plugin? Some of the fixes tend to be in common code and we need to ensure nothing gets broken on the Windows side

@govint
Copy link
Contributor

govint commented Aug 10, 2017

Reading /proc/mounts with names with spaces is mighty confusing for the plugin, here the pl;ugin (when it alone is restarted) believes there are two volumes while they are actually one.

2017-08-10 07:01:25.077849205 +0000 UTC [INFO] Found 1 running or paused containers
2017-08-10 07:01:25.549428647 +0000 UTC [INFO] Successfully retrieved mountsmap=map[snapvol\0404@sharedVmfs-0:/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0]
2017-08-10 07:01:25.944571329 +0000 UTC [WARNING] Initiating recovery mount. name="snapvol 4@sharedVmfs-0" refcnt=1 mounted=false dev=
2017-08-10 07:01:36.421285025 +0000 UTC [WARNING] Exceeded timeout while waiting for device attach to completetimeout=10s device="/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0"
2017-08-10 07:01:36.421701455 +0000 UTC [WARNING] Failed to mount - manual recovery may be needed
2017-08-10 07:01:36.421732105 +0000 UTC [INFO] Initiating recovery unmount. dev="/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0" name="snapvol\0404@sharedVmfs-0" refcnt=0 mounted=true
2017-08-10 07:01:36.421773182 +0000 UTC [ERROR] Failed to unmount volume. Now trying to detach... mountpoint="/mnt/vmdk/snapvol\0404@sharedVmfs-0" error="Unmount device at /mnt/vmdk/snapvol\0404@sharedVmfs-0 failed: no such file or directory"
2017-08-10 07:01:36.835214341 +0000 UTC [WARNING] Failed to unmount - manual recovery may be needed

These are the same volume
2017-08-10 07:01:36.835278834 +0000 UTC [INFO] Discovered 2 volumes that may be in use.

2017-08-10 07:01:36.835291072 +0000 UTC [INFO] Volume name=snapvol 4@sharedVmfs-0 count=1 mounted=false device=''
2017-08-10 07:01:36.835298656 +0000 UTC [INFO] Volume name=snapvol\0404@sharedVmfs-0 count=0 mounted=true device='/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0'

@pshahzeb
Copy link
Contributor

2017-08-10 07:01:36.835291072 +0000 UTC [INFO] Volume name=snapvol 4@sharedVmfs-0 count=1 mounted=false device=''
2017-08-10 07:01:36.835298656 +0000 UTC [INFO] Volume name=snapvol\0404@sharedVmfs-0 count=0 mounted=true device='/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0'

snapvol 4@sharedVmfs-0 was reported to us by docker, snapvol\0404@sharedVmfs-0 is the one we read from /proc/mounts.
Due to string comparison mismatch while populating refMap, we see 2 volumes.

does this need to get verified with the Windows plugin also?

As of today, windows plugin is planned as tech preview. But sometime soon this needs to be fixed for windows plugin as well. @ashahi1 @venilnoronha could you please confirm the behavior of such a case on windows env?

What's the fix for this issue?
Should the escape sequence '\040' be replaced with " " ? (in case of linux)

@venilnoronha
Copy link
Contributor

venilnoronha commented Aug 10, 2017

This seems to be a problem on the Windows plugin too. Creation / mounting of a volume with spaces in the name works perfectly fine; however, the PowerShell implementation for retrieving mount information is incapable of handling spaces in the name. Output from the PowerShell script looks like this:

As observed, a space is used as the default delimiter while printing multiple access paths of a single disk, and thus an issue. Simply using a slash-whitespace delimiter should solve this problem.

Also, Docker doesn't support spaces in volume names in their local driver on Windows.

@govint
Copy link
Contributor

govint commented Aug 11, 2017

Yes, the escape sequence is replaced with a ' ' in the PR #1736. Looks like for Windows the PowerShell script would need tweaking. @venilnoronha will you be able to handle the scripting I'm not familiar with the Windows equivalent of mounted FS and how thats retrieved.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants