-
Notifications
You must be signed in to change notification settings - Fork 95
Handle datastores missing in the datatore cache and special chars in datastore names. #1140
Conversation
esx_service/vmdk_ops.py
Outdated
dock_vol_path = os.path.join("/vmfs/volumes", datastore, DOCK_VOLS_DIR) | ||
path = dock_vol_path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why add this line? code refactor or fix some bug here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the else part thats really not needed. Its not a bug fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: could be path = dock_vol_path = os.path.join("/vmfs/volumes", datastore, DOCK_VOLS_DIR)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
esx_service/vmdk_ops.py
Outdated
|
||
# Its possible the datastore is None, which can happen if datastores on the | ||
# host could not be identified or initialized with the dockvols path. | ||
if not datastore: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you want to differentiate these two cases? And why the error message for these two cases printed out is different?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had kept both cases together but the error message is indicating "default_datastore = None" (which can happen if there were errors in identifying datastores on the host and hence missing in the cache). The user doesn't know how the default_datastore is identified, what it is and why its none. I can remove default_datastore and make a single error message perhaps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think default_datastore is useful. Can we print "default_datastore" as "N/A" when it is "None"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the new condition added.
esx_service/cli/vmdkops_admin.py
Outdated
@@ -839,6 +839,9 @@ def generate_tenant_ls_rows(tenant_list): | |||
else: | |||
default_datastore = vmdk_utils.get_datastore_name(tenant.default_datastore_url) | |||
|
|||
if default_datastore is None: | |||
return rows |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this happens for one tenant, the function returns. rows could be half populated. Shouldn't rows be flushed; or you should still traverse remaining tenants in tenant_list if this happens in the middle of iterating tenant_list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so it seem default_datastore could be "" (line 838) and None. Why use 2 representations for the same ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
esx_service/cli/vmdkops_admin.py
Outdated
@@ -1019,6 +1022,10 @@ def generate_tenant_access_ls_rows(privileges): | |||
datastore = "" | |||
else: | |||
datastore = vmdk_utils.get_datastore_name(p.datastore_url) | |||
|
|||
if datastore is None: | |||
return rows |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same. This can happen for one privilege. Need to appropriately consider the list of privileges.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, using "" for the datastore if not found.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
I have a couple of comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change looks good with a few minor nits (see inline).
One issue though - it seems a missing datastore something is indicated by "" and sometimes by None.
Python will handle the difference in the majority of cases (in if not
checks) but it feels wrong. Can you fix it and resolve merge, and the fix is good to go
esx_service/cli/vmdkops_admin.py
Outdated
@@ -839,6 +839,9 @@ def generate_tenant_ls_rows(tenant_list): | |||
else: | |||
default_datastore = vmdk_utils.get_datastore_name(tenant.default_datastore_url) | |||
|
|||
if default_datastore is None: | |||
return rows |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so it seem default_datastore could be "" (line 838) and None. Why use 2 representations for the same ?
esx_service/utils/vmdk_utils.py
Outdated
@@ -394,6 +394,8 @@ def get_datastore_name(datastore_url): | |||
# get_datastores() return a list of tuple | |||
# each tuple has format like (datastore_name, datastore_url, dockvol_path) | |||
res = [d[0] for d in get_datastores() if d[1] == datastore_url] | |||
if not res: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
line 397-399 could be replaced with return res[0] if res else None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done and thanks for suggesting, looks way better.
esx_service/vmdk_ops.py
Outdated
dock_vol_path = os.path.join("/vmfs/volumes", datastore, DOCK_VOLS_DIR) | ||
path = dock_vol_path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: could be path = dock_vol_path = os.path.join("/vmfs/volumes", datastore, DOCK_VOLS_DIR)
default_datastore = vm_datastore | ||
default_datastore_url = vm_datastore_url |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there any reason for swapping these lines ? No issue, just curious
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry this was just editing. I'll restore.
@govint You need to rebase and resolve conflicts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Some minor comments.
Not your change. I was expecting a one place at entry point in executerequest where we identify datastore and validate for that request. However, code is spread all over the place. We clearly have an opportunity to cleanse this code.
esx_service/cli/vmdkops_admin.py
Outdated
@@ -1019,6 +1022,10 @@ def generate_tenant_access_ls_rows(privileges): | |||
datastore = "" | |||
else: | |||
datastore = vmdk_utils.get_datastore_name(p.datastore_url) | |||
|
|||
if datastore is None: | |||
return rows |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like existing code already handles datastore=None and you should not return in middle of loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
esx_service/vmdk_ops.py
Outdated
@@ -837,25 +838,31 @@ def executeRequest(vm_uuid, vm_name, config_path, cmd, full_vol_name, opts): | |||
except ValidationError as ex: | |||
return err(str(ex)) | |||
|
|||
if datastore and not vmdk_utils.validate_datastore(datastore): | |||
if not datastore: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
L841 & L846 has same if condition, combine them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed and removed the new condition added. Its not needed for now and handles the rare case that a datastore is temporarily not visible on the host.
Failing when there is a space in a name should be done asap regardless of other fixes , IMO |
110eb6c
to
2683c33
Compare
CI is failing as service isn't coming up for almost 10 - 20 secs when vmdk_ops_test.py is started., 04/11/17 11:41:39 106779 [photon.vmfs-datastore1._DEFAULT.dockerVol_1491910877] [INFO ] executeRequest 'remove' completed with ret=None |
@govint Where is 10-20Secs delay? @shuklanirdesh82 Can you please check CI run why there is "connection closed" error. |
Sorry, its just over a 10 second delay. Test-esx started at this time, => Running target test-esx Tue Apr 11 11:41:42 UTC 2017 <------- Running unit tests in /tmp/vmdk_ops_unittest5002/vmdk_ops_test.py... ...........Connection to 192.168.31.62 closed by remote host. <------- ESX service comes up after this time, .... |
|
||
error_info, tenant_uuid, tenant_name = auth.authorize(vm_uuid, datastore_url, cmd, opts) | ||
if error_info: | ||
return err(error_info) | ||
|
||
# get /vmfs/volumes/<volid>/dockvols path on ESX: | ||
datastore = get_datastore_name(datastore_url) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why remove this block? was it redundant or not required?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current code has the same logic distributed in the first part of executeRequest. I've consolidated the datastore and datastore_url selection above.
Line 844 above sets datastore to the default_datastore (if user hasn't provided a datastore name) and above that default_datastore is set to either the VM datastore or the one found for the tenant.
The code change looks good to me. Will give LGTM when CI passes |
@@ -920,6 +920,8 @@ def generate_tenant_ls_rows(tenant_list): | |||
default_datastore = "" | |||
else: | |||
default_datastore = vmdk_utils.get_datastore_name(tenant.default_datastore_url) | |||
if default_datastore is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Would be nice to comment... something like "use empty string to simplify line formatting down the road"
esx_service/vmdk_ops.py
Outdated
@@ -705,7 +705,7 @@ def get_vol_path(datastore, tenant_name=None): | |||
|
|||
if not os.path.isdir(dock_vol_path): | |||
# The osfs tools are usable for DOCK_VOLS_DIR on all datastores | |||
cmd = "{0} {1}".format(OSFS_MKDIR_CMD, dock_vol_path) | |||
cmd = "{0} '{1}'".format(OSFS_MKDIR_CMD, dock_vol_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit (since the line is touched anyways) - it could be cmd = "{} '{}'".format(OSFS_MKDIR_CMD, dock_vol_path)
, no need for positionals, just sequential params are fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
As we are using snapshot, the esx vm state was gone. I had checked sshd connection and esx VM has sufficient configuration. FYI:Restarted the failed run and it passed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a race in init_datastoreCache(). Liping has the fix.
Could it be the reason for missing DB name ? Or did you actually added / renamed a datastore while vmdk_ops was in flight ?
I haven't changed the DS name during the test, this is the CI run that was failing. Can I get approval to submit this change as CI also has passed? |
If there was no change DS during the run, then the fix is for a symptom, not root cause. I think we should merge the 'spaces' fix, and get the race fixed before deciding on merging the rest . |
See issue #1116 for the root cause, there are two issues, one triggering the ##other,
Issue Added WIP to Readme.md #1 - The VM used in the test was located on a datastore "local-0 (11)", the mkdir command in vmdk_ops.py:get_vol_path() fails with the space and the '(' and ')' chars in the datastore name. This is fixed in get_vol_path().
Issue Installation and build fixes #2 - (1) opens up another issue. In vmdk_utils.py:get_datastore_name() the given datastore isn't found but the empty list is accessed and causes the exception.
The change handles (1) and (2) by allowing vmdk_utils.py:get_datastore_name() to return None if a datastore isn't found. The same scenario may happen where a datastore is temprarily inaccessible and hence not included in the in-memory datastore list. A subsequent request for the datastore name can cause the exception.
To handle the None return from vmdk_utils.py:get_datastore_name() changes are done in places where this function is called to handle the None return value.
Tests:
04/03/17 12:13:37 1177180 [Thread-1] [DEBUG ] Running cmd /usr/lib/vmware/osfs/bin/osfs-mkdir -n '/vmfs/volumes/bigone (1)/dockvols' <---------
04/03/17 12:13:37 1177180 [Thread-1] [INFO ] Created /vmfs/volumes/bigone (1)/dockvols
04/03/17 12:13:37 1177180 [Thread-1] [DEBUG ] Running cmd /bin/mkdir '/vmfs/volumes/bigone (1)/dockvols/11111111-1111-1111-1111-111111111111' <----------
04/03/17 12:13:37 1177180 [Thread-1] [INFO ] Symlink /vmfs/volumes/bigone (1)/dockvols/_DEFAULT is created to point to path /vmfs/volumes/bigone (1)/dockvols/11111111-1111-1111-1111-111111111111
04/03/17 12:13:37 1177180 [Thread-1] [INFO ] Created /vmfs/volumes/bigone (1)/dockvols/11111111-1111-1111-1111-111111111111
Without fix:
04/03/17 06:58:09 1162246 [Thread-1] [ERROR ] Unhandled Exception:
Traceback (most recent call last):
File "/usr/lib/vmware/vmdkops/bin/vmdk_ops.py", line 1513, in execRequestThread
opts=opts)
File "/usr/lib/vmware/vmdkops/bin/vmdk_ops.py", line 804, in executeRequest
vm_datastore = get_datastore_name(vm_datastore_url)
File "/usr/lib/vmware/vmdkops/bin/vmdk_ops.py", line 778, in get_datastore_name
datastore_name = vmdk_utils.get_datastore_name(datastore_url)
File "/usr/lib/vmware/vmdkops/Python/vmdk_utils.py", line 359, in get_datastore_name
return res[0]
IndexError: list index out of range
04/03/17 06:58:09 1162246 [Thread-2] [ERROR ] Unhandled Exception:
Traceback (most recent call last):
File "/usr/lib/vmware/vmdkops/bin/vmdk_ops.py", line 1513, in execRequestThread
opts=opts)
File "/usr/lib/vmware/vmdkops/bin/vmdk_ops.py", line 804, in executeRequest
vm_datastore = get_datastore_name(vm_datastore_url)
File "/usr/lib/vmware/vmdkops/bin/vmdk_ops.py", line 778, in get_datastore_name
datastore_name = vmdk_utils.get_datastore_name(datastore_url)
File "/usr/lib/vmware/vmdkops/Python/vmdk_utils.py", line 359, in get_datastore_name
return res[0]
IndexError: list index out of range
With fix:
administrator@hte-1s-eng-dhcp98:~/vpl-ut/docker-volume-vsphere$ docker volume create --driver=vsphere --name=TestVol1 -o size=400mb
Error response from daemon: create TestVol1: VolumeDriver.Create: Invalid datastore 'None'.
Known datastores: datastore1.