-
Notifications
You must be signed in to change notification settings - Fork 95
Add a basic test: TestVmGroupVolumeIsolation #1368
Conversation
tests/e2e/basic_test.go
Outdated
s.vm2Name = s.config.DockerHostNames[1] | ||
s.volName1 = inputparams.GetUniqueVolumeName(testName) | ||
s.volName2 = inputparams.GetUniqueVolumeName(testName) | ||
s.containerName = inputparams.GetContainerNameWithTimeStamp(testName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking maybe just put this into a dummy_test.go file and let these esx, ... variables be globals there. All test modules in the e2e package would be able to refer those variables without each module doing the above. Any repetition of code anywhere could be avoided.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I don't really have much concern about this. We have already introduced common TestConfig, which I feel is sufficient. Here I introduced these local variables just for convenience because I personally prefer short names locally. Otherwise I would just use s.config.XXX.
tests/e2e/basic_test.go
Outdated
// Remove Config DB | ||
admincli.ConfigRemove(s.esx) | ||
|
||
misc.LogTestEnd(testName, "TestVmGroupVolumeIsolation") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test seems very similar if not exact to vmgroups_test.go:TestVmGroupVolumeAccessAcrossVmGroups(). If neeeding to enhance then the existing test can be enhanced vs. adding a test here. Plus its best to keep all vmgroup + volume related tests in one place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I completely agree with you - currently the test cases we defined on the Dashboard have a lot of overlap and might be duplicated. I will take a closer look at the test case you pointed out, and decide what we should do to avoid the duplication.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on the Dashboard have a lot of overlap and might be duplicated.
We have broke down tests into smaller steps so you might see the overlap for some steps with others. As discussed in the past we have to avoid such duplication during the automation by enhancing the existing one.
+1 on @govint's comment.
We should be cautious too before updating/modifying the existing one, if any assertions goes wrong following wont run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@govint I took a further look into vmgroups_test.TestVmGroupVolumeAccessAcrossVmGroups(). This test is very similar to basic_test.TestVmGroupVolumeIsolation(). However, I think it makes sense to keep both because the former is trying to verify the volume isolation between two user-defined vmgroups, while the latter is trying to verify the volume isolation between _DEFAULT and user-defined vmgroup.
I agree we should keep all vmgroup related tests in one single place. I tried to move my test case into vmgroups_test.go. However, this needs too much refactoring to the existing logic in vmgroups_test.go, because the test assumption is different. In vmgroups_test.go, all tests assume that VM1&VM2 belong a user-defined vmgroup (we shouldn't call it "default" in the comments, which will cause confusion with _DEFAULT vmgroup). The current test case doesn't have this assumption. To avoid significant change to the existing vmgroups_test.go, I will keep this test in basic_test.go for now.
@@ -111,6 +111,9 @@ func (vg *VmGroupTest) TearDownSuite(c *C) { | |||
out, err = ssh.InvokeCommand(vg.config.EsxHost, cmd) | |||
log.Printf(out) | |||
|
|||
// Remove Config DB | |||
adminutils.ConfigRemove(vg.config.EsxHost) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This must not be done, its creating a fresh and clean plate for the next test to run and we may potentially wipe out a scenario that may lead to tests failing and bugs getting exposed. In fact, for all the e2e tests create the config DB once and let the tests run thru without removng the config DB (except for those tests that test the config DB itself).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we decide to do this, we must make sure each test clean up itself properly. For example, if a test create a VM group, it must remove the VM group when test end (not only in successful case, but also in failure case). I don't think current test code did this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't follow this. What's the default customer scenario we are expecting? I may be wrong , but I think we are expecting a typical customer environment will be a clean vDVS installation without initializing local or shared DB. So most of e2e tests should be run in the default setup, i.e. NotConfigured mode.
Multi-tenancy features are experimental. So only VmGroup related tests should take care of ConfigInit and ConfigRemove.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lipingxue current test code removes all vmgroups in TearDownSuite() which is always called irrespective of pass/fail of the test suite.
A test must remove all artifacts that are created by it. But why remove the config DB?
After removing all test artifacts is the DB sane, how do we know that it is?
If we remove the config DB (which btw isn't a test artifact) then most likely we are removing bugs as well. If the removal of test artifacts leaves the DB is a corrupt/inconsistent state - whether local or clustered - then shouldn't that be discovered in the testing?
We can't run without the config DB - customers are trying it out and hence it must be in all our tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@govint Yesterday we (sorry you were not included) discussed this and all agreed that most of our e2e tests should be running in the "default" mode. The "default" mode here means "No Config DB", i.e. a fresh installation without running "Config Init". For now multi-tenancy feature is experimental, so (at least for now) we are expecting most of our customers will be using vDVS in the "default" mode. This assumption will remain true until multi-tenancy is fully/correctly implemented and becomes an official product feature.
Coming back to e2e tests, we are expecting most of the test groups will be run in the "default" mode without DB configured. Only vmgroup related tests will need to run "config init", and thus it is expected these tests should be responsible for cleaning up the DB (i.e. run "config rm") so that other e2e tests will still be running in a clean "default" mode.
Hope this helps. Let me know if you still have question about this.
tests/utils/misc/misc.go
Outdated
func LogTestStart(testGroup string, testName string) { | ||
log.Printf("START:%s %s %s", testGroup, testName, curTime()) | ||
log.Printf("START: %s.%s", testGroup, testName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please retain timestamp, when debugging we know exactly when a test started and when it finished and hence correlate other logs (daemon, plugin). It also lets us know which tests are taking time. Hence it was defined this way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't follow this. As I've already explained in the PR description, go log itself will print out timestamp for each log message, for example:
2017/06/05 02:46:12 START: restart_test.TestPluginKill
2017/06/05 02:46:12 Attaching volume [restart_test_volume_1496630768] on VM[10.192.232.148]
2017/06/05 02:46:14 Confirming attached status for volume [restart_test_volume_1496630768]
2017/06/05 02:46:16 Killing vDVS plugin on VM [10.192.232.148]
2017/06/05 02:46:18 Sleep for 2 seconds
Why do we want to append a redundant timestamp?
tests/utils/misc/misc.go
Outdated
|
||
func curTime() string { | ||
return time.Now().Format(time.UnixDate) | ||
log.Printf("END: %s.%s", testGroup, testName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment for the timestamp.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above.
tests/e2e/basic_test.go
Outdated
s.vm2Name = s.config.DockerHostNames[1] | ||
s.volName1 = inputparams.GetUniqueVolumeName(testName) | ||
s.volName2 = inputparams.GetUniqueVolumeName(testName) | ||
s.containerName = inputparams.GetContainerNameWithTimeStamp(testName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on this
tests/e2e/basic_test.go
Outdated
s.vm2Name = s.config.DockerHostNames[1] | ||
s.volName1 = inputparams.GetUniqueVolumeName(testName) | ||
s.volName2 = inputparams.GetUniqueVolumeName(testName) | ||
s.containerName = inputparams.GetContainerNameWithTimeStamp(testName) | ||
} | ||
|
||
var _ = Suite(&BasicTestSuite{}) | ||
|
||
// Test volume lifecycle management on different datastores: | ||
// VM1 - local VMFS datastore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// VM1 - local VMFS datastore -> VM1 - created on local VMFS datastore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will clarify this.
@@ -111,6 +111,9 @@ func (vg *VmGroupTest) TearDownSuite(c *C) { | |||
out, err = ssh.InvokeCommand(vg.config.EsxHost, cmd) | |||
log.Printf(out) | |||
|
|||
// Remove Config DB | |||
adminutils.ConfigRemove(vg.config.EsxHost) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we decide to do this, we must make sure each test clean up itself properly. For example, if a test create a VM group, it must remove the VM group when test end (not only in successful case, but also in failure case). I don't think current test code did this.
@@ -173,7 +173,7 @@ func VerifyDetachedStatus(name, hostName, esxName string) bool { | |||
log.Printf("Confirming detached status for volume [%s]\n", name) | |||
|
|||
//TODO: Need to implement generic polling logic for better reuse | |||
for attempt := 0; attempt < 30; attempt++ { | |||
for attempt := 0; attempt < 60; attempt++ { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
define a const for this instead of using hard coded "60".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've already added TODO here. We need a general poller util to avoid blind sleeping in the tests - see issue #1301.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't it better to control const variable to adjust attempt
until we come to #1301? I would prefer to have const so we don't need to dive into code to adjust it and do some adjustment by controlling const variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine. Since you both prefer adding consts now, I will add it.
vmdk_plugin/Makefile
Outdated
@@ -334,6 +334,7 @@ checkremote: | |||
# expects binaries to be deployed ot the VM and ESX (see deploy-all target) | |||
test-vm: checkremote deploy-vm-test | |||
$(log_target) | |||
-$(SSH) root@$(ESX) '/etc/init.d/vmdk-opsd start' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why we need this here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a workaround for issue #1291 - we need to restart the service because it has crashed due to "Config Remove".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove this file and rebase with master to grab fix for #1291
tests/e2e/basic_test.go
Outdated
} | ||
|
||
var _ = Suite(&BasicTestSuite{}) | ||
|
||
// Test volume lifecycle management on different datastores: | ||
// VM1 - local VMFS datastore | ||
// VM2 - shared VMFS datastore | ||
// VM3 - shared VSAN datastore | ||
// VM3 - shared VSAN datastore (TODO: currently not available) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the need for VM3 here? if I understand correctly ... VM3 should be registered on the same ESX where VM1 & VM2 are there then it is easily configured on the CI as well as local testbed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is to verify volume creation on different datastores. VM1 is created on local VMFS datastore, VM2 is created on shared VMFS datastore, and VM3 is created on shared VSAN datastore. I will update these comments to avoid confusion.
@govint @lipingxue @shuklanirdesh82 Please review the latest updates. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good to me.
vmdk_plugin/Makefile
Outdated
@@ -334,6 +334,7 @@ checkremote: | |||
# expects binaries to be deployed ot the VM and ESX (see deploy-all target) | |||
test-vm: checkremote deploy-vm-test | |||
$(log_target) | |||
-$(SSH) root@$(ESX) '/etc/init.d/vmdk-opsd start' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove this file and rebase with master to grab fix for #1291
Removed the workaround of restarting vmdk-opsd. Please take a look @govint @shuklanirdesh82 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for making the changes. Pls. check note on removing config DB.
@@ -111,6 +111,9 @@ func (vg *VmGroupTest) TearDownSuite(c *C) { | |||
out, err = ssh.InvokeCommand(vg.config.EsxHost, cmd) | |||
log.Printf(out) | |||
|
|||
// Remove Config DB | |||
adminutils.ConfigRemove(vg.config.EsxHost) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lipingxue current test code removes all vmgroups in TearDownSuite() which is always called irrespective of pass/fail of the test suite.
A test must remove all artifacts that are created by it. But why remove the config DB?
After removing all test artifacts is the DB sane, how do we know that it is?
If we remove the config DB (which btw isn't a test artifact) then most likely we are removing bugs as well. If the removal of test artifacts leaves the DB is a corrupt/inconsistent state - whether local or clustered - then shouldn't that be discovered in the testing?
We can't run without the config DB - customers are trying it out and hence it must be in all our tests.
tests/e2e/basic_test.go
Outdated
c.Assert(accessible, Equals, true, Commentf("Volume %s is not available on [%s]", s.volName2, s.vm1)) | ||
|
||
accessible = verification.CheckVolumeAvailability(s.vm2, s.volName2) | ||
c.Assert(accessible, Equals, false, Commentf("Volume %s is still available on [%s]", s.volName2, s.vm2)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Volume %s is still available on
=> Volume %s is available on
or Volume %s should not be available on
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The phrase "... should not be available on" looks fine to me. I searched our exiting test code - it looks like we are all using "is still available" or "is still attached/detached" - please refer to restart_test, vmlistener_test, basic_test, swarm_test, etc. So I'd prefer to keep it consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM and contingent on the following things.
- Need to close on the open comment from @govint's
- Need to fix CI failure (not sure you have invoked the test run locally after rebasing with master).
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue of how the config DB is handled in the tests is to be discussed, doesn't need to block this PR at all.
@shaominchen, does the swarm test time delays added need to be explained in the docs? Can a customer face the same issue as whats seen with these tests? |
Also increased the timeout limit for swarm test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please make sure you resolve conflict correctly. Contingent on CI result.
Fixes issue #1263, #1372 and #1394 .