
Add a basic test: TestVmGroupVolumeIsolation #1368

Merged: 9 commits, Jun 12, 2017

Conversation

shaominchen (Contributor) commented Jun 7, 2017

  1. Added a new test case: TestVmGroupVolumeIsolation
  2. Refactored existing test cases to use new common utilities
  3. Fixed a cleanup issue in vmgroups_test
  4. Updated common utils LogTestStart & LogTestEnd and all related test cases.

Fixes issues #1263, #1372, and #1394.

s.vm2Name = s.config.DockerHostNames[1]
s.volName1 = inputparams.GetUniqueVolumeName(testName)
s.volName2 = inputparams.GetUniqueVolumeName(testName)
s.containerName = inputparams.GetContainerNameWithTimeStamp(testName)
Contributor

I was thinking maybe just put this into a dummy_test.go file and let these esx, ... variables be globals there. All test modules in the e2e package would be able to refer to those variables without each module doing the above. Any repetition of code anywhere could be avoided.
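A minimal sketch of that suggestion, assuming a shared dummy_test.go in the e2e package (the GetTestConfig accessor and the import path are illustrative; only EsxHost and DockerHostNames come from the snippets in this PR):

```go
// dummy_test.go - hypothetical shared fixtures for the e2e package.
package e2e

import "github.com/vmware/docker-volume-vsphere/tests/utils/inputparams"

// Package-level fixtures, initialized once. Every *_test.go file in the
// e2e package could refer to these instead of redeclaring local copies.
var (
	config  = inputparams.GetTestConfig() // hypothetical config accessor
	esx     = config.EsxHost
	vm1Name = config.DockerHostNames[0]
	vm2Name = config.DockerHostNames[1]
)
```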

Contributor

+1 on this

shaominchen (Contributor Author)

Personally I don't have much concern about this. We have already introduced the common TestConfig, which I feel is sufficient. I introduced these local variables just for convenience, because I personally prefer short names locally; otherwise I would just use s.config.XXX.

// Remove Config DB
admincli.ConfigRemove(s.esx)

misc.LogTestEnd(testName, "TestVmGroupVolumeIsolation")
govint (Contributor)

This test seems very similar, if not identical, to vmgroups_test.go:TestVmGroupVolumeAccessAcrossVmGroups(). If it needs enhancing, then the existing test can be enhanced vs. adding a test here. Plus, it's best to keep all vmgroup + volume related tests in one place.

shaominchen (Contributor Author)

I completely agree with you - currently the test cases we defined on the Dashboard have a lot of overlap and might be duplicated. I will take a closer look at the test case you pointed out, and decide what we should do to avoid the duplication.

shuklanirdesh82 (Contributor) commented Jun 8, 2017

> on the Dashboard have a lot of overlap and might be duplicated.

We have broken down tests into smaller steps, so you might see some steps overlap with others. As discussed in the past, we have to avoid such duplication during automation by enhancing the existing tests.

+1 on @govint's comment.

We should be cautious too before updating/modifying the existing one; if any assertion goes wrong, the following ones won't run.

shaominchen (Contributor Author)

@govint I took a further look into vmgroups_test.TestVmGroupVolumeAccessAcrossVmGroups(). This test is very similar to basic_test.TestVmGroupVolumeIsolation(). However, I think it makes sense to keep both, because the former verifies volume isolation between two user-defined vmgroups, while the latter verifies volume isolation between the _DEFAULT vmgroup and a user-defined vmgroup.

I agree we should keep all vmgroup related tests in one single place. I tried to move my test case into vmgroups_test.go. However, this needs too much refactoring of the existing logic in vmgroups_test.go, because the test assumption is different. In vmgroups_test.go, all tests assume that VM1 & VM2 belong to a user-defined vmgroup (we shouldn't call it "default" in the comments, which causes confusion with the _DEFAULT vmgroup). The current test case doesn't have this assumption. To avoid significant change to the existing vmgroups_test.go, I will keep this test in basic_test.go for now.
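For reference, a rough sketch of the flow being described, verifying isolation between the _DEFAULT vmgroup and a user-defined one (the admincli.CreateVMgroup/DeleteVMgroup and dockercli.CreateVolume helper names are assumptions; CheckVolumeAvailability, LogTestStart/LogTestEnd, and the Commentf assertion style appear elsewhere in this PR):

```go
// Sketch: a volume created in the _DEFAULT vmgroup must not be visible
// from a VM placed in a user-defined vmgroup.
func (s *BasicTestSuite) TestVmGroupVolumeIsolation(c *C) {
	misc.LogTestStart("basic_test", "TestVmGroupVolumeIsolation")

	// VM1 stays in _DEFAULT; move VM2 into a user-defined vmgroup T1.
	admincli.CreateVMgroup(s.esx, "T1", s.vm2Name) // hypothetical helper

	// A volume created from VM1 lands in _DEFAULT ...
	dockercli.CreateVolume(s.vm1, s.volName1) // hypothetical helper

	// ... and must not be available from VM2 in vmgroup T1.
	accessible := verification.CheckVolumeAvailability(s.vm2, s.volName1)
	c.Assert(accessible, Equals, false,
		Commentf("Volume %s is still available on [%s]", s.volName1, s.vm2))

	admincli.DeleteVMgroup(s.esx, "T1") // hypothetical helper
	misc.LogTestEnd("basic_test", "TestVmGroupVolumeIsolation")
}
```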

@@ -111,6 +111,9 @@ func (vg *VmGroupTest) TearDownSuite(c *C) {
 	out, err = ssh.InvokeCommand(vg.config.EsxHost, cmd)
 	log.Printf(out)
 
+	// Remove Config DB
+	adminutils.ConfigRemove(vg.config.EsxHost)

govint (Contributor)

This must not be done; it's creating a fresh and clean plate for the next test to run, and we may potentially wipe out a scenario that could lead to tests failing and bugs getting exposed. In fact, for all the e2e tests, create the config DB once and let the tests run through without removing the config DB (except for those tests that test the config DB itself).

lipingxue (Contributor)

If we decide to do this, we must make sure each test cleans up after itself properly. For example, if a test creates a VM group, it must remove that VM group when the test ends (not only in the successful case, but also in the failure case). I don't think the current test code does this.

shaominchen (Contributor Author)

I don't follow this. What's the default customer scenario we are expecting? I may be wrong, but I think a typical customer environment will be a clean vDVS installation without initializing the local or shared DB. So most e2e tests should run in the default setup, i.e. NotConfigured mode.

Multi-tenancy features are experimental. So only VmGroup related tests should take care of ConfigInit and ConfigRemove.

govint (Contributor)

@lipingxue the current test code removes all vmgroups in TearDownSuite(), which is always called irrespective of pass/fail of the test suite.

A test must remove all artifacts that are created by it. But why remove the config DB? After removing all test artifacts, is the DB sane? How do we know that it is?

If we remove the config DB (which btw isn't a test artifact) then most likely we are removing bugs as well. If the removal of test artifacts leaves the DB in a corrupt/inconsistent state - whether local or clustered - then shouldn't that be discovered in the testing?

We can't run without the config DB - customers are trying it out and hence it must be in all our tests.

shaominchen (Contributor Author)

@govint Yesterday we (sorry you were not included) discussed this and all agreed that most of our e2e tests should be running in the "default" mode. The "default" mode here means "no config DB", i.e. a fresh installation without running "config init". For now the multi-tenancy feature is experimental, so (at least for now) we are expecting most of our customers will be using vDVS in the "default" mode. This assumption will remain true until multi-tenancy is fully/correctly implemented and becomes an official product feature.

Coming back to e2e tests, we expect most of the test groups to run in the "default" mode without the DB configured. Only vmgroup related tests need to run "config init", and thus these tests are responsible for cleaning up the DB (i.e. running "config rm") so that other e2e tests still run in a clean "default" mode.

Hope this helps. Let me know if you still have questions about this.
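In gocheck terms, the division of responsibility described above might look like the following sketch (ConfigRemove is quoted from the diff in this PR; ConfigInit is assumed to be its counterpart):

```go
// Only vmgroup suites touch the config DB; all other e2e suites run in
// the "default" (NotConfigured) mode.
func (vg *VmGroupTest) SetUpSuite(c *C) {
	// Initialize the config DB, required for multi-tenancy operations.
	adminutils.ConfigInit(vg.config.EsxHost) // assumed counterpart of ConfigRemove
}

func (vg *VmGroupTest) TearDownSuite(c *C) {
	// Called whether the suite passed or failed, so the host is returned
	// to the clean "default" mode that other e2e suites expect.
	adminutils.ConfigRemove(vg.config.EsxHost)
}
```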

 func LogTestStart(testGroup string, testName string) {
-	log.Printf("START:%s %s %s", testGroup, testName, curTime())
+	log.Printf("START: %s.%s", testGroup, testName)
Contributor

Please retain the timestamp: when debugging we know exactly when a test started and when it finished, and hence can correlate with other logs (daemon, plugin). It also lets us know which tests are taking time. Hence it was defined this way.

shaominchen (Contributor Author)

I don't follow this. As I've already explained in the PR description, the Go log package itself prints a timestamp for each log message, for example:

2017/06/05 02:46:12 START: restart_test.TestPluginKill
2017/06/05 02:46:12 Attaching volume [restart_test_volume_1496630768] on VM[10.192.232.148]
2017/06/05 02:46:14 Confirming attached status for volume [restart_test_volume_1496630768]
2017/06/05 02:46:16 Killing vDVS plugin on VM [10.192.232.148]
2017/06/05 02:46:18 Sleep for 2 seconds

Why do we want to append a redundant timestamp?
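For context, the Go standard library's log package prefixes every message with the date and time by default (its default flags are log.LstdFlags = log.Ldate | log.Ltime), so a minimal program demonstrates the point:

```go
package main

import "log"

func main() {
	// No timestamp is passed explicitly; the default logger adds
	// "YYYY/MM/DD hh:mm:ss" on its own.
	log.Printf("START: %s.%s", "restart_test", "TestPluginKill")
	// Sample output: 2017/06/05 02:46:12 START: restart_test.TestPluginKill
}
```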


-func curTime() string {
-	return time.Now().Format(time.UnixDate)
+	log.Printf("END: %s.%s", testGroup, testName)
Contributor

Same comment for the timestamp.

shaominchen (Contributor Author) commented Jun 8, 2017

Same as above.


s.vm2Name = s.config.DockerHostNames[1]
s.volName1 = inputparams.GetUniqueVolumeName(testName)
s.volName2 = inputparams.GetUniqueVolumeName(testName)
s.containerName = inputparams.GetContainerNameWithTimeStamp(testName)
}

var _ = Suite(&BasicTestSuite{})

// Test volume lifecycle management on different datastores:
// VM1 - local VMFS datastore
Contributor

// VM1 - local VMFS datastore -> VM1 - created on local VMFS datastore

shaominchen (Contributor Author)

Will clarify this.


@@ -173,7 +173,7 @@ func VerifyDetachedStatus(name, hostName, esxName string) bool {
 	log.Printf("Confirming detached status for volume [%s]\n", name)
 
 	//TODO: Need to implement generic polling logic for better reuse
-	for attempt := 0; attempt < 30; attempt++ {
+	for attempt := 0; attempt < 60; attempt++ {
Contributor

Define a const for this instead of using a hard-coded "60".

shaominchen (Contributor Author)

I've already added a TODO here. We need a general poller util to avoid blind sleeping in the tests - see issue #1301.

Contributor

Isn't it better to control the attempt count via a const until we get to #1301? I would prefer to have a const so we don't need to dive into the code to adjust it; we can tune it just by changing the const.

shaominchen (Contributor Author)

Fine. Since you both prefer adding consts now, I will add it.
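A minimal sketch of what that const-based version might look like, pending the generic poller from #1301 (the const names and the volumeIsDetached helper are illustrative, not the repo's actual identifiers):

```go
import (
	"log"
	"time"
)

const (
	maxAttempts  = 60              // polling attempts before giving up
	pollInterval = 2 * time.Second // delay between attempts
)

// VerifyDetachedStatus polls until the volume reports detached or the
// attempt budget is exhausted, instead of hard-coding "60" in the loop.
func VerifyDetachedStatus(name, hostName, esxName string) bool {
	log.Printf("Confirming detached status for volume [%s]\n", name)
	for attempt := 0; attempt < maxAttempts; attempt++ {
		if volumeIsDetached(name, hostName, esxName) {
			return true
		}
		time.Sleep(pollInterval)
	}
	return false
}

// volumeIsDetached is an illustrative stand-in for the real status check,
// which would query the ESX host over SSH.
func volumeIsDetached(name, hostName, esxName string) bool {
	return false
}
```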

@@ -334,6 +334,7 @@ checkremote:
 # expects binaries to be deployed ot the VM and ESX (see deploy-all target)
 test-vm: checkremote deploy-vm-test
 	$(log_target)
+	-$(SSH) root@$(ESX) '/etc/init.d/vmdk-opsd start'
Contributor

Why do we need this here?

shaominchen (Contributor Author) commented Jun 8, 2017

This is a workaround for issue #1291 - we need to restart the service because it has crashed due to "Config Remove".

Contributor

Please remove this file and rebase with master to grab fix for #1291

}

var _ = Suite(&BasicTestSuite{})

// Test volume lifecycle management on different datastores:
// VM1 - local VMFS datastore
// VM2 - shared VMFS datastore
-// VM3 - shared VSAN datastore
+// VM3 - shared VSAN datastore (TODO: currently not available)
Contributor

What is the need for VM3 here? If I understand correctly, VM3 should be registered on the same ESX where VM1 & VM2 are; then it is easily configured on the CI as well as on a local testbed.

shaominchen (Contributor Author)

This test is to verify volume creation on different datastores: VM1 is created on a local VMFS datastore, VM2 on a shared VMFS datastore, and VM3 on a shared VSAN datastore. I will update these comments to avoid confusion.

shaominchen (Contributor Author)

@govint @lipingxue @shuklanirdesh82 Please review the latest updates.

lipingxue (Contributor) left a comment

LGTM

shuklanirdesh82 (Contributor) left a comment

Overall looks good to me.


shaominchen (Contributor Author)

Removed the workaround of restarting vmdk-opsd. Please take a look @govint @shuklanirdesh82

govint (Contributor) left a comment

Thanks for making the changes. Please check the note above on removing the config DB.


c.Assert(accessible, Equals, true, Commentf("Volume %s is not available on [%s]", s.volName2, s.vm1))

accessible = verification.CheckVolumeAvailability(s.vm2, s.volName2)
c.Assert(accessible, Equals, false, Commentf("Volume %s is still available on [%s]", s.volName2, s.vm2))
Contributor

nit: "Volume %s is still available on" => "Volume %s is available on" or "Volume %s should not be available on"

shaominchen (Contributor Author)

The phrase "... should not be available on" looks fine to me. But I searched our existing test code - it looks like we are all using "is still available" or "is still attached/detached" - please refer to restart_test, vmlistener_test, basic_test, swarm_test, etc. So I'd prefer to keep it consistent.

shuklanirdesh82 (Contributor) left a comment

LGTM, contingent on the following:

  1. Need to close the open comment from @govint.
  2. Need to fix the CI failure (not sure whether you have invoked the test run locally after rebasing with master).

Thanks!

govint (Contributor) left a comment

The issue of how the config DB is handled in the tests is still to be discussed; it doesn't need to block this PR at all.

govint mentioned this pull request Jun 12, 2017
govint (Contributor) commented Jun 12, 2017

@shaominchen, do the time delays added to the swarm tests need to be explained in the docs? Could a customer face the same issue as what's seen with these tests?

shuklanirdesh82 (Contributor) left a comment

Please make sure you resolve the conflict correctly. Contingent on the CI result.
