Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Aborted pull request testing contains confusing message #981

Closed
pshipton opened this issue Jan 18, 2018 · 24 comments
Closed

Aborted pull request testing contains confusing message #981

pshipton opened this issue Jan 18, 2018 · 24 comments

Comments

@pshipton
Copy link
Member

Looking at a problematic pull request build https://ci.eclipse.org/openj9/job/PullRequest-Sanity-linux_ppc-64_cmprssptrs_le/336/ (due to #968), it contains the following output. The problem here is the Build finished. 1013 tests run, 351 skipped, 0 failed. message which might be mistaken has having completed all the tests. Perhaps this can be improved.

14:35:50 org.openj9.test.java.lang.
14:35:50 JVMDUMP039I Processing dump event "systhrow", detail "java/lang/OutOfMemoryError" at 2018/01/18 19:35:50 - please wait.
14:35:50 JVMDUMP032I JVM requested System dump using '/home/jenkins/jenkins-agent/workspace/PullRequest-Sanity-linux_ppc-64_cmprssptrs_le/openj9/test/TestConfig/core.20180118.193550.30387.0001.dmp' in response to an event
[Pipeline] }
[Pipeline] // dir
[Pipeline] }
[Pipeline] // timestamps
[Pipeline] }
[Pipeline] // withEnv
[Pipeline] }
[Pipeline] // stage
[Pipeline] stage
[Pipeline] { (Declarative: Post Actions)
[Pipeline] echo
Publish Test Results...
[Pipeline] junit
Recording test results
[Pipeline] step
TestNG Reports Processing: START
Looking for TestNG results report in workspace using pattern: **/testng-results.xml
Saving reports...
Processing '/home/hudson/genie.openj9/.jenkins/jobs/PullRequest-Sanity-linux_ppc-64_cmprssptrs_le/builds/336/testng/testng-results-1.xml'
Processing '/home/hudson/genie.openj9/.jenkins/jobs/PullRequest-Sanity-linux_ppc-64_cmprssptrs_le/builds/336/testng/testng-results-2.xml'
Processing '/home/hudson/genie.openj9/.jenkins/jobs/PullRequest-Sanity-linux_ppc-64_cmprssptrs_le/builds/336/testng/testng-results-3.xml'
Processing '/home/hudson/genie.openj9/.jenkins/jobs/PullRequest-Sanity-linux_ppc-64_cmprssptrs_le/builds/336/testng/testng-results-4.xml'
Processing '/home/hudson/genie.openj9/.jenkins/jobs/PullRequest-Sanity-linux_ppc-64_cmprssptrs_le/builds/336/testng/testng-results.xml'
TestNG Reports Processing: FINISH
[Pipeline] step
TAP Reports Processing: START
Looking for TAP results report in workspace using pattern: **/*.tap
Did not find any matching files.
[Pipeline] }
[Pipeline] // stage
[Pipeline] }
[Pipeline] // node
[Pipeline] End of Pipeline
Adding one-line test results to commit status...
Setting status of b3256579fd0340b2828c82f9ceb64674bba2b01a to FAILURE with url https://ci.eclipse.org/openj9/job/PullRequest-Sanity-linux_ppc-64_cmprssptrs_le/336/ and message: 'Build finished. 1013 tests run, 351 skipped, 0 failed.'
``
@AdamBrousseau
Copy link
Contributor

AdamBrousseau commented Jan 18, 2018

The problem here is the jvm crashing, followed by the POST Jenkins actions which run after the build. This includes the PR plugin updating the PullRequest build status. Since only X tests ran, that is what the PR Plugin determines. I think we would have to make our test framework more robust or be able to recover from these crashes. The plugin cannot tell that the testing stopped early so it can only ananylze the test results that are there.

Edit:
Or somehow have the test framework print out something like 1013/21152 tests ran

@smlambert
Copy link
Contributor

I think there are two problems.

  1. One is with the test framework, that we should be reporting the crash or generation of the dmp files (if it is not an expected part of the test), related item Find and archive any core / dmp files in post test step  #969. At present, we do not pre-count and report the number of tests before we run them, in fact, we wrote the framework to easily pick up any tests that are present or newly added in the test directory (to make it easier for developers to add tests. I had adding a step at the beginning to count all of the test before we started the run, but decided against it for a number of reasons, including trying to keep execution time to a minimum. We can look at it again, to see what it will cost us to pre-count.

  2. The second part is that the plugin is printing the confusing message. Is there a way to turn off the verbosity of the plugin, that people may mistake for framework output?

@pshipton
Copy link
Member Author

Note in this case the dump files are expected (not sure the test should be creating them because this is a lot of overhead, but currently it does). The problem is the testing suddenly stops, i.e. the VM suddenly exits while in the process of creating the expected dump files (#968). This doesn't seem to result in the running test failing, and the test summary (showing no failures) could be mistaken for a passing run, although overall the jenkins build failed.

@DanHeidinga
Copy link
Member

Marking this as high priority as it is negatively affecting PR and nightly builds.

@smlambert
Copy link
Contributor

smlambert commented Feb 13, 2018

FYI @renfeiw

I think there are a couple of things going on:

  1. the JCL suite is likely be too large and occasionally causing an OOM error (a known testng bug, TestNG get java.lang.OutOfMemoryError: Java heap space ( OOME ) on testng-6.9.14-20161130 testng-team/testng#1259, where the workaround is to break the test suites up into smaller pieces so testng doesn't store so much test result output in memory before it writes the output to files) - when OOM is seen as reported above, its often in JCL tests - so we should proactively break that suite down into a few smaller sizes (this will help with parallelism later anyway) <-- ignore this for now, I originally made an assumption that the OOM was related to this (since the test framework exits, which is not typical behaviour in the case of core dumps in tests), but the cause of the OOM is known, it is intentionally thrown by the test

  2. testkitgen / resultsSummary should check if core dumps exist, and fail if so (we will need to make sure tests that create core dumps on purpose clean them up so they do not get included in the set being considered as proof of failure)

  3. archive core files in Jenkins scripts in event of failure (this can be done in a separate PR)

@pshipton
Copy link
Member Author

pshipton commented Feb 13, 2018

I believe the OOM which is occurring is intentionally caused by a test, see #968. The VM is exiting unexpectedly, which should not occur, but OpenJ9 doesn't yet have the diagnostics support to pursue the issue, although it is expected within a few weeks. Breaking up the tests suites may or may not help. Running the OOM test with -Xdump:none may or may not help.

renfeiw added a commit to renfeiw/openj9 that referenced this issue Feb 13, 2018
- Separate OutOfMemoryErrorTest from JCL_Test.
- Add -Xdump:none to OutOfMemoryErrorTest (issue eclipse-openj9#981)

Signed-off-by: Renfei Wang <renfeiw@ca.ibm.com>
renfeiw added a commit to renfeiw/openj9 that referenced this issue Feb 13, 2018
- Separate OutOfMemoryErrorTest from JCL_Test.
- Add -Xdump:none to OutOfMemoryErrorTest (issue eclipse-openj9#981)

Signed-off-by: Renfei Wang <renfeiw@ca.ibm.com>
@pshipton
Copy link
Member Author

I think we should re-open this Issue. The problem described hasn't been resolved, only the current issue causing this problem may have been resolved. The PR testing could be aborted for other reasons, and the messages would still be confusing.

@pshipton pshipton reopened this Feb 13, 2018
@smlambert
Copy link
Contributor

Yes, this issue auto-closed, but since it is describing several problems it should not be closed or better yet, separate issues should be raised.

@llxia
Copy link
Contributor

llxia commented Feb 20, 2018

TAP result or string (i.e., TOTAL: 102 EXECUTED: 100 PASSED: 100 FAILED: 0 SKIPPED: 2 ) from console contains all test results and is sufficient to use as success criteria for Jenkins build. testng result and junit result do not include all tests. And all test results can be fund in TAP result or console output.
I am wondering that do we really need to show testng result and junit result? Could we just remove them to avoid confusion?

@AdamBrousseau
Copy link
Contributor

do we really need to show testng result and junit result? Could we just remove them to avoid confusion?

We can. I'm not sure what the Add test result one liner to the PR will look like. Currently it is updating with the JUnit numbers. I think before we added JUnit we just had testNG which gave slightly different numbers. I can run a quick test to see.

@smlambert
Copy link
Contributor

We should definitely not be using JUnit numbers for the "Add test result one liner" value, as that is incorrect and the root cause of the confusion reported in this issue.

@smlambert
Copy link
Contributor

One potential value of showing Junit results via the Junit plugin is that its a nicer GUI for developers to view the test results of the tests that produce that type of output (which is all the testng tests, all the openjdk regression tests, but doesn't include the cmdlinetester tests or any STF / SV tests, or JCK tests).

AdamBrousseau added a commit to AdamBrousseau/openj9 that referenced this issue Feb 20, 2018
- Remove JUnit and TestNG result publishing

[ci skip]

Issue eclipse-openj9#981

Signed-off-by: Adam Brousseau <adam.brousseau88@gmail.com>
@AdamBrousseau
Copy link
Contributor

AdamBrousseau commented Feb 20, 2018

Build finished. No test results found.

#1214

@llxia
Copy link
Contributor

llxia commented Feb 21, 2018

You have to specify failIfNoResults:true. Otherwise, build will pass if there is no TAP result.

step([$class: “TapPublisher”, testResults: “**/*.tap”, failIfNoResults:true])])

@AdamBrousseau
Copy link
Contributor

ok, we should add that, however there were results. I guess the ghpr plugin doesn't recognise TAP?

@AdamBrousseau
Copy link
Contributor

supposed to have been solved
jenkinsci/ghprb-plugin#168

@smlambert
Copy link
Contributor

If ghprb-plugin#168 doesn't resolve it, I think we can/should just turn off that counter off to avoid confusion (as briefly discussed today).

I believe all other parts of this issue have been addressed.

Does anyone object to this issue being closed? I will plan to close it late tomorrow, if not.

@pshipton
Copy link
Member Author

A recent PR build https://ci.eclipse.org/openj9/job/PullRequest-Sanity-JDK9-linux_ppc-64_cmprssptrs_le-OpenJ9/183/ shows 'Build finished. 19982 tests run, 6 skipped, 0 failed.' although 1 test failed.

@AdamBrousseau
Copy link
Contributor

Right, because the failed test only generated TAP output and the 'test results one-liner' is junit results. Which is one of the reasons why we should turn it off as Shelley suggested in the previous comment.

@smlambert
Copy link
Contributor

@AdamBrousseau - can you remove the one-liner or show me where to do that?

@AdamBrousseau
Copy link
Contributor

remove the one-liner

Done

yanluo7 pushed a commit to yanluo7/openj9 that referenced this issue Mar 28, 2018
- Separate OutOfMemoryErrorTest from JCL_Test.
- Add -Xdump:none to OutOfMemoryErrorTest (issue eclipse-openj9#981)

Signed-off-by: Renfei Wang <renfeiw@ca.ibm.com>
@smlambert
Copy link
Contributor

@pshipton - please confirm that this issue can be close once you confirm that the confusing message is no longer present

@pshipton
Copy link
Member Author

pshipton commented Apr 5, 2018

I'll have to see a failing PR build to be 100% sure, but a passing build no longer has a passed/failed summary at the end. I'll close this assuming the problem is resolved and reopen later if necessary.

@pshipton pshipton closed this as completed Apr 5, 2018
@pshipton
Copy link
Member Author

pshipton commented Apr 5, 2018

I found a failed PR build, and it also looked good.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants