test: make test-cluster-disconnect-leak reliable #4736

Trott · 2016-01-18T05:02:39Z

Previously, test-cluster-disconnect-leak had two issues:

* Magic numbers: How many times to spawn a worker was determined through
  empirical experimentation. This means that as new platforms and new
  CPU/RAM configurations are tested, the magic numbers require more
  and more refinement. This brings us to...
* Non-determinism: The test *seems* to fail all the time when the bug
  it tests for is present, but it's really a judgment based on sampling.
  "Oh, with 8 workers per CPU, it fails about 80% of the time. Let's try
  16..."

This revised version of the test takes a different approach. The fix
for the bug that the test was written for means that the `disconnect`
event will fire reliably for a single worker. So we check for exactly
that: the test still fails when the fix is not in the code base and
succeeds when it is.

Advantages of this approach include:

* The test runs much faster.
* The test now works on Windows. The previous version skipped Windows.
* The test should be reliable on any new platform regardless of CPU and
  RAM.

Ref: #4674

cc @santigimeno @iwuzhere

Trott · 2016-01-18T05:09:52Z

When I comment out the three-line fix that the test was originally written for, everything fails as expected:

https://ci.nodejs.org/job/node-test-commit/1799/

When I leave the fix in but use this test, everything succeeds as expected:

https://ci.nodejs.org/job/node-test-commit/1801/ (The Windows failure is a buildbot issue and unrelated. You can see this test passing on the three Windows variants as test number 237 on matrix host 1. Buildbot failed on matrix 4, where this test wouldn't have run anyway.)

Similarly, on my machine, when I run the test with Node 5.4.0, it fails (because that version has the bug) and when I run the test with Node 5.4.1, it passes.

Trott · 2016-01-18T05:14:17Z

CI: https://ci.nodejs.org/job/node-test-pull-request/1283/

Trott · 2016-01-18T05:15:57Z

If this lands, a next improvement might be to move this to parallel because it is no longer resource intensive.

Trott · 2016-01-18T06:32:33Z

That last CI had a buildbot failure, so a re-run just to be sure:

https://ci.nodejs.org/job/node-test-commit/1804/

And, it's all green! \o/

bnoordhuis · 2016-01-18T08:49:36Z

LGTM

santigimeno · 2016-01-18T10:10:33Z

test/sequential/test-cluster-disconnect-leak.js

  return;
 }

-var server = net.createServer();
+const server = net.createServer();

 server.listen(common.PORT, function() {
  process.send('listening');


Maybe just `server.listen(common.PORT);` should work?

santigimeno · 2016-01-18T10:15:05Z

LGTM with 2 suggestions. Thanks for improving both tests!

jbergstroem · 2016-01-18T10:36:33Z

LGTM. Could you move it to parallel as well?

cjihrig · 2016-01-18T17:14:14Z

LGTM. This is far simpler.

jasnell · 2016-01-18T17:24:11Z

LGTM

Trott · 2016-01-18T19:18:50Z

Made changes per the suggestions from @santigimeno

CI: https://ci.nodejs.org/job/node-test-commit/1829/

All green! \o/

Previously, test-cluster-disconnect-leak had two issues:

* Magic numbers: How many times to spawn a worker was determined through empirical experimentation. This means that as new platforms and new CPU/RAM configurations are tested, the magic numbers require more and more refinement. This brings us to...
* Non-determinism: The test *seems* to fail all the time when the bug it tests for is present, but it's really a judgment based on sampling. "Oh, with 8 workers per CPU, it fails about 80% of the time. Let's try 16..."

This revised version of the test takes a different approach. The fix for the bug that the test was written for means that the `disconnect` event will fire reliably for a single worker. So we check for that and the test still fails when the fix is not in the code base and succeeds when it is.

Advantages of this approach include:

* The test runs much faster.
* The test now works on Windows. The previous version skipped Windows.
* The test should be reliable on any new platform regardless of CPU and RAM.

Ref: #4674
PR-URL: #4736
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Johan Bergström <bugs@bergstroem.nu>

jasnell · 2016-01-19T00:25:34Z

Landed in d5c525d

Two cluster tests have recently changed so that they are no longer resource intensive. Move them back to parallel.

Ref: nodejs#4736
Ref: nodejs#4739
PR-URL: nodejs#4774
Reviewed-By: Johan Bergström <bugs@bergstroem.nu>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>

Trott added the cluster, test, and lts-watch-v4.x labels Jan 18, 2016

Trott mentioned this pull request Jan 18, 2016

test: fix test-cluster-disconnect-leak.js for AIX #4674

Closed

Trott force-pushed the better-leak-test branch from a245cbf to 0fe44bc January 18, 2016 05:09

santigimeno reviewed Jan 18, 2016

Trott mentioned this pull request Jan 18, 2016

test: improve test-cluster-disconnect-suicide-race #4739

Closed

fixup per Santi

5b7cf25

jasnell closed this Jan 19, 2016

evanlucas mentioned this pull request Jan 19, 2016

v5.5.0 Release Proposal (Stable) #4742

Merged

Trott mentioned this pull request Jan 20, 2016

test: move cluster tests to parallel #4774

Closed

Trott added a commit to Trott/io.js that referenced this pull request Jan 20, 2016

test: move cluster tests to parallel

097ea9f

Two cluster tests have recently changed so that they are no longer resource intensive. Move them back to parallel. Ref: nodejs#4736 Ref: nodejs#4739

MylesBorins added land-on-v4.x and removed lts-watch-v4.x labels Jan 28, 2016

MylesBorins mentioned this pull request Feb 11, 2016

V4.3.1 proposal #5200

Merged

This was referenced Oct 16, 2021

[Snyk] Security upgrade mocha from 7.2.0 to 9.1.2 baby636/node#32

Open

[Snyk] Security upgrade mocha from 7.2.0 to 9.1.2 ryan-ally/node#49

Open

[Snyk] Security upgrade mocha from 7.2.0 to 9.1.2 XirdigH/node-22#69

Open

Trott deleted the better-leak-test branch January 13, 2022 22:31
