Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: make test-cluster-disconnect-leak reliable #4736

Closed
wants to merge 2 commits into from

Conversation

Trott
Copy link
Member

@Trott Trott commented Jan 18, 2016

Previously, test-cluster-disconnect-leak had two issues:

  • Magic numbers: How many times to spawn a worker was determined through
    empirical experimentation. This means that as new platforms and new
    CPU/RAM configurations are tested, the magic numbers require more
    and more refinement. This brings us to...
  • Non-determinism: The test seems to fail all the time when the bug
    it tests for is present, but it's really a judgment based on sampling.
    "Oh, with 8 workers per CPU, it fails about 80% of the time. Let's try
    16..."

This revised version of the test takes a different approach. The fix
for the bug that the test was written for means that the disconnect
event will fire reliably for a single worker. So we check for that and
the test still fails when the fix is not in the code base and succeeds
when it is.

Advantages of this approach include:

  • The test runs much faster.
  • The test now works on Windows. The previous version skipped Windows.
  • The test should be reliable on any new platform regardless of CPU and
    RAM.

Ref: #4674

cc @santigimeno @iwuzhere

@Trott Trott added cluster Issues and PRs related to the cluster subsystem. test Issues and PRs related to the tests. lts-watch-v4.x labels Jan 18, 2016
Previously, test-cluster-disconnect-leak had two issues:

* Magic numbers: How many times to spawn a worker was determined through
empirical experimentation. This means that as new platforms and new
CPU/RAM configurations are tested, the magic numbers require more
and more refinement. This brings us to...

* Non-determinism: The test *seems* to fail all the time when the bug
it tests for is present, but it's really a judgment based on sampling.
"Oh, with 8 workers per CPU, it fails about 80% of the time. Let's try
16..."

This revised version of the test takes a different approach. The fix
for the bug that the test was written for means that the `disconnect`
event will fire reliably for a single worker. So we check for that and
the test still fails when the fix is not in the code base and succeeds
when it is.

Advantages of this approach include:

* The test runs much faster.
* The test now works on Windows. The previous version skipped Windows.
* The test should be reliable on any new platform regardless of CPU and
RAM.

Ref: nodejs#4674
@Trott
Copy link
Member Author

Trott commented Jan 18, 2016

When I comment out the three-line fix that the test was originally written for, everything fails as expected:

https://ci.nodejs.org/job/node-test-commit/1799/

When I leave the fix in but use this test, everything succeeds as expected:

https://ci.nodejs.org/job/node-test-commit/1801/ (Windows failure is buildbot and unrelated, you can see this test passing on the three Windows variants as test number 237 in matrix host 1, buildbot failed on matrix 4 where this test wouldn't have run anyway)

Similarly, on my machine, when I run the test with Node 5.4.0, it fails (because that version has the bug) and when I run the test with Node 5.4.1, it passes.

@Trott
Copy link
Member Author

Trott commented Jan 18, 2016

@Trott
Copy link
Member Author

Trott commented Jan 18, 2016

If this lands, a next improvement might be to move this to parallel because it is no longer resource intensive.

@Trott
Copy link
Member Author

Trott commented Jan 18, 2016

That last CI had a buildbot failure, so a re-run just to be sure:

https://ci.nodejs.org/job/node-test-commit/1804/

And, it's all green! \o/

@bnoordhuis
Copy link
Member

LGTM

return;
}

var server = net.createServer();
const server = net.createServer();

server.listen(common.PORT, function() {
process.send('listening');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just server.listen(common.PORT); should work?

@santigimeno
Copy link
Member

LGTM with 2 suggestions. Thanks for improving both tests!

@jbergstroem
Copy link
Member

LGTM. Could you move it to parallel as well?

@cjihrig
Copy link
Contributor

cjihrig commented Jan 18, 2016

LGTM. This is far simpler.

@jasnell
Copy link
Member

jasnell commented Jan 18, 2016

LGTM

@Trott
Copy link
Member Author

Trott commented Jan 18, 2016

Made changes per the suggestions from @santigimeno

CI: https://ci.nodejs.org/job/node-test-commit/1829/

All green! \o/

jasnell pushed a commit that referenced this pull request Jan 19, 2016
Previously, test-cluster-disconnect-leak had two issues:

* Magic numbers: How many times to spawn a worker was determined through
empirical experimentation. This means that as new platforms and new
CPU/RAM configurations are tested, the magic numbers require more
and more refinement. This brings us to...

* Non-determinism: The test *seems* to fail all the time when the bug
it tests for is present, but it's really a judgment based on sampling.
"Oh, with 8 workers per CPU, it fails about 80% of the time. Let's try
16..."

This revised version of the test takes a different approach. The fix
for the bug that the test was written for means that the `disconnect`
event will fire reliably for a single worker. So we check for that and
the test still fails when the fix is not in the code base and succeeds
when it is.

Advantages of this approach include:

* The test runs much faster.
* The test now works on Windows. The previous version skipped Windows.
* The test should be reliable on any new platform regardless of CPU and
RAM.

Ref: #4674

PR-URL: #4736
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Johan Bergström <bugs@bergstroem.nu>
@jasnell
Copy link
Member

jasnell commented Jan 19, 2016

Landed in d5c525d

@jasnell jasnell closed this Jan 19, 2016
evanlucas pushed a commit that referenced this pull request Jan 19, 2016
Previously, test-cluster-disconnect-leak had two issues:

* Magic numbers: How many times to spawn a worker was determined through
empirical experimentation. This means that as new platforms and new
CPU/RAM configurations are tested, the magic numbers require more
and more refinement. This brings us to...

* Non-determinism: The test *seems* to fail all the time when the bug
it tests for is present, but it's really a judgment based on sampling.
"Oh, with 8 workers per CPU, it fails about 80% of the time. Let's try
16..."

This revised version of the test takes a different approach. The fix
for the bug that the test was written for means that the `disconnect`
event will fire reliably for a single worker. So we check for that and
the test still fails when the fix is not in the code base and succeeds
when it is.

Advantages of this approach include:

* The test runs much faster.
* The test now works on Windows. The previous version skipped Windows.
* The test should be reliable on any new platform regardless of CPU and
RAM.

Ref: #4674

PR-URL: #4736
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Johan Bergström <bugs@bergstroem.nu>
Trott added a commit to Trott/io.js that referenced this pull request Jan 20, 2016
Two cluster tests have recently changed so that they are no longer
resource intensive. Move them back to parallel.

Ref: nodejs#4736
Ref: nodejs#4739
evanlucas pushed a commit that referenced this pull request Jan 20, 2016
Previously, test-cluster-disconnect-leak had two issues:

* Magic numbers: How many times to spawn a worker was determined through
empirical experimentation. This means that as new platforms and new
CPU/RAM configurations are tested, the magic numbers require more
and more refinement. This brings us to...

* Non-determinism: The test *seems* to fail all the time when the bug
it tests for is present, but it's really a judgment based on sampling.
"Oh, with 8 workers per CPU, it fails about 80% of the time. Let's try
16..."

This revised version of the test takes a different approach. The fix
for the bug that the test was written for means that the `disconnect`
event will fire reliably for a single worker. So we check for that and
the test still fails when the fix is not in the code base and succeeds
when it is.

Advantages of this approach include:

* The test runs much faster.
* The test now works on Windows. The previous version skipped Windows.
* The test should be reliable on any new platform regardless of CPU and
RAM.

Ref: #4674

PR-URL: #4736
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Johan Bergström <bugs@bergstroem.nu>
Trott added a commit to Trott/io.js that referenced this pull request Jan 21, 2016
Two cluster tests have recently changed so that they are no longer
resource intensive. Move them back to parallel.

Ref: nodejs#4736
Ref: nodejs#4739
PR-URL: nodejs#4774
Reviewed-By: Johan Bergström <bugs@bergstroem.nu>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
rvagg pushed a commit that referenced this pull request Jan 25, 2016
Two cluster tests have recently changed so that they are no longer
resource intensive. Move them back to parallel.

Ref: #4736
Ref: #4739
PR-URL: #4774
Reviewed-By: Johan Bergström <bugs@bergstroem.nu>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
MylesBorins pushed a commit that referenced this pull request Jan 28, 2016
Previously, test-cluster-disconnect-leak had two issues:

* Magic numbers: How many times to spawn a worker was determined through
empirical experimentation. This means that as new platforms and new
CPU/RAM configurations are tested, the magic numbers require more
and more refinement. This brings us to...

* Non-determinism: The test *seems* to fail all the time when the bug
it tests for is present, but it's really a judgment based on sampling.
"Oh, with 8 workers per CPU, it fails about 80% of the time. Let's try
16..."

This revised version of the test takes a different approach. The fix
for the bug that the test was written for means that the `disconnect`
event will fire reliably for a single worker. So we check for that and
the test still fails when the fix is not in the code base and succeeds
when it is.

Advantages of this approach include:

* The test runs much faster.
* The test now works on Windows. The previous version skipped Windows.
* The test should be reliable on any new platform regardless of CPU and
RAM.

Ref: #4674

PR-URL: #4736
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Johan Bergström <bugs@bergstroem.nu>
MylesBorins pushed a commit that referenced this pull request Feb 11, 2016
Previously, test-cluster-disconnect-leak had two issues:

* Magic numbers: How many times to spawn a worker was determined through
empirical experimentation. This means that as new platforms and new
CPU/RAM configurations are tested, the magic numbers require more
and more refinement. This brings us to...

* Non-determinism: The test *seems* to fail all the time when the bug
it tests for is present, but it's really a judgment based on sampling.
"Oh, with 8 workers per CPU, it fails about 80% of the time. Let's try
16..."

This revised version of the test takes a different approach. The fix
for the bug that the test was written for means that the `disconnect`
event will fire reliably for a single worker. So we check for that and
the test still fails when the fix is not in the code base and succeeds
when it is.

Advantages of this approach include:

* The test runs much faster.
* The test now works on Windows. The previous version skipped Windows.
* The test should be reliable on any new platform regardless of CPU and
RAM.

Ref: #4674

PR-URL: #4736
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Johan Bergström <bugs@bergstroem.nu>
MylesBorins pushed a commit to MylesBorins/node that referenced this pull request Feb 11, 2016
Previously, test-cluster-disconnect-leak had two issues:

* Magic numbers: How many times to spawn a worker was determined through
empirical experimentation. This means that as new platforms and new
CPU/RAM configurations are tested, the magic numbers require more
and more refinement. This brings us to...

* Non-determinism: The test *seems* to fail all the time when the bug
it tests for is present, but it's really a judgment based on sampling.
"Oh, with 8 workers per CPU, it fails about 80% of the time. Let's try
16..."

This revised version of the test takes a different approach. The fix
for the bug that the test was written for means that the `disconnect`
event will fire reliably for a single worker. So we check for that and
the test still fails when the fix is not in the code base and succeeds
when it is.

Advantages of this approach include:

* The test runs much faster.
* The test now works on Windows. The previous version skipped Windows.
* The test should be reliable on any new platform regardless of CPU and
RAM.

Ref: nodejs#4674

PR-URL: nodejs#4736
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Johan Bergström <bugs@bergstroem.nu>
@MylesBorins MylesBorins mentioned this pull request Feb 11, 2016
MylesBorins pushed a commit to MylesBorins/node that referenced this pull request Feb 13, 2016
Previously, test-cluster-disconnect-leak had two issues:

* Magic numbers: How many times to spawn a worker was determined through
empirical experimentation. This means that as new platforms and new
CPU/RAM configurations are tested, the magic numbers require more
and more refinement. This brings us to...

* Non-determinism: The test *seems* to fail all the time when the bug
it tests for is present, but it's really a judgment based on sampling.
"Oh, with 8 workers per CPU, it fails about 80% of the time. Let's try
16..."

This revised version of the test takes a different approach. The fix
for the bug that the test was written for means that the `disconnect`
event will fire reliably for a single worker. So we check for that and
the test still fails when the fix is not in the code base and succeeds
when it is.

Advantages of this approach include:

* The test runs much faster.
* The test now works on Windows. The previous version skipped Windows.
* The test should be reliable on any new platform regardless of CPU and
RAM.

Ref: nodejs#4674

PR-URL: nodejs#4736
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Johan Bergström <bugs@bergstroem.nu>
MylesBorins pushed a commit to MylesBorins/node that referenced this pull request Feb 15, 2016
Previously, test-cluster-disconnect-leak had two issues:

* Magic numbers: How many times to spawn a worker was determined through
empirical experimentation. This means that as new platforms and new
CPU/RAM configurations are tested, the magic numbers require more
and more refinement. This brings us to...

* Non-determinism: The test *seems* to fail all the time when the bug
it tests for is present, but it's really a judgment based on sampling.
"Oh, with 8 workers per CPU, it fails about 80% of the time. Let's try
16..."

This revised version of the test takes a different approach. The fix
for the bug that the test was written for means that the `disconnect`
event will fire reliably for a single worker. So we check for that and
the test still fails when the fix is not in the code base and succeeds
when it is.

Advantages of this approach include:

* The test runs much faster.
* The test now works on Windows. The previous version skipped Windows.
* The test should be reliable on any new platform regardless of CPU and
RAM.

Ref: nodejs#4674

PR-URL: nodejs#4736
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Johan Bergström <bugs@bergstroem.nu>
MylesBorins pushed a commit that referenced this pull request Mar 17, 2016
Two cluster tests have recently changed so that they are no longer
resource intensive. Move them back to parallel.

Ref: #4736
Ref: #4739
PR-URL: #4774
Reviewed-By: Johan Bergström <bugs@bergstroem.nu>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
MylesBorins pushed a commit that referenced this pull request Mar 21, 2016
Two cluster tests have recently changed so that they are no longer
resource intensive. Move them back to parallel.

Ref: #4736
Ref: #4739
PR-URL: #4774
Reviewed-By: Johan Bergström <bugs@bergstroem.nu>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
scovetta pushed a commit to scovetta/node that referenced this pull request Apr 2, 2016
Previously, test-cluster-disconnect-leak had two issues:

* Magic numbers: How many times to spawn a worker was determined through
empirical experimentation. This means that as new platforms and new
CPU/RAM configurations are tested, the magic numbers require more
and more refinement. This brings us to...

* Non-determinism: The test *seems* to fail all the time when the bug
it tests for is present, but it's really a judgment based on sampling.
"Oh, with 8 workers per CPU, it fails about 80% of the time. Let's try
16..."

This revised version of the test takes a different approach. The fix
for the bug that the test was written for means that the `disconnect`
event will fire reliably for a single worker. So we check for that and
the test still fails when the fix is not in the code base and succeeds
when it is.

Advantages of this approach include:

* The test runs much faster.
* The test now works on Windows. The previous version skipped Windows.
* The test should be reliable on any new platform regardless of CPU and
RAM.

Ref: nodejs#4674

PR-URL: nodejs#4736
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Johan Bergström <bugs@bergstroem.nu>
scovetta pushed a commit to scovetta/node that referenced this pull request Apr 2, 2016
Two cluster tests have recently changed so that they are no longer
resource intensive. Move them back to parallel.

Ref: nodejs#4736
Ref: nodejs#4739
PR-URL: nodejs#4774
Reviewed-By: Johan Bergström <bugs@bergstroem.nu>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
@Trott Trott deleted the better-leak-test branch January 13, 2022 22:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cluster Issues and PRs related to the cluster subsystem. test Issues and PRs related to the tests.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants