Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: fix flaky test-http2-server-rst-stream.js #16690

Conversation

apapirovski
Copy link
Member

I believe this test has some problems with concurrency but not 100% certain yet. Previous version was failing when I ran a heavy duty stress test on my machine but it's also possible there's something wrong in the http2 code. (Will run stress test CIs to try & confirm.)

Please don't approve until stress tests are done. I'm not even sure myself if this is correct. 😆

Fixes: #16688

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • commit message follows commit guidelines
Affected core subsystem(s)

test

@nodejs-github-bot nodejs-github-bot added the test Issues and PRs related to the tests. label Nov 2, 2017
@apapirovski
Copy link
Member Author

apapirovski commented Nov 2, 2017

@apapirovski
Copy link
Member Author

@mscdex mscdex added the http2 Issues or PRs related to the http2 subsystem. label Nov 2, 2017
@jasnell
Copy link
Member

jasnell commented Nov 2, 2017

Definitely still flaking out.

@apapirovski
Copy link
Member Author

It's a lot less flaky now but yeah. I have another version that does process.nextTick(runTest); which I think will always pass but the real question here is why in the world is it flaky in the first place. The sequence of events here should never lead to the wrong rstCode being returned.

@jasnell
Copy link
Member

jasnell commented Nov 2, 2017

I suspect that it's just a problem with the test logic itself. I'll play around with it a bit also

@apapirovski
Copy link
Member Author

apapirovski commented Nov 2, 2017

That's what I thought but I'm not sure after changing it up. I feel like there might be a race condition in the http2 code itself that leads to a stream destruction before onSessionStreamClose.

Maybe there's a goaway that gets processed first or something similar. Edit: But that doesn't make much sense either...

@jasnell
Copy link
Member

jasnell commented Nov 2, 2017

I don't believe so... I think the logic in the test is just too overcomplicated.... try something like this:

'use strict';

const common = require('../common');
if (!common.hasCrypto)
  common.skip('missing crypto');
const assert = require('assert');
const http2 = require('http2');
const Countdown = require('../common/countdown');

const {
  HTTP2_HEADER_METHOD,
  HTTP2_HEADER_PATH,
  HTTP2_METHOD_POST,
  NGHTTP2_CANCEL,
  NGHTTP2_NO_ERROR,
  NGHTTP2_PROTOCOL_ERROR,
  NGHTTP2_REFUSED_STREAM,
  NGHTTP2_INTERNAL_ERROR
} = http2.constants;

const tests = [
  ['rstStream', NGHTTP2_NO_ERROR, false],
  ['rstWithNoError', NGHTTP2_NO_ERROR, false],
  ['rstWithProtocolError', NGHTTP2_PROTOCOL_ERROR, true],
  ['rstWithCancel', NGHTTP2_CANCEL, false],
  ['rstWithRefuse', NGHTTP2_REFUSED_STREAM, true],
  ['rstWithInternalError', NGHTTP2_INTERNAL_ERROR, true]
];

const server = http2.createServer();
server.on('stream', (stream, headers) => {
  const method = headers['rstmethod'];
  stream[method]();
});

server.listen(0, common.mustCall(() => {
  const client = http2.connect(`http://localhost:${server.address().port}`);

  const countdown = new Countdown(tests.length, common.mustCall(() => {
    client.destroy();
    server.close();
  }));

  tests.forEach((i) => {
    const req = client.request({
      ':method': 'POST',
      rstmethod: i[0]
    });
    req.on('streamClosed', common.mustCall((code) => {
      assert.strictEqual(code, i[1]);
      countdown.dec();
    }));
    req.on('aborted', common.mustCall());
    if (i[2]) req.on('error', common.mustCall());
  });
}));

@jasnell
Copy link
Member

jasnell commented Nov 2, 2017

That needs a bit of linting :-)

@apapirovski
Copy link
Member Author

@apapirovski
Copy link
Member Author

apapirovski commented Nov 2, 2017

So are we assuming that the problem with the previous version was related to the unique connections for each test run (presumably it didn't always connect and went straight to destroy, hence streamClosed with code 0)? Because it doesn't feel like the respond, write, etc. should have any impact on the test... even though I agree they're not core to what its testing (and make it unnecessarily complicated).

@jasnell
Copy link
Member

jasnell commented Nov 2, 2017

I think the issue was the way the expected rstCode was being set in the test and a race condition in the streams. (a race condition in the test, that is, not in the implementation)

@apapirovski
Copy link
Member Author

apapirovski commented Nov 2, 2017

@jasnell hmmm... but the next test would only run once the streamClosed fired and the codes were compared (in my version, anyway). So it was very much sequential in that way. That's why the only thing that makes sense to me (if this version succeeds) is that it has to do with establishing the connection itself.

Also in both tests the failure was always with rstCode 0 and not any other. One would expect different rstCodes to be mixed in there if it was as a result of race conditions.

Sorry to keep digging into this, I just want to make sure we're clear on the exact issue we're addressing so that we're not potentially overlooking something.

@apapirovski
Copy link
Member Author

Stress test looks good so far, seems a good bet that it'll succeed.

@apapirovski
Copy link
Member Author

apapirovski commented Nov 2, 2017

Makes me think I should go through the rest of the tests and check for others that are trying to connect too many times (or create server multiple times in parallel). Doesn't seem like a good idea.

There's also at least one other test that's flaky on node-chakracore.

Copy link
Member

@jasnell jasnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming the stress test comes back good, LGTM.
And +1 for fast tracking so we can get CI fixed. This has a pretty high failure rate.

@apapirovski apapirovski force-pushed the fix-test-http2-server-rst-stream branch from bd9c52c to bfbe413 Compare November 2, 2017 16:46
@apapirovski
Copy link
Member Author

Just tidied up a little bit and squashed everything.

Here's the full CI: https://ci.nodejs.org/job/node-test-pull-request/11157/

@jasnell
Copy link
Member

jasnell commented Nov 2, 2017

CI looks good on this one.
@mcollina @sebdeckers ... can I ask one or both of you to take a quick look. I'd like to get this landed early so we can stop seeing failures in CI

Copy link
Member

@mcollina mcollina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

jasnell pushed a commit that referenced this pull request Nov 2, 2017
PR-URL: #16690
Fixes: #16688
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
@jasnell
Copy link
Member

jasnell commented Nov 2, 2017

Landed in 3a977fc

@jasnell jasnell closed this Nov 2, 2017
@yhwang yhwang mentioned this pull request Nov 2, 2017
2 tasks
cjihrig pushed a commit to cjihrig/node that referenced this pull request Nov 6, 2017
PR-URL: nodejs#16690
Fixes: nodejs#16688
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
@cjihrig cjihrig mentioned this pull request Nov 6, 2017
gibfahn pushed a commit that referenced this pull request Nov 14, 2017
PR-URL: #16690
Fixes: #16688
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
@gibfahn gibfahn mentioned this pull request Nov 21, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
http2 Issues or PRs related to the http2 subsystem. test Issues and PRs related to the tests.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Investigate flaky test-http2-server-rst-stream
7 participants