Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: Allow all valid AIX rc in test-stdio-closed #10239

Closed
wants to merge 1 commit into from

Conversation

sxa
Copy link
Member

@sxa sxa commented Dec 12, 2016

Checklist
  • make -j4 test (UNIX), or vcbuild test nosign (Windows) passes
  • tests and/or benchmarks are included
  • commit message follows commit guidelines
Affected core subsystem(s)

test

Description of change

Fixes: #10234

Allow either of the two possible return codes on AIX to
be considered successful on test-stdio-closed.js. This is likely
an interim solution until we can be sure when AIX returns
one or the other, but I currently have systems that have both
and I don't want this failing on any of them.

Fixes: nodejs#10234

Allow either of the two possible return codes on AIX to
be considered successful on test-stdio-closed.js
@nodejs-github-bot nodejs-github-bot added lts-watch-v4.x test Issues and PRs related to the tests. labels Dec 12, 2016
@sam-github
Copy link
Contributor

sam-github commented Dec 12, 2016

LGTM, to get things stable.

The test could use a comment on what it is testing, but I git blamed it. It looks like its attempting to ensure that fd 1 and 2 are always valid, and get opened on /dev/null if they aren't valid at startup: b5f25a9

I wonder if the test would be more robust if instead of calling process.exit(), it let the output data drain, had not try/catches, and asserted an exit code of zero, to guarantee node flushed all the data. Is it possible that sometimes stream.write() isn't touching the fd right away, so isn't noticing whether its valid or not until later, by which time the process has exited already?

/cc @bnoordhuis

Copy link
Member

@Trott Trott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully only an interim solution, but LGTM as both results comply with spec, as far as I am able to tell.

@sam-github
Copy link
Contributor

@Trott 126 means stdout had a EBADF, which means its not working as intended, b5f25a9 is supposed to replace EBADF descriptors with a new descriptor open on /dev/null

I guess there is no way to just mark a test as skipped, so its more clearly something to come back to later?

@Trott
Copy link
Member

Trott commented Dec 12, 2016

@sam-github Not working as the Node.js code expects, but working in a correct fashion nonetheless. At least, that's my interpretation. Correction welcome. It seems to me that Node.js is trying to take advantage of a behavior that is permitted on other operating systems, but is not the only correct behavior.

@Trott
Copy link
Member

Trott commented Dec 12, 2016

@sam-github To get a little more specific about my understanding (and hopefully not my misunderstanding) about this issue: Node.js attempts to re-open the EBADF fd on /dev/null. POSIX allows the operating system to permit opening the closed stdio fd on /dev/null, but it does not require the operating system to permit it. So the test should accommodate the unexpected-by-Node.js behavior on AIX, at least as a minimal workaround.

(I'm certainly open to more robust solutions that might involve either addressing the issue on the C++ side or finding a way for the test to query the operating system ahead of time to see what the expected result should be.)

@Trott
Copy link
Member

Trott commented Dec 12, 2016

@sam-github wrote:

I guess there is no way to just mark a test as skipped, so its more clearly something to come back to later?

I think we could do something like this perhaps:

if (common.isAix and exitCode === 126) {
  common.skip('Skipping this test on AIX because blah blah something something.');
  return;
}

assert.strictEqual(exitCode, 42);

Not sure if there's an issue where that won't work in an exit handler, but I think it should.

@sam-github
Copy link
Contributor

POSIX allows the operating system to permit opening the closed stdio fd on /dev/null, but it does not require the operating system to permit it.

Its not a spec corner, its just a normal open of a file, and it works on AIX usually. Whatever is going on here smells more like a race condition. Or possibly a difference in allocation pattern of new descriptors, but it only fails sometimes.

date > /dev/null is an example of something that would be broken if you couldn't redirect stdout to dev/null, and that's POSIX shell syntax.

@mscdex mscdex added the aix Issues and PRs related to the AIX platform. label Dec 12, 2016
@Trott
Copy link
Member

Trott commented Dec 12, 2016

date > /dev/null is an example of something that would be broken if you couldn't redirect stdout to dev/null

I'm not saying you can't redirect stdout to /dev/null. I'm saying that it's not clear that you must be permitted to re-open the stdout fd on /dev/null after that fd has been explicitly closed. Or perhaps a bit more specifically, it's not clear that open("/dev/null", O_RDWR) is required to return fd 1 after fd 1 has been closed. Maybe the OS considers fd 1 unavailable and instead returns 3 or whatever.

Interestingly, the open group man page for open() says it returns the lowest FD not currently open for the process. But AIX's man page for open() says it returns the lowest FD not previously open for the process. The former suggests it should return 1 if stdout is closed, but the latter suggests that it may never return 1 after stdout is closed.

(If I'm betraying profound ignorance and I should just stop already, you can say, "That's not how it works." and I'll stop. Just trying to be helpful.)

@mhdawson
Copy link
Member

I would be happier with marking the test as flaky until we figure out the right answer.

@sxa
Copy link
Member Author

sxa commented Dec 13, 2016

Bear in mind it's generally completely deterministic on any given machine/environment in terms of which of the two results it gives, so marking it flaky on AIX (thus effectively ignoring the test) might be overkill and devalue its execution

@sam-github
Copy link
Contributor

@Trott I'll talk about UNIX systems programming as long as you want, probably longer ;-). In the man pages you reference, previously and current mean the same thing, effectively. Previous means before open() was called, but current means at the moment open() was called. Since no change occurs between "before" a function is called and "current" to a function being called, they mean the same thing: that open is required to return the lowest numerically valued fd that it can. The text is confusing when compared to each other as you did, because it seems to imply a difference, but so it goes.

The use-lowest available fd behaviour is ancient UNIX behaviour. A system that didn't do that would break the world, whether POSIX speced it or not (though it looks like they did from what you posted, or at least the opengroup did later, when they cleaned up the stuff POSIX argued about).

@Trott
Copy link
Member

Trott commented Dec 13, 2016

@sam-github Yeah, it's all coming back to me now that this issue is more subtle than "AIX behaves differently". I mean, it does, but it's something subtle. Probably not going to discover the source of the issue by comparing man page texts. On the upside, this made me write a quick little C++ program to confirm my understanding and everything you're saying. Hooray for 10 more lines of C++ than I've written the entire rest of the year.

@mhdawson
Copy link
Member

@sxa555 I see your point, my one concern is that its easier to forget once the test has changed. Marked flaky in the status file it is obvious that something needs to be fixed. I'm willing to defer to the others on the thread if they want to chime in on which way is best.

@sam-github the status file in the directory with the tests can be used to mark a test as either:

  1. skip - don't run at all
  2. run - but accept failure as ok. In the the job shows up as yellow instead of read in our CI when a failure occurs.

@gibfahn
Copy link
Member

gibfahn commented Dec 15, 2016

Refs: #8375

@mhdawson
Copy link
Member

@sam-github @Trott if you 2 prefer landing with the option to accept either lets just do that. Can you comment here to confirm that.

@Trott
Copy link
Member

Trott commented Dec 15, 2016

It looks like @jBarz has determined the problem point and that it appears to be a ksh bug (AIX-specific or otherwise). See #10234 (comment)

If a workaround like I propose in #10234 (comment) is feasible, I think I'd prefer that to the change in this PR or skipping the test. The change I propose there has the benefit of still testing the C++ code that opens the fd on /dev/null, which is the point of the test.

@mhdawson
Copy link
Member

@Trott, If running under bash allows the test to pass reliably that sounds good to me.

Copy link
Member

@gibfahn gibfahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussion is ongoing in #10234, not approving to make sure no-one lands this by mistake.

@italoacasas italoacasas added the blocked PRs that are blocked by other issues or PRs. label Dec 18, 2016
@jasnell jasnell added the wip Issues and PRs that are still a work in progress. label Dec 27, 2016
@richardlau
Copy link
Member

This was superseded and addressed by #10339

@gibfahn gibfahn closed this Jan 25, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
aix Issues and PRs related to the AIX platform. blocked PRs that are blocked by other issues or PRs. test Issues and PRs related to the tests. wip Issues and PRs that are still a work in progress.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

test-stdio-closed failing on some AIX environments
10 participants