Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: avoid stopping machine upon running env operations in isolation #1797

Merged
merged 2 commits into from
Jun 9, 2021

Conversation

atsmtat
Copy link
Contributor

@atsmtat atsmtat commented May 14, 2021

get and set current dir operations used to halt the machine by
throwing an exception in isolation mode. This change updates them to
return a dummy NotFound error instead, and keep the machine running.

I started with a custom error using ErrorKind::Other, but since it
can't be mapped to a raw OS error, I dropped it. NotFound kind of make
sense for get operations, but not much for set operations. But that's
the only error supported for windows currently.

@RalfJung
Copy link
Member

RalfJung commented May 14, 2021

Thanks for the PR!

I recall suggesting something similar in the past, but the concern back then was that we should do this consistently for all operations. So this certainly does not fix #1034, as there are more operations that would need adjustment (getenv/setenv, all the file system operations -- basically everything that checks the communicate flag). We don't necessarily have to do all of them at once, but we should not close #1034 before they are all done.

The other concern was discoverability: when such an operation fails, it can be hard for users to figure out why, since they might not expect the "isolation" Miri provides. I am not sure what is the best solution to this. Maybe keep the errors by default but offer a flag to let evaluation continue instead? I am not sure what your usecase is here so I cannot tell if that would help you or not.

Cc @oli-obk

@RalfJung
Copy link
Member

Cc @Aaron1011

@oli-obk
Copy link
Contributor

oli-obk commented May 14, 2021

Maybe keep the errors by default but offer a flag to let evaluation continue instead? I am not sure what your usecase is here so I cannot tell if that would help you or not.

We could turn the errors into warnings and possibly add a flag to turn them off entirely (which the warning could then mention)

@atsmtat
Copy link
Contributor Author

atsmtat commented May 14, 2021

What to do with the rest of the operations was gonna be my next question based on the feedback 🙂 I'll drop "fixes" line so that issue is not closed.
This PR is not to fix any use case I have, so I'm not really blocked or anything. Having fun learning Rust, and out of interest in compilers, I'm trying to learn about rustc and tools around it. I was digging into miri code, and found this easy issue to start with.

As for the question of discoverability, I'm trying to understand the problem:

  • miri should never execute system calls in isolation mode, right?
  • if miri were to continue execution of user program, it would have to either return an error or mimic the success call?
  • if miri were to return a fake but sensible error for a given op (PermissionDenied for getcwd for example), with explicit message about "isolation" mode, isn't that OK? As user program might actually run into such error while running in the field.
  • i assume that faking a success by returning a dummy for getter and no-op for setter is out of question, as it's just too evil?
  • with a fake error, it's possible that user code doesn't take a path which user wants to exercise, and this might go unnoticed by the user. So we use warnings to point this out?
  • i guess we don't expect users to refactor the code around these ops (handle an error and continue with dummy value for example) in order to run it in isolation?

@RalfJung
Copy link
Member

RalfJung commented May 15, 2021

miri should never execute system calls in isolation mode, right?

Yes. (Even more strongly, Miri in isolation mode should be completely pure, deterministic, repeatable.)

i assume that faking a success by returning a dummy for getter and no-op for setter is out of question, as it's just too evil?

Yes.

if miri were to return a fake but sensible error for a given op (PermissionDenied for getcwd for example), with explicit message about "isolation" mode, isn't that OK? As user program might actually run into such error while running in the field.

Indeed I think this is a reasonable approach -- but I don't think we can return an "explicit message" to the caller, all we have is an error code. I assume a lot of code out there will unwrap getcwd immediately, or ? it and bubble up the error in a nice way, but either way the program aborts quickly. It will look like the program is just broken under Miri, and the user has no way to know that they just have to pass the flag to make things work.

So I like @oli-obk's proposal of having Miri itself print a warning (we already have infrastructure for that, see NonHaltingDiagnostic in diag.rs and, alternatively, this warning upon spawning a thread). This is not something a "real" failing getcwd would ever do, but it seems okay.

So:

  • Default behavior (with isolation on): print a warning (only on the first such call, ideally -- sess.warn does that automatically I think), then continue interpretation with a suitable error code.
  • Isolation off: behavior unchanged compared to now.

At some point we might add a flag to suppress the warning, but that does not have to happen immediately.

@RalfJung
Copy link
Member

RalfJung commented May 15, 2021

But that's the only error supported for windows currently.

Supported where, in the Miri mapping to raw error codes? We can easily extend that. Which error code might a real Windows emit if the permission to change the current dir was denied?

@bors
Copy link
Collaborator

bors commented May 16, 2021

☔ The latest upstream changes (presumably #1801) made this pull request unmergeable. Please resolve the merge conflicts.

@RalfJung RalfJung added the S-waiting-on-author Status: Waiting for the PR author to address review comments label May 17, 2021
@RalfJung
Copy link
Member

@atsmtat I see you did a rebase; I am still setting the "waiting-on-author" flag based on the discussion above. If anything is unclear or if you think the ball is in our court and you need more feedback/review/help, please let me know. :)

@atsmtat
Copy link
Contributor Author

atsmtat commented May 18, 2021

miri should never execute system calls in isolation mode, right?

Yes. (Even more strongly, Miri in isolation mode should be completely pure, deterministic, repeatable.)

i assume that faking a success by returning a dummy for getter and no-op for setter is out of question, as it's just too evil?

Yes.

if miri were to return a fake but sensible error for a given op (PermissionDenied for getcwd for example), with explicit message about "isolation" mode, isn't that OK? As user program might actually run into such error while running in the field.

Indeed I think this is a reasonable approach -- but I don't think we can return an "explicit message" to the caller, all we have is an error code. I assume a lot of code out there will unwrap getcwd immediately, or ? it and bubble up the error in a nice way, but either way the program aborts quickly. It will look like the program is just broken under Miri, and the user has no way to know that they just have to pass the flag to make things work.

So I like @oli-obk's proposal of having Miri itself print a warning (we already have infrastructure for that, see NonHaltingDiagnostic in diag.rs and, alternatively, this warning upon spawning a thread). This is not something a "real" failing getcwd would ever do, but it seems okay.

So:

* Default behavior (with isolation on): print a warning (only on the first such call, ideally -- `sess.warn` does that automatically I think), then continue interpretation with a suitable error code.

Yes, I tried out sess.warn, and it prints warning only on the first call (does session mean a single run of a user program?). One downside of sess.warn is that it doesn't give user much idea about which code/line caused that call. I'm trying to see if I can print more info in the warning.

* Isolation off: behavior unchanged compared to now.

At some point we might add a flag to suppress the warning, but that does not have to happen immediately.

@atsmtat
Copy link
Contributor Author

atsmtat commented May 18, 2021

But that's the only error supported for windows currently.

Supported where, in the Miri mapping to raw error codes? We can easily extend that. Which error code might a real Windows emit if the permission to change the current dir was denied?

Right, Miri mapping to raw error codes. I'll add appropriate Windows error codes.

@atsmtat
Copy link
Contributor Author

atsmtat commented May 18, 2021

What do you think about a warning with a span like this:

warning: `chdir` called in isolation mode returns a dummy error
   --> /Users/atsmtat/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/unix/os.rs:158:17
    |
158 |     if unsafe { libc::chdir(p.as_ptr()) } != 0 {
    |                 ^^^^^^^^^^^^^^^^^^^^^^^

I got the span from the top frame. Still not very helpful. Good to print user frame which calls into std. Is there a way to get such frame or it doesn't make sense? If you can point out an example where something like this is done, it'll be great.

@bjorn3
Copy link
Member

bjorn3 commented May 18, 2021

Yes, I tried out sess.warn, and it prints warning only on the first call

Diagnostics are deduplicated, so if the warning message is identical, it will only be printed once.

@RalfJung
Copy link
Member

Yes, I tried out sess.warn, and it prints warning only on the first call

Yes that was deliberate -- we don't want tons of warnings when a program does such a call in a loop.
However, it looks like you also want to show a span. In that case you'll want to register a NonHaltingDiagnostic with Miri's diagnostic infrastructure (see diagnostic.rs); however, then you'll have to add your own logic to avoid showing the error many times.

@atsmtat
Copy link
Contributor Author

atsmtat commented May 19, 2021

Yes, I tried out sess.warn, and it prints warning only on the first call

Yes that was deliberate -- we don't want tons of warnings when a program does such a call in a loop.
However, it looks like you also want to show a span. In that case you'll want to register a NonHaltingDiagnostic with Miri's diagnostic infrastructure (see diagnostic.rs); however, then you'll have to add your own logic to avoid showing the error many times.

Looks like NonHaltingDiagnostic uses the same diagnostic infra, which dedups the messages. So it shows a single message per line in src. But because it's too verbose to print the backtrace by default, I pushed a change to throw a warning without a span or backtrace. This in fact prints a warning only once per a given op call. I'm considering adding two command line options on top of this -- 1. track dummy errors in isolation, which prints backtraces using NonHaltingDiagnostic 2. ignore dummy errors in isolation, which doesn't print a warning.

I'll push these changes soon. Let me know what you think :)

@atsmtat
Copy link
Contributor Author

atsmtat commented May 20, 2021

@RalfJung I updated the PR with options for user visibility.
For input:

    assert_eq!(env::current_dir().unwrap_err().kind(), ErrorKind::NotFound);
    for _i in 0..3 {
        assert_eq!(env::current_dir().unwrap_err().kind(), ErrorKind::NotFound);
    }

Output of default run (with isolation, without any flags):

warning: `getcwd` in isolation mode produced a dummy error
  |
  = note: run with -Zmiri-track-dummy-op to track with a backtrace
  = note: run with -Zmiri-ignore-dummy-op to ignore warning

Output of run with -Zmiri-track-dummy-op:

note: tracking was triggered
   --> /Users/atsmtat/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/unix/os.rs:134:17
    |
134 |             if !libc::getcwd(ptr, buf.capacity()).is_null() {
    |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ produced dummy error for `getcwd`
    |
    = note: inside `std::sys::unix::os::getcwd` at /Users/atsmtat/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/unix/os.rs:134:17
    = note: inside `std::env::current_dir` at /Users/atsmtat/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/env.rs:47:5
note: inside `main` at tests/run-pass/current_dir_with_isolation.rs:7:16
   --> tests/run-pass/current_dir_with_isolation.rs:7:16
    |
7   |     assert_eq!(env::current_dir().unwrap_err().kind(), ErrorKind::NotFound);
    |                ^^^^^^^^^^^^^^^^^^
(frames omitted for brevity)

note: tracking was triggered
   --> /Users/atsmtat/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/unix/os.rs:134:17
    |
134 |             if !libc::getcwd(ptr, buf.capacity()).is_null() {
    |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ produced dummy error for `getcwd`
    |
    = note: inside `std::sys::unix::os::getcwd` at /Users/atsmtat/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/unix/os.rs:134:17
    = note: inside `std::env::current_dir` at /Users/atsmtat/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/env.rs:47:5
note: inside `main` at tests/run-pass/current_dir_with_isolation.rs:9:20
   --> tests/run-pass/current_dir_with_isolation.rs:9:20
    |
9   |         assert_eq!(env::current_dir().unwrap_err().kind(), ErrorKind::NotFound);
    |                    ^^^^^^^^^^^^^^^^^^

Running with -Zmiri-ignore-dummy-op doesn't produce any messaged related to dummy ops

@atsmtat
Copy link
Contributor Author

atsmtat commented May 20, 2021

I wasn't sure about running rustfmt, so I haven't run it on my changes.
I'll update README.md once you give a green light to the new flags :)

src/bin/miri.rs Outdated Show resolved Hide resolved
@@ -279,6 +280,7 @@ pub trait EvalContextExt<'mir, 'tcx: 'mir>: crate::MiriEvalContextExt<'mir, 'tcx
CreatedCallId(id) => format!("function call with id {}", id),
CreatedAlloc(AllocId(id)) => format!("created allocation with id {}", id),
FreedAlloc(AllocId(id)) => format!("freed allocation with id {}", id),
DummyOpInIsolation(op) => format!("produced dummy error for `{}`", op),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know what you mean by "dummy" error. I suggest to say something more like "{} was made to return an error due to isolation" or so (and print a hint mentioning -Zmiri-disable-isolation).

@RalfJung
Copy link
Member

RalfJung commented May 22, 2021

I wasn't sure about running rustfmt, so I haven't run it on my changes.

After rebasing onto latest master, running rustfmt should be fine. We just recently made the repository rustfmt-compatible.

Output of default run (with isolation, without any flags):

In terms of deduplication, this looks great. :) I think we should adjust the messages a little. I don't think "note: tracking was triggered" makes sense, I'd more like to see something like "warning: operation rejected by isolation" or so.

@atsmtat
Copy link
Contributor Author

atsmtat commented May 23, 2021

Updated output:
Default:

warning: `getcwd` was made to return an error due to isolation
  |
  = note: run with -Zmiri-isolated-op=warn to get warning with a backtrace
  = note: run with -Zmiri-isolated-op=hide to hide warning
  = note: run with -Zmiri-isolated-op=allow to disable isolation

With warn:

warning: operation rejected by isolation
   --> /Users/atsmtat/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/unix/os.rs:134:17
    |
134 |             if !libc::getcwd(ptr, buf.capacity()).is_null() {
    |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ `getcwd` was made to return an error due to isolation
    |
    = note: inside `std::sys::unix::os::getcwd` at /Users/atsmtat/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/unix/os.rs:134:17
    = note: inside `std::env::current_dir` at /Users/atsmtat/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/env.rs:47:5
note: inside `main` at tests/run-pass/current_dir_with_isolation.rs:7:16
   --> tests/run-pass/current_dir_with_isolation.rs:7:16
    |
7   |     assert_eq!(env::current_dir().unwrap_err().kind(), ErrorKind::NotFound);
    |                ^^^^^^^^^^^^^^^^^^
...
warning: operation rejected by isolation
   --> /Users/atsmtat/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/unix/os.rs:134:17
    |
134 |             if !libc::getcwd(ptr, buf.capacity()).is_null() {
    |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ `getcwd` was made to return an error due to isolation
    |
    = note: inside `std::sys::unix::os::getcwd` at /Users/atsmtat/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/unix/os.rs:134:17
    = note: inside `std::env::current_dir` at /Users/atsmtat/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/env.rs:47:5
note: inside `main` at tests/run-pass/current_dir_with_isolation.rs:9:20
   --> tests/run-pass/current_dir_with_isolation.rs:9:20
    |
9   |         assert_eq!(env::current_dir().unwrap_err().kind(), ErrorKind::NotFound);
    |                    ^^^^^^^^^^^^^^^^^^

@atsmtat
Copy link
Contributor Author

atsmtat commented May 23, 2021

I wasn't sure about running rustfmt, so I haven't run it on my changes.

After rebasing onto latest master, running rustfmt should be fine. We just recently made the repository rustfmt-compatible.

How do I run rustfmt on miri? I couldn't add a component using rustup as it complains about a custom toolchain, miri.

Output of default run (with isolation, without any flags):

In terms of deduplication, this looks great. :) I think we should adjust the messages a little. I don't think "note: tracking was triggered" makes sense, I'd more like to see something like "warning: operation rejected by isolation" or so.

@RalfJung
Copy link
Member

How do I run rustfmt on miri? I couldn't add a component using rustup as it complains about a custom toolchain, miri.

cargo +nightly fmt should work, if you also have a nightly toolchain installed.

src/bin/miri.rs Outdated Show resolved Hide resolved
src/bin/miri.rs Outdated Show resolved Hide resolved
src/eval.rs Outdated Show resolved Hide resolved
src/helpers.rs Outdated Show resolved Hide resolved
src/shims/env.rs Outdated Show resolved Hide resolved
tests/run-pass/current_dir_with_isolation.rs Outdated Show resolved Hide resolved
src/diagnostics.rs Outdated Show resolved Hide resolved
src/shims/env.rs Outdated Show resolved Hide resolved
src/shims/env.rs Outdated Show resolved Hide resolved
src/shims/env.rs Outdated Show resolved Hide resolved
src/shims/env.rs Outdated Show resolved Hide resolved
src/bin/miri.rs Outdated Show resolved Hide resolved
@bors
Copy link
Collaborator

bors commented Jun 3, 2021

☔ The latest upstream changes (presumably #1776) made this pull request unmergeable. Please resolve the merge conflicts.

@atsmtat
Copy link
Contributor Author

atsmtat commented Jun 5, 2021

@RalfJung I've addressed your latest comments in recent commits. I believe you'd prefer I squash my branch commits. If so, let me know.

@RalfJung
Copy link
Member

RalfJung commented Jun 5, 2021

LGTM, assuming CI passes and pending the rename of "ignore". :)

@RalfJung
Copy link
Member

RalfJung commented Jun 5, 2021

Ah, and yes please squash the commits.

@atsmtat
Copy link
Contributor Author

atsmtat commented Jun 7, 2021

I renamed "ignore" and squashed the commits. Waiting for your approval to run CI.

src/shims/env.rs Outdated
this.check_no_isolation("`getcwd`")?;
if let IsolatedOp::Reject(reject_with) = this.machine.isolated_op {
this.reject_in_isolation("getcwd", reject_with)?;
let err = Error::new(ErrorKind::NotFound, "rejected due to isolation");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I forgot about one discussion we had: didn't we say this should be PermissionDenied? And some match for windows error conversions should be extended?

If you don't want to do this now, please leave FIXME in the code to change the error type.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also we should probably change set_last_error_from_io_error to take an error kind so that we don't have to make up a string here. But that, too, could be a separate PR if you prefer.

Sorry I forgot about these things in my last round of review.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so that we don't have to make up a string here.

I believe you could also use Error::from(ErrorKind::...) or ErrorKind::....into() to avoid the string:
https://doc.rust-lang.org/nightly/std/io/struct.Error.html#impl-From%3CErrorKind%3E

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But is there even any reason that set_last_error_from_io_error would take more than an ErrorKind?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I specifically used Error::new as it allows custom payload for error not originated from OS. Quoting its documentation -- This function is used to generically create I/O errors which do not originate from the OS itself.. Shouldn't we use a custom message indicating that the problem is due to isolation so that user doesn't confuse it with an actual OS error? Or the intention is to make the distinction not clear?

As for which code to generate, yes, I mentioned using PermissionDenied earlier. But then, if it's a bogus error, does it matter?

For Windows code, I didn't want to mix it with this PR which is mainly about adding this new setup. I'll create a new PR for it. Also, we still have quite a few shims using old "always abort" setup, which I was planning to convert in next PR(s). Otherwise this new flag isn't very useful 🙂

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But is there even any reason that set_last_error_from_io_error would take more than an ErrorKind?

No, I don't think so. I'm just pointing out an option if that function is not going to be changed as part of this PR.

Shouldn't we use a custom message indicating that the problem is due to isolation so that user doesn't confuse it with an actual OS error?

The set_last_error_from_io_error() function only uses e.kind(), so I think the message will be discarded anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I didn't see that. Thanks for pointing out. I guess I can change it to take ErrorKind. @RalfJung Do you prefer I do it in this PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you prefer I do it in this PR?

Yeah I think that would be better. Thanks. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed the parameter in a new commit, which I didn't squash as it's independent from the first commit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot. :)

I think we probably want to change to a different error code, but we can do that later.

atsmtat and others added 2 commits June 9, 2021 05:50
In user interface, added a new flag `-Zmiri-isolation-error` which
takes one of the four values -- hide, warn, warn-nobacktrace, and
abort. This option can be used to configure Miri to either abort or
return an error code upon executing isolated op. If not aborted, Miri
prints a warning, whose verbosity can be configured using this flag.

In implementation, added a new enum `IsolatedOp` to capture all the
settings related to ops requiring communication with the
host. Old `communicate` flag in both miri configs and machine
stats is replaced with a new helper function `communicate()` which
checks `isolated_op` internally.

Added a new helper function `reject_in_isolation` which can be called
by shims to reject ops according to the reject_with settings. Use miri
specific diagnostics function `report_msg` to print backtrace in the
warning. Update it to take an enum value instead of a bool, indicating
the level of diagnostics.

Updated shims related to current dir to use the new APIs. Added a new
test for current dir ops in isolation without halting machine.
`set_last_error_from_io_error` works with only the error kind, and
discards the payload. Fix its signature to make it explicit.
@RalfJung
Copy link
Member

RalfJung commented Jun 9, 2021

Thanks a lot @atsmtat for working with me through these many rounds of review. :-)
@bors r+

@bors
Copy link
Collaborator

bors commented Jun 9, 2021

📌 Commit ba64f48 has been approved by RalfJung

@bors
Copy link
Collaborator

bors commented Jun 9, 2021

⌛ Testing commit ba64f48 with merge 31c1afa...

@bors
Copy link
Collaborator

bors commented Jun 9, 2021

☀️ Test successful - checks-actions
Approved by: RalfJung
Pushing 31c1afa to master...

@bors bors merged commit 31c1afa into rust-lang:master Jun 9, 2021
@atsmtat
Copy link
Contributor Author

atsmtat commented Jun 9, 2021

Thanks a lot @atsmtat for working with me through these many rounds of review. :-)

No problem at all. Thanks for the reviews!

@bors r+

bors added a commit that referenced this pull request Jun 9, 2021
isolated operations return EPERM; tweak isolation hint

Follow-up to #1797
@atsmtat atsmtat deleted the env-isolation branch June 11, 2021 03:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-author Status: Waiting for the PR author to address review comments
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants