fix: avoid stopping machine upon running env operations in isolation #1797

atsmtat · 2021-05-14T07:25:08Z

The get and set current dir operations used to halt the machine by
throwing an exception in isolation mode. This change updates them to
return a dummy NotFound error instead, and keeps the machine running.

I started with a custom error using ErrorKind::Other, but since it
can't be mapped to a raw OS error, I dropped it. NotFound kind of makes
sense for get operations, but not so much for set operations. However, it's
the only error kind currently supported for Windows.

RalfJung · 2021-05-14T07:32:33Z

Thanks for the PR!

I recall suggesting something similar in the past, but the concern back then was that we should do this consistently for all operations. So this certainly does not fix #1034, as there are more operations that would need adjustment (getenv/setenv, all the file system operations -- basically everything that checks the communicate flag). We don't necessarily have to do all of them at once, but we should not close #1034 before they are all done.

The other concern was discoverability: when such an operation fails, it can be hard for users to figure out why, since they might not expect the "isolation" Miri provides. I am not sure what is the best solution to this. Maybe keep the errors by default but offer a flag to let evaluation continue instead? I am not sure what your usecase is here so I cannot tell if that would help you or not.

Cc @oli-obk

RalfJung · 2021-05-14T07:33:54Z

Cc @Aaron1011

oli-obk · 2021-05-14T08:55:24Z

Maybe keep the errors by default but offer a flag to let evaluation continue instead? I am not sure what your usecase is here so I cannot tell if that would help you or not.

We could turn the errors into warnings and possibly add a flag to turn them off entirely (which the warning could then mention)

atsmtat · 2021-05-14T20:18:37Z

What to do with the rest of the operations was gonna be my next question based on the feedback 🙂 I'll drop the "fixes" line so that the issue is not closed.
This PR is not meant to fix any use case I have, so I'm not really blocked or anything. I'm having fun learning Rust and, out of interest in compilers, I'm trying to learn about rustc and the tools around it. I was digging into the Miri code and found this easy issue to start with.

As for the question of discoverability, I'm trying to understand the problem:

* Miri should never execute system calls in isolation mode, right?
* If Miri were to continue executing the user program, it would have to either return an error or mimic a successful call?
* If Miri were to return a fake but sensible error for a given op (PermissionDenied for getcwd, for example), with an explicit message about "isolation" mode, isn't that OK? The user program might actually run into such an error while running in the field.
* I assume that faking a success by returning a dummy for the getter and a no-op for the setter is out of the question, as it's just too evil?
* With a fake error, it's possible that user code doesn't take a path which the user wants to exercise, and this might go unnoticed by the user. So we use warnings to point this out?
* I guess we don't expect users to refactor the code around these ops (handle an error and continue with a dummy value, for example) in order to run it in isolation?

RalfJung · 2021-05-15T13:00:21Z

miri should never execute system calls in isolation mode, right?

Yes. (Even more strongly, Miri in isolation mode should be completely pure, deterministic, repeatable.)

i assume that faking a success by returning a dummy for getter and no-op for setter is out of question, as it's just too evil?

Yes.

if miri were to return a fake but sensible error for a given op (PermissionDenied for getcwd for example), with explicit message about "isolation" mode, isn't that OK? As user program might actually run into such error while running in the field.

Indeed I think this is a reasonable approach -- but I don't think we can return an "explicit message" to the caller, all we have is an error code. I assume a lot of code out there will unwrap getcwd immediately, or ? it and bubble up the error in a nice way, but either way the program aborts quickly. It will look like the program is just broken under Miri, and the user has no way to know that they just have to pass the flag to make things work.

So I like @oli-obk's proposal of having Miri itself print a warning (we already have infrastructure for that, see NonHaltingDiagnostic in diag.rs and, alternatively, this warning upon spawning a thread). This is not something a "real" failing getcwd would ever do, but it seems okay.

So:

* Default behavior (with isolation on): print a warning (only on the first such call, ideally -- sess.warn does that automatically I think), then continue interpretation with a suitable error code.
* Isolation off: behavior unchanged compared to now.

At some point we might add a flag to suppress the warning, but that does not have to happen immediately.
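
To make the proposed default concrete, here is a minimal sketch of a `getcwd`-like shim that warns once and then keeps returning an error while isolation is on. The `Machine` struct, its fields, and the choice of `PermissionDenied` are assumptions made for illustration only; they are not Miri's actual types or the error kind the PR settled on.

```rust
use std::io::{Error, ErrorKind};
use std::path::PathBuf;

/// Hypothetical stand-in for the interpreter state relevant to this discussion.
pub struct Machine {
    /// True when isolation is disabled and host access is allowed.
    pub communicate: bool,
    /// Set once the isolation warning has been recorded.
    pub warned: bool,
    /// Collected warnings (a real tool would print them as diagnostics).
    pub warnings: Vec<String>,
}

impl Machine {
    pub fn new(communicate: bool) -> Self {
        Machine { communicate, warned: false, warnings: Vec::new() }
    }

    /// Model of a `getcwd` shim following the proposed behavior.
    pub fn getcwd(&mut self) -> Result<PathBuf, Error> {
        if self.communicate {
            // Isolation off: behavior unchanged, forward to the host.
            return std::env::current_dir();
        }
        // Isolation on: warn only on the first such call ...
        if !self.warned {
            self.warned = true;
            self.warnings
                .push("`getcwd` was made to return an error due to isolation".to_string());
        }
        // ... then continue with an error instead of halting.
        // PermissionDenied is an assumed choice for this sketch.
        Err(Error::from(ErrorKind::PermissionDenied))
    }
}
```

Calling `getcwd` in a loop on a `Machine::new(false)` yields an error every time but records the warning only once, which is the property the default behavior is after.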

RalfJung · 2021-05-15T13:03:13Z

But that's the only error supported for windows currently.

Supported where, in the Miri mapping to raw error codes? We can easily extend that. Which error code might a real Windows emit if the permission to change the current dir was denied?

bors · 2021-05-16T23:48:01Z

☔ The latest upstream changes (presumably #1801) made this pull request unmergeable. Please resolve the merge conflicts.

RalfJung · 2021-05-17T17:08:08Z

@atsmtat I see you did a rebase; I am still setting the "waiting-on-author" flag based on the discussion above. If anything is unclear or if you think the ball is in our court and you need more feedback/review/help, please let me know. :)

atsmtat · 2021-05-18T01:04:02Z

miri should never execute system calls in isolation mode, right?

Yes. (Even more strongly, Miri in isolation mode should be completely pure, deterministic, repeatable.)

i assume that faking a success by returning a dummy for getter and no-op for setter is out of question, as it's just too evil?

Yes.

if miri were to return a fake but sensible error for a given op (PermissionDenied for getcwd for example), with explicit message about "isolation" mode, isn't that OK? As user program might actually run into such error while running in the field.

Indeed I think this is a reasonable approach -- but I don't think we can return an "explicit message" to the caller, all we have is an error code. I assume a lot of code out there will unwrap getcwd immediately, or ? it and bubble up the error in a nice way, but either way the program aborts quickly. It will look like the program is just broken under Miri, and the user has no way to know that they just have to pass the flag to make things work.

So I like @oli-obk's proposal of having Miri itself print a warning (we already have infrastructure for that, see NonHaltingDiagnostic in diag.rs and, alternatively, this warning upon spawning a thread). This is not something a "real" failing getcwd would ever do, but it seems okay.

So:
* Default behavior (with isolation on): print a warning (only on the first such call, ideally -- `sess.warn` does that automatically I think), then continue interpretation with a suitable error code.

Yes, I tried out sess.warn, and it prints the warning only on the first call (does "session" mean a single run of a user program?). One downside of sess.warn is that it doesn't give the user much idea of which code/line caused the call. I'm trying to see if I can print more info in the warning.

* Isolation off: behavior unchanged compared to now.
At some point we might add a flag to suppress the warning, but that does not have to happen immediately.

atsmtat · 2021-05-18T01:05:45Z

But that's the only error supported for windows currently.

Supported where, in the Miri mapping to raw error codes? We can easily extend that. Which error code might a real Windows emit if the permission to change the current dir was denied?

Right, Miri mapping to raw error codes. I'll add appropriate Windows error codes.

atsmtat · 2021-05-18T02:06:23Z

What do you think about a warning with a span like this:

warning: `chdir` called in isolation mode returns a dummy error
   --> /Users/atsmtat/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/unix/os.rs:158:17
    |
158 |     if unsafe { libc::chdir(p.as_ptr()) } != 0 {
    |                 ^^^^^^^^^^^^^^^^^^^^^^^

I got the span from the top frame, which is still not very helpful. It would be better to print the user frame that calls into std. Is there a way to get such a frame, or does that not make sense? If you can point me to an example where something like this is done, that would be great.

bjorn3 · 2021-05-18T07:40:56Z

Yes, I tried out sess.warn, and it prints warning only on the first call

Diagnostics are deduplicated, so if the warning message is identical, it will only be printed once.
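
As an illustration of deduplication by message text, the sketch below remembers every message it has emitted and drops repeats. `WarningSink` is a hypothetical type written for this example; it only models the idea and is not how rustc's diagnostic machinery is implemented.

```rust
use std::collections::HashSet;

/// Hypothetical warning sink that emits each distinct message once.
#[derive(Default)]
pub struct WarningSink {
    /// Messages that have already been emitted.
    seen: HashSet<String>,
    /// Messages actually emitted, in order.
    pub emitted: Vec<String>,
}

impl WarningSink {
    /// Emits `msg` unless an identical message was emitted before.
    /// Returns true if the message was newly emitted.
    pub fn warn(&mut self, msg: &str) -> bool {
        // `HashSet::insert` returns false when the value was already present.
        if self.seen.insert(msg.to_owned()) {
            eprintln!("warning: {}", msg);
            self.emitted.push(msg.to_owned());
            true
        } else {
            false
        }
    }
}
```

With this model, a second identical `getcwd` warning is dropped, while a warning for a different operation such as `chdir` still gets through because its text differs.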

RalfJung · 2021-05-18T09:37:46Z

Yes, I tried out sess.warn, and it prints warning only on the first call

Yes that was deliberate -- we don't want tons of warnings when a program does such a call in a loop.
However, it looks like you also want to show a span. In that case you'll want to register a NonHaltingDiagnostic with Miri's diagnostic infrastructure (see diagnostics.rs); however, then you'll have to add your own logic to avoid showing the error many times.

atsmtat · 2021-05-19T06:07:13Z

Yes, I tried out sess.warn, and it prints warning only on the first call

Yes that was deliberate -- we don't want tons of warnings when a program does such a call in a loop.
However, it looks like you also want to show a span. In that case you'll want to register a NonHaltingDiagnostic with Miri's diagnostic infrastructure (see diagnostic.rs); however, then you'll have to add your own logic to avoid showing the error many times.

Looks like NonHaltingDiagnostic uses the same diagnostic infra, which dedups the messages, so it shows a single message per source line. But because printing the backtrace by default is too verbose, I pushed a change that emits a warning without a span or backtrace. This in fact prints a warning only once per op. I'm considering adding two command-line options on top of this: (1) track dummy errors in isolation, which prints backtraces using NonHaltingDiagnostic; (2) ignore dummy errors in isolation, which doesn't print a warning.

I'll push these changes soon. Let me know what you think :)

atsmtat · 2021-05-20T15:23:53Z

@RalfJung I updated the PR with options for user visibility.
For input:

    assert_eq!(env::current_dir().unwrap_err().kind(), ErrorKind::NotFound);
    for _i in 0..3 {
        assert_eq!(env::current_dir().unwrap_err().kind(), ErrorKind::NotFound);
    }

Output of default run (with isolation, without any flags):

warning: `getcwd` in isolation mode produced a dummy error
  |
  = note: run with -Zmiri-track-dummy-op to track with a backtrace
  = note: run with -Zmiri-ignore-dummy-op to ignore warning

Output of run with -Zmiri-track-dummy-op:

note: tracking was triggered
   --> /Users/atsmtat/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/unix/os.rs:134:17
    |
134 |             if !libc::getcwd(ptr, buf.capacity()).is_null() {
    |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ produced dummy error for `getcwd`
    |
    = note: inside `std::sys::unix::os::getcwd` at /Users/atsmtat/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/unix/os.rs:134:17
    = note: inside `std::env::current_dir` at /Users/atsmtat/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/env.rs:47:5
note: inside `main` at tests/run-pass/current_dir_with_isolation.rs:7:16
   --> tests/run-pass/current_dir_with_isolation.rs:7:16
    |
7   |     assert_eq!(env::current_dir().unwrap_err().kind(), ErrorKind::NotFound);
    |                ^^^^^^^^^^^^^^^^^^
(frames omitted for brevity)

note: tracking was triggered
   --> /Users/atsmtat/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/unix/os.rs:134:17
    |
134 |             if !libc::getcwd(ptr, buf.capacity()).is_null() {
    |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ produced dummy error for `getcwd`
    |
    = note: inside `std::sys::unix::os::getcwd` at /Users/atsmtat/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/unix/os.rs:134:17
    = note: inside `std::env::current_dir` at /Users/atsmtat/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/env.rs:47:5
note: inside `main` at tests/run-pass/current_dir_with_isolation.rs:9:20
   --> tests/run-pass/current_dir_with_isolation.rs:9:20
    |
9   |         assert_eq!(env::current_dir().unwrap_err().kind(), ErrorKind::NotFound);
    |                    ^^^^^^^^^^^^^^^^^^

Running with -Zmiri-ignore-dummy-op doesn't produce any messages related to dummy ops.

atsmtat · 2021-05-20T15:26:10Z

I wasn't sure about running rustfmt, so I haven't run it on my changes.
I'll update README.md once you give a green light to the new flags :)

RalfJung · 2021-05-22T08:00:25Z

src/diagnostics.rs

@@ -279,6 +280,7 @@ pub trait EvalContextExt<'mir, 'tcx: 'mir>: crate::MiriEvalContextExt<'mir, 'tcx
                    CreatedCallId(id) => format!("function call with id {}", id),
                    CreatedAlloc(AllocId(id)) => format!("created allocation with id {}", id),
                    FreedAlloc(AllocId(id)) => format!("freed allocation with id {}", id),
+                    DummyOpInIsolation(op) => format!("produced dummy error for `{}`", op),


I don't know what you mean by "dummy" error. I suggest to say something more like "{} was made to return an error due to isolation" or so (and print a hint mentioning -Zmiri-disable-isolation).

RalfJung · 2021-05-22T08:01:58Z

I wasn't sure about running rustfmt, so I haven't run it on my changes.

After rebasing onto latest master, running rustfmt should be fine. We just recently made the repository rustfmt-compatible.

Output of default run (with isolation, without any flags):

In terms of deduplication, this looks great. :) I think we should adjust the messages a little. I don't think "note: tracking was triggered" makes sense, I'd more like to see something like "warning: operation rejected by isolation" or so.

atsmtat · 2021-05-23T21:05:32Z

Updated output:
Default:

warning: `getcwd` was made to return an error due to isolation
  |
  = note: run with -Zmiri-isolated-op=warn to get warning with a backtrace
  = note: run with -Zmiri-isolated-op=hide to hide warning
  = note: run with -Zmiri-isolated-op=allow to disable isolation

With warn:

warning: operation rejected by isolation
   --> /Users/atsmtat/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/unix/os.rs:134:17
    |
134 |             if !libc::getcwd(ptr, buf.capacity()).is_null() {
    |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ `getcwd` was made to return an error due to isolation
    |
    = note: inside `std::sys::unix::os::getcwd` at /Users/atsmtat/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/unix/os.rs:134:17
    = note: inside `std::env::current_dir` at /Users/atsmtat/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/env.rs:47:5
note: inside `main` at tests/run-pass/current_dir_with_isolation.rs:7:16
   --> tests/run-pass/current_dir_with_isolation.rs:7:16
    |
7   |     assert_eq!(env::current_dir().unwrap_err().kind(), ErrorKind::NotFound);
    |                ^^^^^^^^^^^^^^^^^^
...
warning: operation rejected by isolation
   --> /Users/atsmtat/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/unix/os.rs:134:17
    |
134 |             if !libc::getcwd(ptr, buf.capacity()).is_null() {
    |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ `getcwd` was made to return an error due to isolation
    |
    = note: inside `std::sys::unix::os::getcwd` at /Users/atsmtat/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/unix/os.rs:134:17
    = note: inside `std::env::current_dir` at /Users/atsmtat/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/env.rs:47:5
note: inside `main` at tests/run-pass/current_dir_with_isolation.rs:9:20
   --> tests/run-pass/current_dir_with_isolation.rs:9:20
    |
9   |         assert_eq!(env::current_dir().unwrap_err().kind(), ErrorKind::NotFound);
    |                    ^^^^^^^^^^^^^^^^^^

atsmtat · 2021-05-23T21:08:32Z

I wasn't sure about running rustfmt, so I haven't run it on my changes.

After rebasing onto latest master, running rustfmt should be fine. We just recently made the repository rustfmt-compatible.

How do I run rustfmt on miri? I couldn't add a component using rustup as it complains about a custom toolchain, miri.

RalfJung · 2021-05-23T21:43:59Z

How do I run rustfmt on miri? I couldn't add a component using rustup as it complains about a custom toolchain, miri.

cargo +nightly fmt should work, if you also have a nightly toolchain installed.

bors · 2021-06-03T16:16:37Z

☔ The latest upstream changes (presumably #1776) made this pull request unmergeable. Please resolve the merge conflicts.

atsmtat · 2021-06-05T07:08:26Z

@RalfJung I've addressed your latest comments in recent commits. I believe you'd prefer I squash my branch commits. If so, let me know.

RalfJung · 2021-06-05T12:01:10Z

LGTM, assuming CI passes and pending the rename of "ignore". :)

RalfJung · 2021-06-05T12:02:13Z

Ah, and yes please squash the commits.

atsmtat · 2021-06-07T16:52:38Z

I renamed "ignore" and squashed the commits. Waiting for your approval to run CI.

RalfJung · 2021-06-08T17:52:33Z

src/shims/env.rs

-        this.check_no_isolation("`getcwd`")?;
+        if let IsolatedOp::Reject(reject_with) = this.machine.isolated_op {
+            this.reject_in_isolation("getcwd", reject_with)?;
+            let err = Error::new(ErrorKind::NotFound, "rejected due to isolation");


Ah, I forgot about one discussion we had: didn't we say this should be PermissionDenied? And some match for windows error conversions should be extended?

If you don't want to do this now, please leave FIXME in the code to change the error type.

Also we should probably change set_last_error_from_io_error to take an error kind so that we don't have to make up a string here. But that, too, could be a separate PR if you prefer.

Sorry I forgot about these things in my last round of review.

so that we don't have to make up a string here.

I believe you could also use Error::from(ErrorKind::...) or ErrorKind::....into() to avoid the string:
https://doc.rust-lang.org/nightly/std/io/struct.Error.html#impl-From%3CErrorKind%3E

But is there even any reason that set_last_error_from_io_error would take more than an ErrorKind?
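
For reference, the two ways of constructing the error differ only in whether a payload is attached. The sketch below uses standard-library calls only; the helper names `with_message`, `without_message`, and `kind_only` are made up for this example.

```rust
use std::io::{Error, ErrorKind};

/// Builds the error with a custom message payload, as in the diff above.
pub fn with_message() -> Error {
    Error::new(ErrorKind::NotFound, "rejected due to isolation")
}

/// Builds the error from the kind alone, with no payload.
pub fn without_message() -> Error {
    Error::from(ErrorKind::NotFound)
}

/// Hypothetical consumer that inspects nothing but the kind.
pub fn kind_only(e: &Error) -> ErrorKind {
    e.kind()
}
```

A consumer like `kind_only` cannot tell the two apart: both report `ErrorKind::NotFound`, and only `get_ref()` reveals that the first one carries a message.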

I specifically used Error::new as it allows a custom payload for errors not originating from the OS. Quoting its documentation: "This function is used to generically create I/O errors which do not originate from the OS itself." Shouldn't we use a custom message indicating that the problem is due to isolation, so that the user doesn't confuse it with an actual OS error? Or is the intention to keep the distinction unclear?

As for which code to generate, yes, I mentioned using PermissionDenied earlier. But then, if it's a bogus error, does it matter?

For the Windows code, I didn't want to mix it into this PR, which is mainly about adding this new setup. I'll create a new PR for it. Also, we still have quite a few shims using the old "always abort" setup, which I was planning to convert in follow-up PR(s). Otherwise this new flag isn't very useful 🙂

But is there even any reason that set_last_error_from_io_error would take more than an ErrorKind?

No, I don't think so. I'm just pointing out an option if that function is not going to be changed as part of this PR.

Shouldn't we use a custom message indicating that the problem is due to isolation so that user doesn't confuse it with an actual OS error?

The set_last_error_from_io_error() function only uses e.kind(), so I think the message will be discarded anyway.

Ah, I didn't see that. Thanks for pointing that out. I guess I can change it to take ErrorKind. @RalfJung Do you prefer I do it in this PR?

Do you prefer I do it in this PR?

Yeah I think that would be better. Thanks. :)

Fixed the parameter in a new commit, which I didn't squash as it's independent from the first commit.

Thanks a lot. :)

I think we probably want to change to a different error code, but we can do that later.

In the user interface, added a new flag `-Zmiri-isolation-error` which takes one of four values: hide, warn, warn-nobacktrace, and abort. This option can be used to configure Miri to either abort or return an error code upon executing an isolated op. If not aborted, Miri prints a warning, whose verbosity can be configured using this flag.

In the implementation, added a new enum `IsolatedOp` to capture all the settings related to ops requiring communication with the host. The old `communicate` flag in both miri configs and machine stats is replaced with a new helper function `communicate()` which checks `isolated_op` internally. Added a new helper function `reject_in_isolation` which can be called by shims to reject ops according to the reject_with settings. Use the miri-specific diagnostics function `report_msg` to print a backtrace in the warning, and update it to take an enum value instead of a bool, indicating the level of diagnostics.

Updated the shims related to the current dir to use the new APIs. Added a new test for current dir ops in isolation without halting the machine.
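
The flag values described in this commit message could be modeled roughly as follows. This is a sketch under assumptions: the `IsolatedOp` name and its `Reject` variant appear in the diff above, but the `Allow` variant, the `RejectOpWith` enum and its variant names, and the `parse_isolation_error` function are hypothetical and written only to illustrate the mapping.

```rust
/// How a rejected operation is reported (variant names are assumptions).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum RejectOpWith {
    Abort,
    NoWarning,
    Warning,
    WarningWithoutBacktrace,
}

/// Whether host-dependent operations are rejected or allowed.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum IsolatedOp {
    Reject(RejectOpWith),
    Allow,
}

/// Parses a `-Zmiri-isolation-error=<value>` argument (hypothetical helper).
pub fn parse_isolation_error(arg: &str) -> Result<IsolatedOp, String> {
    let value = arg
        .strip_prefix("-Zmiri-isolation-error=")
        .ok_or_else(|| format!("not an isolation-error flag: {}", arg))?;
    let reject_with = match value {
        "abort" => RejectOpWith::Abort,
        "hide" => RejectOpWith::NoWarning,
        "warn" => RejectOpWith::Warning,
        "warn-nobacktrace" => RejectOpWith::WarningWithoutBacktrace,
        other => return Err(format!("unknown isolation-error value: {}", other)),
    };
    Ok(IsolatedOp::Reject(reject_with))
}

/// Model of the `communicate()` helper: host access only when not isolated.
pub fn communicate(op: IsolatedOp) -> bool {
    matches!(op, IsolatedOp::Allow)
}
```

Folding the old boolean into one enum means a shim can ask a single question, "is this op rejected, and if so how?", instead of consulting a flag plus separate verbosity settings.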

`set_last_error_from_io_error` works with only the error kind, and discards the payload. Fix its signature to make it explicit.
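
A signature that takes only the kind makes it obvious that no payload survives the conversion to a raw error code. The sketch below is hypothetical: `errno_from_kind` and the constants are illustrative (the numeric values are the usual Linux ones), and it does not reproduce Miri's real mapping table.

```rust
use std::io::ErrorKind;

// Usual Linux errno values, hard-coded here purely for illustration.
pub const ENOENT: i32 = 2;
pub const EACCES: i32 = 13;

/// Maps an error kind to a raw error code, or `None` if the kind is unmapped.
/// Taking `ErrorKind` rather than a full `io::Error` documents that any
/// message attached to the error plays no role.
pub fn errno_from_kind(kind: ErrorKind) -> Option<i32> {
    match kind {
        ErrorKind::NotFound => Some(ENOENT),
        ErrorKind::PermissionDenied => Some(EACCES),
        // Every other kind is treated as unmapped in this sketch.
        _ => None,
    }
}
```

Returning `Option` leaves it to the caller to decide what to do with an unmapped kind, for example by reporting the operation as unsupported.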

RalfJung · 2021-06-09T15:30:46Z

Thanks a lot @atsmtat for working with me through these many rounds of review. :-)
@bors r+

bors · 2021-06-09T15:30:47Z

📌 Commit ba64f48 has been approved by RalfJung

bors · 2021-06-09T15:30:55Z

⌛ Testing commit ba64f48 with merge 31c1afa...

bors · 2021-06-09T15:48:25Z

☀️ Test successful - checks-actions
Approved by: RalfJung
Pushing 31c1afa to master...

atsmtat · 2021-06-09T16:07:43Z

Thanks a lot @atsmtat for working with me through these many rounds of review. :-)

No problem at all. Thanks for the reviews!

@bors r+

isolated operations return EPERM; tweak isolation hint Follow-up to #1797

atsmtat force-pushed the env-isolation branch from 827c42e to a583be7 Compare May 17, 2021 15:02

RalfJung added the S-waiting-on-author Status: Waiting for the PR author to address review comments label May 17, 2021

atsmtat force-pushed the env-isolation branch from a583be7 to a7a6627 Compare May 18, 2021 00:42

RalfJung reviewed May 22, 2021

atsmtat force-pushed the env-isolation branch from 9c7a637 to 6bd7556 Compare May 23, 2021 17:15

RalfJung reviewed May 24, 2021

RalfJung reviewed May 25, 2021

atsmtat force-pushed the env-isolation branch from 18e628e to fed2cc3 Compare May 26, 2021 16:07