Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add basic tempfile API to stdlib #420

Open
NobodyXu opened this issue Jul 29, 2024 · 12 comments
Open

Add basic tempfile API to stdlib #420

NobodyXu opened this issue Jul 29, 2024 · 12 comments
Labels
api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api

Comments

@NobodyXu
Copy link

NobodyXu commented Jul 29, 2024

Proposal

Problem statement

Tempfile is a widely used functionality, with the previous breaking change of the tempfile crate being about 7 years ago, I believe its API is stable enough that it should be part of the stdlib.

Motivating examples or use cases

The crate tempfile has 168,118,646 downloads and is used very often in the ecosystem, I believe it is quite a fundamental functionality that the ecosystem depends on.

Having it in stdlib would make it

  • easier to access
  • more trusted than third-party dependencies, one less crate to audit on
  • faster to compile for low-level crates depending on it, which might block compilation of other crates

Solution sketch

mod fs {
    impl File {
        /// Create a new temporary file.
        /// The new file could be in memory or on disk, there's no guarantee about it.
        ///
        /// The temporary file will be automatically removed by the OS when the last handle to it is closed.
        /// This doesn’t rely on Rust destructors being run, so will (almost) never fail to clean up the temporary file.
        ///
        /// # Security
        /// NOTE that other processes might be able to get access to your tempfile (via ptrace, procfs, kernel bug, etc).
        /// It does not provide any security guarantee on that.
        pub fn tempfile() -> Result<File>;

        /// Create a new temporary file under the path `dir`, the file will be created on the filesystem
        /// `dir` is in.
        ///
        /// # Security
        /// NOTE that other processes might be able to get access to your tempfile (via ptrace, procfs, kernel bug, etc).
        /// It does not provide any security guarantee on that.
        pub fn tempfile_in<P: AsRef<Path>>(dir: P) -> Result<File>;

        /// Create a hardlink to `path`.
        /// If `self` is a tempfile, then it will be visible at `path`.
        ///
        /// NOTE that if `self` is a tempfile, then this operation can fail with `io::ErrorKind::Unsupported`.
        pub fn hardlink<P: AsRef<Path>>(&self, path: P) -> Result<()>;
    }

    impl OpenOptions {
        /// Open a tempfile under `dir`, the `File` returned will be writable even if `.write(true)` is not called.
        /// If `create_new` is set to `false`, then you can try using [`File::hardlink`] to persist the [`File`].
        ///
        /// NOTE that if `self` is a tempfile, then this operation can fail with `io::ErrorKind::Unsupported`.
        pub fn open_tempfile<P: AsRef<Path>>(&self, dir: P) -> Result<File>;
    }
}

mod path {
    pub struct PersistError<O: TempObject> {
        pub error: Error,
        pub temp_path: TempPath<O>,
    }
    impl<O> std::error::Error for PersistError<O> {}
    impl<O> From<PersistError<O>> for Error {}
    impl<O> From<PersistError<O>> for TempPath<O> {}

    pub trait TempObject {
        /// [`io::ErrorKind::AlreadyExists`] should be returned if the `path` already has an object.
        fn create(path: &Path) -> Result<Self>;
        fn keep(&self, path: &Path) -> Result<()>;
        fn persist(&self, new_path: &Path) -> Result<()>;
        fn remove(&mut self, path: &Path) -> Result<()>;
    }
    impl TempObject for File {}
    impl TempObject for DirEntry {}

    #[derive(Debug)]
    pub struct TempPath<O: TempObject> {
        path: PathBuf, 
        object: O,
    }
    impl<O: TempObject> Drop for TempPath<O>  {}
    impl<O: TempObject> TempPath<O>  {
        pub fn new() -> Result<Self>;
        pub fn new_in<P: AsRef<Path>>(p: P) -> Result<Self>;

        pub fn with_prefix<S: AsRef<OsStr>>(prefix: S) -> Result<Self>;
        pub fn with_prefix_in<S: AsRef<OsStr>, P: AsRef<Path>>(
            prefix: S,
            dir: P
        ) -> Result<Self>;

        pub fn path(&self) -> &Path;

        pub fn close(self) -> Result<()>;

        /// Persist temp path with name `new_path`, if `overwrite` is true then the existing object at `new_path`
        /// will be overwritten.
        pub fn persist<P: AsRef<Path>>(
            self,
           new_path: P,
           overwrite: bool,
        ) -> Result<O, PersistError<O>>;

        /// This function could fail since we need to mark the file as non-temporary on some platforms (e.g. windows)
        pub fn keep(self) -> Result<(PathBuf, O), PersistError<O>>;
    }
    impl<O: TempObject> Deref for TempPath<O>  {}
    impl<O: TempObject> DerefMut for TempPath<O>  {}

    pub struct Builder;

    impl<'a, 'b> Builder<'a, 'b> {
        pub fn new() -> Self;

        pub fn prefix<S: AsRef<OsStr> + ?Sized>(&mut self, prefix: &'a S) -> &mut Self;
        pub fn suffix<S: AsRef<OsStr> + ?Sized>(&mut self, suffix: &'b S) -> &mut Self;
        pub fn rand_bytes(&mut self, rand: usize) -> &mut Self;

        /// [`io::ErrorKind::AlreadyExists`] should be returned if the `path` already has an object.
        pub fn make<F, O>(&self, f: F) -> Result<TempPath<O>>
        where
            F: FnMut(&Path) -> Result<O>,
            O: TempObject;

        /// [`io::ErrorKind::AlreadyExists`] should be returned if the `path` already has an object.
        pub fn make_in<F, O, P>(&self, dir: P, f: F) -> Result<TempPath<O>>
        where
            F: FnMut(&Path) -> Result<O>,
            P: AsRef<Path>,
            O: TempObject;
    }
}

Once we have the basics done, we can add it more methods from tempfile, i.e. the named tempfile, the temp dir, etc.

Links and related work

tempfile, the existing popular implementation of tempfile.

@NobodyXu NobodyXu added api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api labels Jul 29, 2024
@kennytm
Copy link
Member

kennytm commented Jul 29, 2024

The crate tempfile has 168,118,646 downloads and is used very often in the ecosystem, I believe it is quite a fundamental functionality that the ecosystem depends on.

Yes the tempfile package is very popular, but I'm not sure if the two functions tempfile() and tempfile_in() are the right "fundamental functionality that the ecosystem depends on". In my experience I've only use tempfile::{Builder, TempDir} because most of the time we do need the Path to interact with external interfaces that expected a file. You may get a feel of how the tempfile APIs are used with https://github.com/search?q=language%3ARust+%2F%5Cbuse+tempfile%5Cb%2F&type=code.

Also the OS-managed tempfile()/SpooledTempFile and Rust-managed NamedTempFile/TempDir are implemented totally differently, so one can't say uplifting tempfile() into std help the case of NamedTempFile/TempDir either.

In any case if the team feels that tempfile() is really the fundamental one I wish it is available as std::fs::File::temporary() (or File::temp()) instead.

@NobodyXu
Copy link
Author

Yes the tempfile package is very popular, but I'm not sure if the two functions tempfile() and tempfile_in() are the right "fundamental functionality that the ecosystem depends on".

I'd want to do this incrementally, by first adding the most simple APIs.

I wish it is available as std::fs::File::temporary() (or File::temp()) instead.

Thanks, that is indeed better!

@NobodyXu NobodyXu changed the title Add tempfile API to stdlib Add basic tempfile API to stdlib Jul 30, 2024
@NobodyXu
Copy link
Author

cc @kennytm I just added the API you wanted for named tempfile/tempdir.

I think the core of it is TempPath, so I created a version of it that is generic and can serve file/dir or anything user implemented.

@programmerjake
Copy link
Member

        /// [`io::ErrorKind::AlreadyExists`] should be returned if the `path` already has an object.
        pub fn make<F, O>(&self, f: F) -> Result<TempPath<O>>
        where
            F: FnMut(&Path) -> Result<R>,
            O: TempObject;

where's R defined, in F: FnMut(&Path) -> Result<R>,?
same for make_in

@NobodyXu
Copy link
Author

Thanks @programmerjake updated, it should be O.

@lolbinarycat
Copy link

also note that os-managed tempfiles may not be available on every platform, while rust-managed tempfiles can work on any platform where files can be deleted.

@scottmcm
Copy link
Member

scottmcm commented Aug 8, 2024

Having it in stdlib would make it

  • easier to access

  • more trusted than third-party dependencies, one less crate to audit on

  • faster to compile for low-level crates depending on it, which might block compilation of other crates

Note that all of these are true of literally every piece of code ever, which makes them weak motivation for this change since it's not true that all code should be in the standard library.

What makes it important that this specifically be in std, vs being just a well-known community crate (like how regex is)?

@NobodyXu
Copy link
Author

NobodyXu commented Aug 10, 2024

What makes it important that this specifically be in std, vs being just a well-known community crate (like how regex is)?

@scottmcm tl;dr tempfile API is a missing pieces in stdlib

  • it has a stable interface, the API for tempfile is unlikely to change, even if new syscalls are added to the OSes or some convention has been changed, whereas regex API could still change due to its sheer complexity and requirements to be compatible with other regex implementation, i.e. new regex syntax favor
  • it currently has one popular implementation, other implementation, such as the implementation in C, is unlikely to diverge too much, the basics is the same, just how random filename is generated is different, where as regex has two different implementations based on compile-speed requirements (regex-lite vs regex)

In additional to that:

  • The APIs of tempfile can fit in the existing std::fs nicely, one could say that this is actually an API missing in stdlib, i.e. we provide file/fs abstraction and a lot of convenient methods (io::copy, create_dir_all, remove_dir_all, fs path parsing, std::env::tempdir), but stdlib is missing an API for tempfile/tempdir, @scottmcm I'd argue that it's actually a missing API in stdlib
  • Having such API in stdlib would make it easier to write new binary from scratch with smoother experience, currently tempfile not in stdlib, you'd have to add them to dependencies for a very basic, very fundamental functionality that you'd assume it's in stdlib

@NobodyXu
Copy link
Author

NobodyXu commented Aug 10, 2024

@scottmcm Let's think about this the other way around:

Why tempfile should not be part of stdlib?

If we are developing a new PL, if there any reason tempfile should not be in stdlib?

It is a quite fundamental functionality, abstracts over platform details, and has a stable API, and there's no different way to implement it, except for some tunables.

@lolbinarycat
Copy link

if tempfile should be in stdlib, doesn't require a portable interface to the system RNG?

@NobodyXu
Copy link
Author

if tempfile should be in stdlib, doesn't require a portable interface to the system RNG?

stdlib already has it internally, and there's proposal to expose that.

@joshtriplett
Copy link
Member

We did a first-pass discussion about this in today's @rust-lang/libs-api meeting. We didn't come to a consensus yet, but some initial reactions were that some of the methods (e.g. open_tempfile) seemed non-portable, and that the full surface area seemed a little larger than necessary for a first pass (though we didn't come to a consensus yet on which bits).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api
Projects
None yet
Development

No branches or pull requests

6 participants