Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite docs for std::ptr (Take #2) #51016

Closed
wants to merge 13 commits into from
180 changes: 146 additions & 34 deletions src/libcore/intrinsics.rs
Original file line number Diff line number Diff line change
Expand Up @@ -962,59 +962,129 @@ extern "rust-intrinsic" {
/// value is not necessarily valid to be used to actually access memory.
pub fn arith_offset<T>(dst: *const T, offset: isize) -> *const T;

/// Copies `count * size_of<T>` bytes from `src` to `dst`. The source
/// and destination may *not* overlap.
/// Copies `count * size_of::<T>()` bytes from `src` to `dst`. The source
/// and destination must *not* overlap.
///
/// `copy_nonoverlapping` is semantically equivalent to C's `memcpy`.
/// For regions of memory which might overlap, use [`copy`] instead.
///
/// `copy_nonoverlapping` is semantically equivalent to C's [`memcpy`].
///
/// [`copy`]: ./fn.copy.html
/// [`memcpy`]: https://www.gnu.org/software/libc/manual/html_node/Copying-Strings-and-Arrays.html#index-memcpy
///
/// # Safety
///
/// Beyond requiring that the program must be allowed to access both regions
/// of memory, it is Undefined Behavior for source and destination to
/// overlap. Care must also be taken with the ownership of `src` and
/// `dst`. This method semantically moves the values of `src` into `dst`.
/// However it does not drop the contents of `dst`, or prevent the contents
/// of `src` from being dropped or used.
/// Behavior is undefined if any of the following conditions are violated:
///
/// * Both `src` and `dst` must be [valid].
///
/// * Both `src` and `dst` must be properly aligned.
///
/// * `src.offset(count-1)` must be [valid]. In other words, the region of
/// memory which begins at `src` and has a length of `count *
/// size_of::<T>()` bytes must belong to a single, live allocation.
Copy link
Member

@RalfJung RalfJung Jun 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The part after "in other words" says way more than the the first part! Also, Why is "src must be valid" and "src.offset(count-1)" split into two conditions? I see one condition here, which is "src.offset(i) must be valid for reading size_of::<T>() bytes for 0 <= i < count", or, equivalently, "src must be valid for reading size_of::<T>() * count bytes".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I now see in your comments in the thread that you argue it would otherwise violate aliasing rules. That seems plausible but unnecessarily contorted for a definition.

Given that validity has to be defined for a size anyway, I'd just change this to require validity of the entire affected memory range at once (i.e. length size_of::<T>() * count).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I got a bit to clever here :) I was trying to rely exclusively on the definition of validity to define invariants over arrays as well as single accesses. The notion of a "single, live allocation" doesn't appear anywhere in the reference to my knowledge, which is why it went in "in other words".

As I note in my later comment, I think that making pointer validity a function of access size overcomplicates things a bit, and that the type of the pointer should fully specify the number of bytes being accessed when reading or writing.

From this point of view, your src.offset(i) for all i in 0..count works formulation nicely, although it still references offset, which is maybe a bit opaque.

Copy link
Member

@RalfJung RalfJung Jun 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I note in my later comment, I think that making pointer validity a function of access size overcomplicates things a bit, and that the type of the pointer should fully specify the number of bytes being accessed when reading or writing.

In my eyes, making it depend on the whole type makes things more complicated. With your proposal, validity is a function of the pointer and its type, and you should say so clearly ("A pointer pointing to type T is valid ..."). A type contains much more information than just a size! With my proposal, it is crystal clear that the size of the only part that actually matters ("A pointer is valid for an access of size n ..."). Notice that I use "pointer" here as a run-time concept; pointers themselves (as in, just the raw address in memory -- or whatever abstract representation of pointers you want to think of) are intrinsically untyped. So either way you need to add some extra information to define validity; the question is only if you restrict that to what is actually necessary (the size) or if you add a whole lot of other irrelevant information as well (the type).

Plus, for functions where the type alone does not suffice, we don't have to do awkward things like use .offset in our definition and rely on that function to be undefined when crossing allocation boundaries.

Copy link
Contributor Author

@ecstatic-morse ecstatic-morse Jun 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean I would need to qualify that, for example, ptr::read requires that its argument be valid for a read of at least size_of::<T>() bytes? This is what I meant by overcomplicates, not that the concept itself wasn't necessary, but that stating it everywhere would be redundant.

Simply declaring that when the size of an access for *const T is not explicitly stated, we mean valid for size_of::<T>() bytes would work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this!

///
/// * `dst.offset(count-1)` must be [valid]. In other words, the region of
/// memory which begins at `dst` and has a length of `count *
/// size_of::<T>()` bytes must belong to a single, live allocation.
///
/// * The two regions of memory must *not* overlap.
///
/// Like [`read`], `copy` creates a bitwise copy of `T`, regardless of
/// whether `T` is [`Copy`]. If `T` is not [`Copy`], using both the values
/// in the region beginning at `*src` and the region beginning at `*dst` can
/// [violate memory safety][read-ownership].
///
/// [`Copy`]: ../marker/trait.Copy.html
/// [`read`]: ../ptr/fn.read.html
/// [read-ownership]: ../ptr/fn.read.html#ownership-of-the-returned-value
/// [valid]: ../ptr/index.html#safety
///
/// # Examples
///
/// A safe swap function:
/// Manually implement [`Vec::append`]:
///
/// ```
/// use std::mem;
/// use std::ptr;
///
/// # #[allow(dead_code)]
/// fn swap<T>(x: &mut T, y: &mut T) {
/// /// Moves all the elements of `src` into `dst`, leaving `src` empty.
/// fn append<T>(dst: &mut Vec<T>, src: &mut Vec<T>) {
/// let src_len = src.len();
/// let dst_len = dst.len();
///
/// // Ensure that `dst` has enough capacity to hold all of `src`.
/// dst.reserve(src_len);
///
/// unsafe {
/// // Give ourselves some scratch space to work with
/// let mut t: T = mem::uninitialized();
/// // The call to offset is always safe because `Vec` will never
/// // allocate more than `isize::MAX` bytes.
/// let dst = dst.as_mut_ptr().offset(dst_len as isize);
/// let src = src.as_ptr();
///
/// // The two regions cannot overlap becuase mutable references do
/// // not alias, and two different vectors cannot own the same
/// // memory.
/// ptr::copy_nonoverlapping(src, dst, src_len);
/// }
///
/// // Perform the swap, `&mut` pointers never alias
/// ptr::copy_nonoverlapping(x, &mut t, 1);
/// ptr::copy_nonoverlapping(y, x, 1);
/// ptr::copy_nonoverlapping(&t, y, 1);
/// unsafe {
/// // Truncate `src` without dropping its contents.
/// src.set_len(0);
///
/// // y and t now point to the same thing, but we need to completely forget `t`
/// // because it's no longer relevant.
/// mem::forget(t);
/// // Notify `dst` that it now holds the contents of `src`.
/// dst.set_len(dst_len + src_len);
/// }
/// }
///
/// let mut a = vec!['r'];
/// let mut b = vec!['u', 's', 't'];
///
/// append(&mut a, &mut b);
///
/// assert_eq!(a, &['r', 'u', 's', 't']);
/// assert!(b.is_empty());
/// ```
///
/// [`Vec::append`]: ../../std/vec/struct.Vec.html#method.append
#[stable(feature = "rust1", since = "1.0.0")]
pub fn copy_nonoverlapping<T>(src: *const T, dst: *mut T, count: usize);

/// Copies `count * size_of<T>` bytes from `src` to `dst`. The source
/// Copies `count * size_of::<T>()` bytes from `src` to `dst`. The source
/// and destination may overlap.
///
/// `copy` is semantically equivalent to C's `memmove`.
/// If the source and destination will *never* overlap,
/// [`copy_nonoverlapping`] can be used instead.
///
/// `copy` is semantically equivalent to C's [`memmove`].
///
/// [`copy_nonoverlapping`]: ./fn.copy_nonoverlapping.html
/// [`memmove`]: https://www.gnu.org/software/libc/manual/html_node/Copying-Strings-and-Arrays.html#index-memmove
///
/// # Safety
///
/// Care must be taken with the ownership of `src` and `dst`.
/// This method semantically moves the values of `src` into `dst`.
/// However it does not drop the contents of `dst`, or prevent the contents of `src`
/// from being dropped or used.
/// Behavior is undefined if any of the following conditions are violated:
///
/// * Both `src` and `dst` must be [valid].
///
/// * Both `src` and `dst` must be properly aligned.
///
/// * `src.offset(count-1)` must be [valid]. In other words, the region of
/// memory which begins at `src` and has a length of `count *
/// size_of::<T>()` bytes must belong to a single, live allocation.
///
/// * `dst.offset(count-1)` must be [valid]. In other words, the region of
/// memory which begins at `dst` and has a length of `count *
/// size_of::<T>()` bytes must belong to a single, live allocation.
///
/// Like [`read`], `copy` creates a bitwise copy of `T`, regardless of
/// whether `T` is [`Copy`]. If `T` is not [`Copy`], using both the values
/// in the region beginning at `*src` and the region beginning at `*dst` can
/// [violate memory safety][read-ownership].
///
/// [`Copy`]: ../marker/trait.Copy.html
/// [`read`]: ../ptr/fn.read.html
/// [read-ownership]: ../ptr/fn.read.html#ownership-of-the-returned-value
/// [valid]: ../ptr/index.html#safety
///
/// # Examples
///
Expand All @@ -1031,24 +1101,66 @@ extern "rust-intrinsic" {
/// dst
/// }
/// ```
///
#[stable(feature = "rust1", since = "1.0.0")]
pub fn copy<T>(src: *const T, dst: *mut T, count: usize);

/// Invokes memset on the specified pointer, setting `count * size_of::<T>()`
/// bytes of memory starting at `dst` to `val`.
/// Sets `count * size_of::<T>()` bytes of memory starting at `dst` to
/// `val`.
///
/// `write_bytes` is similar to C's [`memset`], but sets `count *
/// size_of::<T>()` bytes to `val`.
///
/// [`memset`]: https://www.gnu.org/software/libc/manual/html_node/Copying-Strings-and-Arrays.html#index-memset
///
/// # Safety
///
/// Behavior is undefined if any of the following conditions are violated:
///
/// * `dst` must be [valid].
///
/// * `dst.offset(count-1)` must be [valid]. In other words, the region of
/// memory which begins at `dst` and has a length of `count *
/// size_of::<T>()` bytes must belong to a single, live allocation.
///
/// * `dst` must be properly aligned.
///
/// Additionally, the caller must ensure that writing `count *
/// size_of::<T>()` bytes to the given region of memory results in a valid
/// value of `T`. Creating an invalid value of `T` can result in undefined
/// behavior.
///
/// [valid]: ../ptr/index.html#safety
///
/// # Examples
///
/// Basic usage:
///
/// ```
/// use std::ptr;
///
/// let mut vec = vec![0; 4];
/// let mut vec = vec![0u32; 4];
/// unsafe {
/// let vec_ptr = vec.as_mut_ptr();
/// ptr::write_bytes(vec_ptr, b'a', 2);
/// ptr::write_bytes(vec_ptr, 0xfe, 2);
/// }
/// assert_eq!(vec, [b'a', b'a', 0, 0]);
/// assert_eq!(vec, [0xfefefefe, 0xfefefefe, 0, 0]);
/// ```
///
/// Creating an invalid value:
///
/// ```no_run
/// use std::ptr;
///
/// let mut v = Box::new(0i32);
///
/// unsafe {
/// // Leaks the previously held value by overwriting the `Box<T>` with
/// // a null pointer.
/// ptr::write_bytes(&mut v, 0, 1);
/// }
///
/// // At this point, using or dropping `v` results in undefined behavior.
/// // v = Box::new(0i32); // ERROR
/// ```
#[stable(feature = "rust1", since = "1.0.0")]
pub fn write_bytes<T>(dst: *mut T, val: u8, count: usize);
Expand Down
Loading