Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve consistency of names handling in prototype functions #1020

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

lionel-
Copy link
Member

@lionel- lionel- commented Apr 17, 2020

  • Consistently return unnamed vectors from vec_ptype2(). Record vectors gain a names<- method to support this. It currently only allows setting names to NULL, but should be extended in the future by storing names in a special field.

  • Consistently preserve names in vec_ptype(), including row names. I still wonder if vec_ptype() should unname though.

  • In the same spirit of unnaming the results of ptype2 methods, we also empty them. This simplifies the implementation of methods which can now just return x or y.

## Needed because partial classes inherit from `vctrs_sclr` which
## can't be renamed. And `vec_ptype2()` etc zap the names.
#' @export
`names<-.vctrs_partial` <- function(x, value) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't feel quite right to me — I don't think we should be systematically stripping names, and then providing names<- methods that ignore the stripping. Why not just strip names on atomic vectors?

Copy link
Member Author

@lionel- lionel- Apr 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The partial objects are not vectors, which makes them weird in vctrs. This is just a stopgap. I don't think the hacks around partial vectors should make us pause.

As for record vectors, they should support names<- at some point, like other vectors. I would really prefer a general solution than ignoring S3 vectors.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More generally, maybe there really exists some vectors that do not support names. It doesn't seem ill-founded to require of these vectors to implement set-to-null as a no-op in their names<- method.

UNPROTECT(1);
return out;
}

static SEXP col_ptype(SEXP x) {
return vec_ptype(x, args_empty);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the columns of the data frame be named? Or should names be stripped from them too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I think they should be unnamed for consistency with vec-assign.

case vctrs_type_s3: return s3_type(x, x_arg);
case vctrs_type_scalar: stop_scalar_type(x, x_arg);
}
never_reached("vec_ptype");
}

// [[ include("vctrs.h") ]]
SEXP vec_ptype_unnamed(SEXP x, struct vctrs_arg* x_arg) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does feel a bit awkward that we have to use vec_ptype_unnamed() in vec_ptype2() to get the unname behavior, but vec_ptype() retains names. I am also leaning towards letting vec_ptype() unname, which would remove the need for the specialized vec_ptype_unnamed()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like vec_ptype() always returning unnamed vectors would simplify the notion of a prototype in vctrs.

@@ -75,6 +85,24 @@ static SEXP s3_type(SEXP x, struct vctrs_arg* x_arg) {
return vec_slice(x, R_NilValue);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be interesting to consider a version of vec_slice() that doesn't preserve names so we don't have to do as much work to unname after we slice. This feels somewhat similar to having a version of vec_c() that doesn't retain names

Copy link
Member

@DavisVaughan DavisVaughan Apr 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we did this, we'd have to reconsider how names are tracked in the lubridate PR. When we proxy, the names are added as a new column. This would get sliced and then restored even if we turned on the option to not preserve names.
https://github.com/tidyverse/lubridate/pull/871/files#diff-29798887c998d8e58008ea67fdf141d3R32

On the other hand, POSIXlt stores names directly on the $year field. So when we proxy we get a column that is named. If we turn off name preservation that should strip these names (So it seems like the POSIXlt approach is "smarter" than by lubridate approach and might be something to keep in mind going forward)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like if we did this then we wouldn't even need a vec_set_names() call

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may still need the set-names call on S3 objects (or at least on S3 data frames) because the names might be encoded in a field, as in POSIXlt vectors. Or maybe we should enforce names as a special vctrs::rcrd_names field in data frame proxies for performance and simplicity. Then vec-slice could rely on either the names attribute or the record field.

Copy link
Member

@DavisVaughan DavisVaughan Apr 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because the names might be encoded in a field, as in POSIXlt vectors

I think I'm arguing above that a vec_slice(x, R_NilValue, preserve_names = false) would already work correctly in this case.

The proxy for POSIXlt is a data frame with a $year named year column. That column vector would be sliced with vec_slice(year, preserve_names = false) so names wouldn't be preserved

There might be other cases where this doesn't work, but it might be useful to say that a proxy should always expose the names of a vector in such a way that slicing can remove them like this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe POSIXlt is not a good example. I was thinking the difference between named and unnamed record vectors of size zero would be the presence of a names field. This is equivalent to the presence or absence of a names attributes for normal vectors of size zero.

@lionel-
Copy link
Member Author

lionel- commented Apr 20, 2020

Related issue: #623. vec_cast() should preserve the names of the input

@hadley
Copy link
Member

hadley commented Apr 20, 2020

I am uncomfortable with this change. I don't think it's needed for dplyr 1.0.0, so I'd prefer to put off any discussion until we have more time.

@lionel-
Copy link
Member Author

lionel- commented Apr 20, 2020

Interestingly we have this in vec_is(), which also goes in the direction of removing names from prototypes:

  x <- vec_slice(x, integer())
  ptype <- vec_slice(ptype, integer())

  # FIXME: Remove row names for matrices and arrays, and handle empty
  # but existing dimnames
  x <- vec_set_names(x, NULL)
  ptype <- vec_set_names(ptype, NULL)

Worth noting that rcrd vectors are currently not doing the right thing with this implementation, since vec-set-names currently goes through names<-.vctrs_vctr. This overwrites the field names.

lionel- added a commit to lionel-/vctrs that referenced this pull request Apr 27, 2020
lionel- added a commit to lionel-/vctrs that referenced this pull request Apr 27, 2020
lionel- added a commit to lionel-/vctrs that referenced this pull request Apr 27, 2020
@lionel- lionel- added this to the 0.4.0 milestone May 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants