Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve consistency of names handling in prototype functions #1020

Open
wants to merge 5 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions NAMESPACE
Original file line number Diff line number Diff line change
Expand Up @@ -49,6 +49,8 @@ S3method("length<-",vctrs_rcrd)
S3method("length<-",vctrs_vctr)
S3method("levels<-",vctrs_sclr)
S3method("levels<-",vctrs_vctr)
S3method("names<-",vctrs_partial)
S3method("names<-",vctrs_rcrd)
S3method("names<-",vctrs_sclr)
S3method("names<-",vctrs_vctr)
S3method("|",vctrs_vctr)
Expand Down
7 changes: 7 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,13 @@

# vctrs (development version)

* `vec_ptype2()` and `vec_ptype_common()` now always return empty
vectors. This simplifies the implementation of ptype2 methods, which
no longer need to worry about returning an empty vector.

* `vec_ptype2()` and `vec_ptype_common()` now consistently return
unnamed prototypes.

* `new_vctr()` now always appends a base `"list"` class to list `.data` to
be compatible with changes to `vec_is_list()`. This affects `new_list_of()`,
which now returns an object with a base class of `"list"`.
Expand Down
12 changes: 12 additions & 0 deletions R/partial.R
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,18 @@ new_partial <- function(..., class = character()) {
new_sclr(..., class = c(class, "vctrs_partial"))
}

## Needed because partial classes inherit from `vctrs_sclr` which
## can't be renamed. And `vec_ptype2()` etc zap the names.
#' @export
`names<-.vctrs_partial` <- function(x, value) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't feel quite right to me — I don't think we should be systematically stripping names, and then providing names<- methods that ignore the stripping. Why not just strip names on atomic vectors?

Copy link
Member Author

@lionel- lionel- Apr 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The partial objects are not vectors, which makes them weird in vctrs. This is just a stopgap. I don't think the hacks around partial vectors should make us pause.

As for record vectors, they should support names<- at some point, like other vectors. I would really prefer a general solution than ignoring S3 vectors.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More generally, maybe there really exists some vectors that do not support names. It doesn't seem ill-founded to require of these vectors to implement set-to-null as a no-op in their names<- method.

# Allow setting names to `NULL` for compatibility with `vec_ptype2()`
if (!is_null(value)) {
abort("Can't set names of partial vectors.")
}
x
}


#' @export
obj_print_header.vctrs_partial <- function(x, ...) {
NULL
Expand Down
11 changes: 11 additions & 0 deletions R/type-rcrd.R
Original file line number Diff line number Diff line change
Expand Up @@ -61,6 +61,17 @@ vec_restore.vctrs_rcrd <- function(x, to, ...) {
x
}

#' @export
`names<-.vctrs_rcrd` <- function(x, value) {
# Allow setting names to `NULL` for compatibility with `vec_ptype2()`.
# Eventually we should add full names support via a special rcrd
# field.
if (!is_null(value)) {
abort("Setting the names of record vectors is currently unimplemented.")
}
x
}

#' @export
length.vctrs_rcrd <- function(x) {
.Call(vctrs_size, x)
Expand Down
38 changes: 34 additions & 4 deletions src/type.c
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
#include "vctrs.h"
#include "utils.h"
#include "arg-counter.h"
#include "type-data-frame.h"

// Initialised at load time
static SEXP syms_vec_ptype_finalise_dispatch = NULL;
Expand All @@ -9,6 +10,7 @@ static SEXP fns_vec_ptype_finalise_dispatch = NULL;

static inline SEXP vec_ptype_slice(SEXP x, SEXP empty);
static SEXP s3_type(SEXP x, struct vctrs_arg* x_arg);
static SEXP df_ptype(SEXP x, bool bare);

// [[ register() ]]
SEXP vctrs_ptype(SEXP x, SEXP x_arg) {
Expand All @@ -30,13 +32,21 @@ SEXP vec_ptype(SEXP x, struct vctrs_arg* x_arg) {
case vctrs_type_character: return vec_ptype_slice(x, vctrs_shared_empty_chr);
case vctrs_type_raw: return vec_ptype_slice(x, vctrs_shared_empty_raw);
case vctrs_type_list: return vec_ptype_slice(x, vctrs_shared_empty_list);
case vctrs_type_dataframe: return bare_df_map(x, &col_ptype);
case vctrs_type_dataframe: return df_ptype(x, true);
case vctrs_type_s3: return s3_type(x, x_arg);
case vctrs_type_scalar: stop_scalar_type(x, x_arg);
}
never_reached("vec_ptype");
}

// [[ include("vctrs.h") ]]
SEXP vec_ptype_unnamed(SEXP x, struct vctrs_arg* x_arg) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does feel a bit awkward that we have to use vec_ptype_unnamed() in vec_ptype2() to get the unname behavior, but vec_ptype() retains names. I am also leaning towards letting vec_ptype() unname, which would remove the need for the specialized vec_ptype_unnamed()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like vec_ptype() always returning unnamed vectors would simplify the notion of a prototype in vctrs.

SEXP out = PROTECT(vec_ptype(x, x_arg));
out = vec_set_names(out, R_NilValue);
UNPROTECT(1);
return out;
}

static SEXP col_ptype(SEXP x) {
return vec_ptype(x, args_empty);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the columns of the data frame be named? Or should names be stripped from them too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I think they should be unnamed for consistency with vec-assign.

}
Expand All @@ -52,10 +62,10 @@ static inline SEXP vec_ptype_slice(SEXP x, SEXP empty) {
static SEXP s3_type(SEXP x, struct vctrs_arg* x_arg) {
switch (class_type(x)) {
case vctrs_class_bare_tibble:
return bare_df_map(x, &col_ptype);
return df_ptype(x, true);

case vctrs_class_data_frame:
return df_map(x, &col_ptype);
return df_ptype(x, false);

case vctrs_class_bare_data_frame:
Rf_errorcall(R_NilValue, "Internal error: Bare data frames should be handled by `vec_ptype()`");
Expand All @@ -75,6 +85,24 @@ static SEXP s3_type(SEXP x, struct vctrs_arg* x_arg) {
return vec_slice(x, R_NilValue);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be interesting to consider a version of vec_slice() that doesn't preserve names so we don't have to do as much work to unname after we slice. This feels somewhat similar to having a version of vec_c() that doesn't retain names

Copy link
Member

@DavisVaughan DavisVaughan Apr 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we did this, we'd have to reconsider how names are tracked in the lubridate PR. When we proxy, the names are added as a new column. This would get sliced and then restored even if we turned on the option to not preserve names.
https://github.com/tidyverse/lubridate/pull/871/files#diff-29798887c998d8e58008ea67fdf141d3R32

On the other hand, POSIXlt stores names directly on the $year field. So when we proxy we get a column that is named. If we turn off name preservation that should strip these names (So it seems like the POSIXlt approach is "smarter" than by lubridate approach and might be something to keep in mind going forward)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like if we did this then we wouldn't even need a vec_set_names() call

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may still need the set-names call on S3 objects (or at least on S3 data frames) because the names might be encoded in a field, as in POSIXlt vectors. Or maybe we should enforce names as a special vctrs::rcrd_names field in data frame proxies for performance and simplicity. Then vec-slice could rely on either the names attribute or the record field.

Copy link
Member

@DavisVaughan DavisVaughan Apr 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because the names might be encoded in a field, as in POSIXlt vectors

I think I'm arguing above that a vec_slice(x, R_NilValue, preserve_names = false) would already work correctly in this case.

The proxy for POSIXlt is a data frame with a $year named year column. That column vector would be sliced with vec_slice(year, preserve_names = false) so names wouldn't be preserved

There might be other cases where this doesn't work, but it might be useful to say that a proxy should always expose the names of a vector in such a way that slicing can remove them like this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe POSIXlt is not a good example. I was thinking the difference between named and unnamed record vectors of size zero would be the presence of a names field. This is equivalent to the presence or absence of a names attributes for normal vectors of size zero.

}

SEXP df_ptype(SEXP x, bool bare) {
SEXP row_nms = PROTECT(df_rownames(x));

SEXP ptype = R_NilValue;
if (bare) {
ptype = PROTECT(bare_df_map(x, &col_ptype));
} else {
ptype = PROTECT(df_map(x, &col_ptype));
}

if (TYPEOF(row_nms) == STRSXP) {
Rf_setAttrib(ptype, R_RowNamesSymbol, vctrs_shared_empty_chr);
}

UNPROTECT(2);
return ptype;
}

static SEXP vec_ptype_finalise_unspecified(SEXP x);
static SEXP vec_ptype_finalise_dispatch(SEXP x);

Expand Down Expand Up @@ -161,9 +189,11 @@ SEXP vctrs_type_common(SEXP call, SEXP op, SEXP args, SEXP env) {
return out;
}



SEXP vctrs_type_common_impl(SEXP dots, SEXP ptype) {
if (!vec_is_partial(ptype)) {
return vec_ptype(ptype, args_dot_ptype);
return vec_ptype_unnamed(ptype, args_dot_ptype);
}

if (r_is_true(r_peek_option("vctrs.no_guessing"))) {
Expand Down
18 changes: 12 additions & 6 deletions src/type2.c
Original file line number Diff line number Diff line change
Expand Up @@ -18,21 +18,21 @@ SEXP vec_ptype2(SEXP x, SEXP y,
int* left) {
if (x == R_NilValue) {
*left = y == R_NilValue;
return vec_ptype(y, y_arg);
return vec_ptype_unnamed(y, y_arg);
}
if (y == R_NilValue) {
*left = x == R_NilValue;
return vec_ptype(x, x_arg);
return vec_ptype_unnamed(x, x_arg);
}

enum vctrs_type type_x = vec_typeof(x);
enum vctrs_type type_y = vec_typeof(y);

if (type_x == vctrs_type_unspecified) {
return vec_ptype(y, y_arg);
return vec_ptype_unnamed(y, y_arg);
}
if (type_y == vctrs_type_unspecified) {
return vec_ptype(x, x_arg);
return vec_ptype_unnamed(x, x_arg);
}

if (type_x == vctrs_type_scalar) {
Expand All @@ -42,11 +42,17 @@ SEXP vec_ptype2(SEXP x, SEXP y,
stop_scalar_type(y, y_arg);
}

SEXP ptype = R_NilValue;
if (type_x == vctrs_type_s3 || type_y == vctrs_type_s3) {
return vec_ptype2_dispatch(x, y, type_x, type_y, x_arg, y_arg, left);
ptype = PROTECT(vec_ptype2_dispatch(x, y, type_x, type_y, x_arg, y_arg, left));
} else {
return vec_ptype2_switch_native(x, y, type_x, type_y, x_arg, y_arg, left);
ptype = PROTECT(vec_ptype2_switch_native(x, y, type_x, type_y, x_arg, y_arg, left));
}

ptype = vec_ptype_unnamed(ptype, NULL);

UNPROTECT(1);
return ptype;
}

static SEXP vec_ptype2_switch_native(SEXP x,
Expand Down
1 change: 1 addition & 0 deletions src/vctrs.h
Original file line number Diff line number Diff line change
Expand Up @@ -373,6 +373,7 @@ SEXP vec_assign_shaped(SEXP proxy, SEXP index, SEXP value);
bool vec_requires_fallback(SEXP x, struct vctrs_proxy_info info);
SEXP vec_init(SEXP x, R_len_t n);
SEXP vec_ptype(SEXP x, struct vctrs_arg* x_arg);
SEXP vec_ptype_unnamed(SEXP x, struct vctrs_arg* x_arg);
SEXP vec_ptype_finalise(SEXP x);
bool vec_is_unspecified(SEXP x);
SEXP vec_recycle(SEXP x, R_len_t size, struct vctrs_arg* x_arg);
Expand Down
8 changes: 7 additions & 1 deletion tests/testthat/helper-s3.R
Original file line number Diff line number Diff line change
@@ -1,5 +1,11 @@

foobar <- function(x = list()) structure(x, class = "vctrs_foobar")
foobar <- function(x = list()) {
if (is.data.frame(x)) {
structure(x, class = c("vctrs_foobar", "data.frame"))
} else {
structure(x, class = "vctrs_foobar")
}
}

with_c_foobar <- function(expr) {
with_methods(
Expand Down
15 changes: 11 additions & 4 deletions tests/testthat/test-type-bare.R
Original file line number Diff line number Diff line change
Expand Up @@ -356,13 +356,20 @@ test_that("raw has informative type summaries", {
})

test_that("can provide common type with raw", {
lhs_dispatched <- FALSE
rhs_dispatched <- FALSE

local_methods(
vec_ptype2.raw.vctrs_foobar = function(...) "dispatched-left",
vec_ptype2.raw.vctrs_foobar = function(...) rhs_dispatched <<- TRUE,
vec_ptype2.vctrs_foobar = function(...) NULL,
vec_ptype2.vctrs_foobar.raw = function(...) "dispatched-right"
vec_ptype2.vctrs_foobar.raw = function(...) lhs_dispatched <<- TRUE
)
expect_identical(vec_ptype2(raw(), foobar("")), "dispatched-left")
expect_identical(vec_ptype2(foobar(""), raw()), "dispatched-right")

vec_ptype2(raw(), foobar(""))
vec_ptype2(foobar(""), raw())

expect_true(lhs_dispatched)
expect_true(rhs_dispatched)
})


Expand Down
16 changes: 16 additions & 0 deletions tests/testthat/test-type-rcrd.R
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,22 @@ test_that("vec_proxy() transforms records to data frames", {
)
})

test_that("can't set names", {
x <- new_rcrd(list(x = 1, y = 2))
expect_error(vec_set_names(x, "foo"), "unimplemented")
expect_error(set_names(x, "foo"), "unimplemented")
})

test_that("can set names to NULL", {
x <- new_rcrd(list(x = 1, y = 2))

out <- vec_set_names(x, NULL)
expect_identical(unclass(out), list(x = 1, y = 2))

out <- set_names(x, NULL)
expect_identical(unclass(out), list(x = 1, y = 2))
})


# coercion ----------------------------------------------------------------

Expand Down
64 changes: 58 additions & 6 deletions tests/testthat/test-type2.R
Original file line number Diff line number Diff line change
Expand Up @@ -143,19 +143,29 @@ test_that("vec_ptype2() returns empty prototype when other input is NULL", {
test_that("Subclasses of data.frame dispatch to `vec_ptype2()` methods", {
local_methods(
vec_ptype2.quuxframe = function(x, y, ...) UseMethod("vec_ptype2.quuxframe"),
vec_ptype2.quuxframe.data.frame = function(x, y, ...) "dispatched!",
vec_ptype2.data.frame.quuxframe = function(x, y, ...) "dispatched!"
vec_ptype2.quuxframe.data.frame = function(x, y, ...) lhs_dispatched <<- TRUE,
vec_ptype2.data.frame.quuxframe = function(x, y, ...) rhs_dispatched <<- TRUE
)

quux <- structure(data.frame(), class = c("quuxframe", "data.frame"))

expect_identical(vec_ptype2(quux, mtcars), "dispatched!")
expect_identical(vec_ptype2(mtcars, quux), "dispatched!")
lhs_dispatched <- FALSE
rhs_dispatched <- FALSE

vec_ptype2(quux, mtcars)
vec_ptype2(mtcars, quux)
expect_true(lhs_dispatched)
expect_true(rhs_dispatched)

lhs_dispatched <- FALSE
rhs_dispatched <- FALSE

quux <- structure(data.frame(), class = c("quuxframe", "tbl_df", "data.frame"))

expect_identical(vec_ptype2(quux, mtcars), "dispatched!")
expect_identical(vec_ptype2(mtcars, quux), "dispatched!")
vec_ptype2(quux, mtcars)
vec_ptype2(mtcars, quux)
expect_true(lhs_dispatched)
expect_true(rhs_dispatched)
})

test_that("Subclasses of `tbl_df` do not have `tbl_df` common type (#481)", {
Expand Down Expand Up @@ -225,3 +235,45 @@ test_that("vec_ptype2() errors have informative output", {
})
})

test_that("common type doesn't have names", {
expect_unnamed <- function(vec1, vec2) {
exp <- vec_slice(vec1, 0)

if (is.data.frame(exp)) {
exp <- unrownames(exp)
} else {
exp <- unname(exp)
}

expect_identical(vec_ptype2(vec1, vec2), exp)
expect_identical(vec_ptype_common(vec1), exp)
expect_identical(vec_ptype_common(vec1, vec2), exp)
expect_identical(vec_ptype_common(vec1, .ptype = vec2), exp)
}

expect_unnamed(c(foo = 1), c(bar = 2))
expect_unnamed(foobar(c(foo = 1)), foobar(c(bar = 2)))

# Unlike the `vctrs_foobar` test above, this doesn't hit the is-same-type fallback
with_methods(
vec_ptype2.vctrs_foobar = function(x, y, ...) NULL,
vec_ptype2.vctrs_foobar.vctrs_foobar = function(x, y, ...) x,
expect_unnamed(foobar(c(foo = 1)), foobar(c(bar = 2)))
)

expect_unnamed(mtcars[1:2, ], mtcars[3:4, ])
expect_unnamed(
foobar(mtcars[1:2, ]),
foobar(mtcars[3:4, ])
)

# Note: Zero-rows matrices can't be named so they are not tested here

# For reference, vec_ptype() currently keeps names
expect_identical(vec_ptype(c(foo = 1)), named(dbl()))
expect_identical(vec_ptype(mtcars), mtcars[0, ])
expect_identical(vec_ptype(foobar(mtcars)), foobar(mtcars[0, ]))

skip("FIXME: vec_slice() doesn't restore foreign classes?")
expect_identical(vec_ptype(foobar(c(foo = 1))), foobar(named(dbl())))
})