Implement `vec_map()` #1227

lionel- · 2020-08-19T14:51:27Z

Branched from #1226.

This implements vec_map(). Depending on the supplied prototype, it covers a range of functionality provided in purrr and more:

- .ptype = list() implements map()
- .ptype = integer() or other base atomic types implements map_int() etc.
- .ptype can also be set to any vctrs type

Inferring the prototype from the list results is deliberately not supported, because this would have no advantage over piping into vec_simplify() at the end.

The implementation follows two code paths depending on whether .ptype is a list or an atomic vector.

- When mapping to a list, we update the input in place in the mapping loop and coerce the complete list of outputs to the target ptype at the end. This way the coercion is vectorised.
- When mapping to an atomic vector, we initialise an output vector and then coerce each result before assigning it.
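
The two code paths described above can be illustrated with a minimal base R sketch. This is not the vctrs implementation (which is written in C and uses vctrs casts): `map_sketch()` is a hypothetical name, the list path allocates a fresh output instead of updating the input in place, the final cast to a list subclass is omitted, and `as.vector()` is a lossy stand-in for a vctrs cast.

```r
# Hypothetical base R sketch of the two code paths; not the vctrs C code.
map_sketch <- function(x, fn, ..., ptype = list()) {
  n <- length(x)
  if (is.list(ptype)) {
    # List path: collect the raw results; any coercion to a richer
    # list type would happen once, on the complete list, at the end.
    out <- vector("list", n)
    for (i in seq_len(n)) {
      res <- fn(x[[i]], ...)
      # Assigning NULL with `[[<-` would drop the element, so skip it.
      if (!is.null(res)) out[[i]] <- res
    }
  } else {
    # Atomic path: allocate the output up front, then coerce each
    # result to the prototype's base type before assigning it.
    # Assumption: as.vector() stands in for a (stricter) vctrs cast.
    out <- vector(typeof(ptype), n)
    for (i in seq_len(n)) {
      res <- fn(x[[i]], ...)
      stopifnot(length(res) == 1L)
      out[[i]] <- as.vector(res, mode = typeof(ptype))
    }
  }
  names(out) <- names(x)
  out
}

plus_one <- function(x) x + 1L
map_sketch(list(1L, b = 2L), plus_one)
map_sketch(list(1L, b = 2L), plus_one, ptype = integer())
```

The point of the split is that the list path pays for coercion once, while the atomic path pays for it per element but never materialises an intermediate list.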

This vctrs implementation is competitive with purrr and base:

### Mapping to list

x <- list(1, b = 2)
bench::mark(
  base = lapply(x, plus_one),
  purrr = map(x, plus_one),
  vctrs = vec_map(x, plus_one)
)[1:8]
#>   expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc
#> 1 base         2.43µs   3.51µs   268997.        0B     0    10000     0
#> 2 purrr       14.92µs  17.33µs    54736.        0B     5.47  9999     1
#> 3 vctrs        3.91µs   4.91µs   194467.        0B    19.4   9999     1

short <- rep(list(1, b = 2), 20)
bench::mark(
  base = lapply(short, plus_one),
  purrr = map(short, plus_one),
  vctrs = vec_map(short, plus_one)
)[1:8]
#>   expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc
#> 1 base         21.6µs   25.8µs    36666.      368B     29.4  9992     8
#> 2 purrr        35.7µs   41.7µs    22743.      368B     25.0  9989    11
#> 3 vctrs        17.1µs     21µs    44847.      368B     31.4  9993     7

long <- rep(list(1, b = 2), 1e5)
bench::mark(
  base = lapply(long, plus_one),
  purrr = map(long, plus_one),
  vctrs = vec_map(long, plus_one)
)[1:8]
#>   expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc
#> 1 base          120ms    126ms      6.81    1.53MB     15.3     4     9
#> 2 purrr         128ms    135ms      7.37    1.53MB     12.9     4     7
#> 3 vctrs          79ms     88ms     11.6     1.53MB     17.4     6     9


### Mapping to atomic vector

x <- list(1L, b = 2L)
bench::mark(
  base = vapply(x, plus_one, integer(1)),
  purrr = map_int(x, plus_one),
  vctrs = vec_map(x, plus_one, .ptype = int())
)[1:8]
#>   expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc
#> 1 base         3.54µs    4.8µs   199405.        0B      0   10000     0
#> 2 purrr        5.67µs   7.28µs   130768.        0B     13.1  9999     1
#> 3 vctrs        6.82µs    8.2µs   114366.        0B      0   10000     0

short <- rep(list(1L, b = 2L), 20)
bench::mark(
  base = vapply(short, plus_one, integer(1)),
  purrr = map_int(short, plus_one),
  vctrs = vec_map(short, plus_one, .ptype = int())
)[1:8]
#>   expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc
#> 1 base         22.7µs   26.9µs    35467.      208B     7.09  9998     2
#> 2 purrr        24.4µs   29.5µs    32371.      208B     6.48  9998     2
#> 3 vctrs        23.5µs   26.8µs    35572.      208B     3.56  9999     1

long <- rep(list(1L, b = 2L), 1e5)
bench::mark(
  base = vapply(long, plus_one, integer(1)),
  purrr = map_int(long, plus_one),
  vctrs = vec_map(long, plus_one, .ptype = int())
)[1:8]
#>   expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc
#> 1 base          104ms  104.1ms      9.61     781KB     38.4     1     4
#> 2 purrr         115ms  115.2ms      8.68     781KB     34.7     1     4
#> 3 vctrs          89ms   89.4ms     11.2      781KB     22.4     2     4

The purrr compat file has been updated to use vec_map() to get some internal testing and to make it easy to import unit tests from purrr.

lionel- · 2020-08-19T14:54:19Z

modify() could now be implemented with vec_map():

modify <- function(.x, .fn, ...) {
  vec_map(.x, .fn, ..., .ptype = .x)
}

modify(1:3, plus_one)
#> [1] 2 3 4

modify(c(FALSE, FALSE), plus_one)
#> [1] TRUE TRUE

modify(c(FALSE, TRUE), plus_one)
#> Error: Can't convert from <integer> to <logical> due to loss of precision.
#> * Locations: 1

lionel- · 2020-08-19T15:46:38Z

> When mapping to a list, we update the input in place in the mapping loop and coerce the complete list of outputs to the target ptype at the end. This way the coercion is vectorised.

Like the vec_assign2() implementation in #1228, this requires the list type to be coercible with bare lists. I think we don't lose any important property by requiring this coercion since it goes in the direction of the richer type.

(The more problematic coercion is towards the narrower type but since we require "list" inheritance, we'll basically enforce narrowing coercions to lists once the base type fallback of #1135 is implemented.)

hadley

Let me know when it has docs to review, and I'll think more about the interface.

R/compat-purrr.R

hadley · 2020-08-19T17:32:43Z

R/partial-factor.R

@@ -50,7 +50,7 @@ new_partial_factor <- function(partial = factor(), learned = factor()) {
 vec_ptype_full.vctrs_partial_factor <- function(x, ...) {
  empty <- ""

-  levels <- map(x, levels)
+  levels <- map(unclass(x), levels)


hadley · 2020-08-19

Why does this change?

lionel- · 2020-08-19

To avoid an infinite recursion.

hadley · 2020-08-19

But why did it work before?

lionel- · 2020-08-20

Because map() didn't use vctrs operations and genericity.

> The purrr compat file has been updated to use vec_map() to get some internal testing and to make it easy to import unit tests from purrr.

Oops sorry, the infinite recursion was another problem. It doesn't make sense for ptype-full to be recursed into anyway.

The problem is that map() now assigns names, and partial factors and data frames don't support that:

test-partial-factor.R:5: error: has ok print method
`names<-.vctrs_partial_factor()` not supported.

This is a flaw in the partial types, but as usual I'm just working around these types when they cause problems for now.

We no longer assign NULL names, to be a little more efficient. However, this unclass() change is still needed because partial types inherit from vctrs_sclr, which has an unsupported names<- method but still allows names(), so vec_map() sees the internal field names. Making the latter unsupported causes a bunch of other issues. It wouldn't solve the problem at hand anyway, because we'd then have a vector type for which names() is an error, which is a big genericity flaw. I think we shouldn't worry about these types too much for now.

lionel- · 2020-08-19T17:47:57Z

@hadley

> Let me know when it has docs to review, and I'll think more about the interface.

Do you like the genericity model with lists implemented here? This version of map() supports both list and atomic outputs, and so needs the equivalent of [[<-. I thought I'd first get feedback on the mechanism proposed here and in #1226 (the operations in #1228 are also relevant) before documenting and exporting.

hadley

Overall interface seems reasonable to me.

hadley · 2020-08-24T12:21:21Z

src/map.c

+
+    SEXP elt_out = PROTECT(r_eval_force(vec_map_call, env));
+    if (vec_size(elt_out) != 1) {
+      r_abort("Mapped function must return a size 1 vector.");


Would be useful to include the index in this error message.
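
As an illustration of the suggestion, here is an R-level sketch of a size check that reports the offending index. The helper name `check_size_one()` and the exact wording are hypothetical, and `length()` is only an approximation of `vec_size()` (the two differ for data frames, for example); the actual check in this PR lives in C.

```r
# Hypothetical helper: fail with the position of the element whose
# mapped result is not of size 1.
check_size_one <- function(elt, i) {
  n <- length(elt)  # assumption: stands in for vec_size()
  if (n != 1L) {
    stop(
      sprintf(
        "Mapped function must return a size 1 vector.\nElement %d has size %d.",
        i, n
      ),
      call. = FALSE
    )
  }
  invisible(elt)
}

# Capture the error message for a size 3 result at position 2
msg <- tryCatch(
  check_size_one(1:3, 2L),
  error = function(e) conditionMessage(e)
)
```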

hadley · 2020-08-24T12:22:14Z

tests/testthat/test-map.R

+
+  expect_identical(
+    vec_map(vctr, identity),
+    vec_chop(vctr)


I think it would be better to define the expected output directly rather than in terms of another function, because I don't have enough of vec_chop() loaded in my head to have any idea what this does.

hadley · 2020-08-24T12:23:42Z

tests/testthat/test-map.R

+
+  return("Used to work in purrr")
+
+  out2 <- map(NULL, identity)


What does this return now?

DavisVaughan · 2020-08-31T14:20:49Z

Generally this looks good, but I think that I have a fairly strong aversion to the list part of this interface.

From slider, I learned that there is a meaningful difference between slide() and slide_vec(.ptype = list()).

1. The first takes each .f result and assigns it into a bare output list non-generically with SET_VECTOR_ELT() with no restrictions on the result of .f (type or size).
2. The second works identically to other suffix functions, like slide_dbl(), by casting to the list() ptype and by checking that the size of that list is 1 before assigning it into the output generically with vec_assign().

library(slider)

# Returns a list, assigns each element into the list with SET_VECTOR_ELT()
slide(1:2, ~1)
#> [[1]]
#> [1] 1
#> 
#> [[2]]
#> [1] 1

# Each `.f` result must be castable to a list of size 1,
# Assigned into output generically with `vec_assign()`
# (so it would extend nicely to list subclasses)
slide_vec(1:2, ~list(1), .ptype = list())
#> [[1]]
#> [1] 1
#> 
#> [[2]]
#> [1] 1

slide_vec(1:2, ~list(1, 2), .ptype = list())
#> Error: In iteration 1, the result of `.f` had size 2, not 1.

slide_vec(1:2, ~1, .ptype = list())
#> Error: Can't convert <double> to <list>.

I think that the current vec_map(.ptype = list()) implementation allows for 1), but not 2).

For purrr, this isn't a huge deal, because map_lst() and map_vec(.ptype = list()) both don't exist, but from a theoretical perspective it seems like it would be nice for vec_map(.ptype = list()) to work exactly like vec_map(.ptype = dbl()) does (with the same casting and size restrictions).

I would advocate for:

# This is the default
# `.ptype = NULL` is identical to `map()`,
# this is NOT guessing the ptype
vec_map(.x, .fn, ..., .ptype = NULL)

# This is like `slide_vec(.x, .fn, ..., .ptype = list())`
# and follows the exact same code path as the current `atomic_map()`
vec_map(.x, .fn, ..., .ptype = list())

# Genericity can be achieved with the following also going through `atomic_map()`
vec_map(.x, .fn, ..., .ptype = lst_rcrd())

Two additional notes:

- .ptype = NULL is used elsewhere to mean "we are going to guess the output ptype". It is worth keeping this in mind, but I would be okay with that meaning something different here.
- The current implementation would do .ptype = lst_rcrd() differently from my proposed implementation. It looks like it currently assigns non-generically with SET_VECTOR_ELT() into a bare list, and then would cast at the end to lst_rcrd(). It seems like this works when you can cast a list->list-subclass, but I'm not sure you always can? It works with vctrs_list_rcrd because the attribute fields can be computed from the elements, but I doubt that is always the case. Here is an example where the vectorized attribute fields can't be computed from the data:

library(vctrs)
library(rlang)
#> Warning: package 'rlang' was built under R version 4.0.2

local_methods <- function(..., .frame = caller_env()) {
  local_bindings(..., .env = global_env(), .frame = .frame)
}
local_methods_name_rcrd <- function(frame = caller_env()) {
  local_methods(
    .frame = frame,
    vec_proxy.vctrs_name_rcrd = function(x, ...) data_frame(data = unclass(x), last_names = attr(x, "last_names")),
    vec_restore.vctrs_name_rcrd = function(x, to, ...) new_name_rcrd(x$data, x$last_names),
    vec_ptype2.vctrs_name_rcrd.vctrs_name_rcrd = function(x, y, ...) x,
    vec_cast.vctrs_name_rcrd.vctrs_name_rcrd = function(x, to, ...) x,
    vec_cast.list.vctrs_name_rcrd = function(x, to, ...) vec_data(x)
  )
}

local_methods_name_rcrd()
#> Setting deferred event(s) on global environment.
#>   * Execute (and clear) with `withr::deferred_run()`.
#>   * Clear (without executing) with `withr::deferred_clear()`.

new_name_rcrd <- function(x, last_names) {
  structure(x, last_names = last_names, class = c("vctrs_name_rcrd", "list"))
}

# Example:
new_name_rcrd(list(1, 2:5), c("Foo", "Bar"))
#> [[1]]
#> [1] 1
#> 
#> [[2]]
#> [1] 2 3 4 5
#> 
#> attr(,"last_names")
#> [1] "Foo" "Bar"
#> attr(,"class")
#> [1] "vctrs_name_rcrd" "list"

x <- c("Vaughan", "Henry")
ptype <- new_name_rcrd(list(), character())

# Currently assigns into bare list, then casts to `ptype`,
# but you can't cast a list to vctrs_name_rcrd
vctrs:::vec_map(x, ~1, .ptype = ptype)
#> Error: Can't convert <list> to <vctrs_name_rcrd>.

# Alternatively, could cast each element to vctrs_name_rcrd and
# assign generically with `vec_assign()`.
# NOTE: You can't do this at all with the current impl.
# The following would give the same result:
# vctrs:::vec_map(x, ~new_name_rcrd(1, .x), .ptype = ptype)
new_name_rcrd(list(1, 1), x)
#> [[1]]
#> [1] 1
#> 
#> [[2]]
#> [1] 1
#> 
#> attr(,"last_names")
#> [1] "Vaughan" "Henry"  
#> attr(,"class")
#> [1] "vctrs_name_rcrd" "list"

lionel- · 2020-08-31T14:37:37Z

> Generally this looks good, but I think that I have a fairly strong aversion to the list part of this interface.
>
> From slider, I learned that there is a meaningful difference between slide() and slide_vec(.ptype = list()).

Sorry to have convinced you to use one approach in slider and then implement another approach for purrr... I now think a unique operation based on [[<- makes more sense, provided that the proposed way to deal with lists (chop2 / assign2) makes sense.

> I think that the current vec_map(.ptype = list()) implementation allows for 1), but not 2).

I think it's clearer to just unbox the scalar rather than specify such a ptype. It doesn't seem like there's any practical advantage to an implementation based on [<-. When you think about it, it's weird to require functions that return a list containing a size-one vector.
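
The distinction between the two assignment styles can be seen with bare lists in base R. This is only an illustration of `[[<-` versus `[<-` semantics, not of the vctrs operations (chop2 / assign2) discussed in this thread.

```r
out <- vector("list", 2)

# `[[<-` stores the value itself, so the mapped function can return
# the bare (unboxed) result.
out[[1]] <- 1

# `[<-` assigns a slice, so the mapped function would have to return
# a list containing a size-one vector.
out[2] <- list(1)

# Both elements end up holding the same value
same <- identical(out[[1]], out[[2]])
```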

tzakharko · 2022-07-27T08:41:29Z

I am curious whether this effort has been abandoned or is still on the roadmap. I would be happy if I could retire purrr for a more streamlined implementation in vctrs.

lionel- · 2022-09-28T14:25:24Z

Now implemented in purrr.

lionel- added 13 commits August 19, 2020 15:25

Add VECTOR_PTR_RO() accessor

07dfe18

Draft vec_map() implementation

8fd5b51

Use vec_map() in the compat-purrr utils

e380523

And avoid infinite vctrs recursion when manipulating proxies

Add missing functions in the purrr compat file

8cd177b

Force-eval vec_map() calls

d6e8057

Fix map_if() compat

2cd5b9a

Import purrr tests for vec_map()

3c3bbd3

Zap attributes of input before mapping to list

d95195f

Test vec_map() with S3 atomic vectors

a7d9de9

Don't allow prototype to be inferred

ef1eaff

Should use `vec_simplify()` instead

Extract list and atomic mapping in different functions

5f5ff4f

Check that mapped function returned a size 1 vector

5605fbd

Document genericity of vec_map()

609098b

lionel- requested review from DavisVaughan and hadley August 19, 2020 15:09

hadley reviewed Aug 19, 2020

lionel- added 2 commits August 20, 2020 10:15

Only assign names when they exist

de112a8

Qualify vctrs in compat file

7065a82

hadley approved these changes Aug 24, 2020

lionel- mentioned this pull request Nov 4, 2020

implement {map,map2,pmap}_{lgl,int,dbl,chr,raw}_matrix() tidyverse/purrr#801

Closed

3 tasks

lionel- closed this Sep 28, 2022

lionel- deleted the add-map branch September 28, 2022 14:25

tzakharko mentioned this pull request Jun 17, 2024

Map functions with vctrs semantics #1941

Closed
