Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Peculiar interaction between sapply() and a (named) vctr #1416

Open
jennybc opened this issue Jul 19, 2021 · 4 comments
Open

Peculiar interaction between sapply() and a (named) vctr #1416

jennybc opened this issue Jul 19, 2021 · 4 comments

Comments

@jennybc
Copy link
Member

jennybc commented Jul 19, 2021

I recently re-implemented an existing S3 class using vctrs and have noticed that sapply() / lapply() don't necessarily transfer the names of the input to the output. (I was not actually relying on this in any meaningful way, mind you, but I noticed that a piece of documentation started to render differently.)

library(vctrs)

x <- new_vctr(1:3, class = "foofy")
names(x) <- letters[1:3]
x
#> <foofy[3]>
#> a b c 
#> 1 2 3

z <- 1:3
names(z) <- letters[1:3]
z
#> a b c 
#> 1 2 3

f <- function(x) "blah"

sapply(x, f)
#> [1] "blah" "blah" "blah"
sapply(z, f)
#>      a      b      c 
#> "blah" "blah" "blah"

Why do the names of x not transfer, while those for z do? What's weird is that it seems to vary by the function being applied. Here both have names. (Well, it's slightly less surprising now that I see the inner/outer name thing I show below.)

g <- function(x) x

sapply(x, g)
#> a b c 
#> 1 2 3
sapply(z, g)
#> a b c 
#> 1 2 3

Looking at the source of sapply(), these seem to be important differences. The vctrs-made vector returns FALSE for is.vector(), which means it gets sent through as.list() and as.list() also does something quite different for x vs. z, in terms of inner vs outer names.

is.vector(x)
#> [1] FALSE
is.vector(z)
#> [1] TRUE

as.list(x)
#> [[1]]
#> <foofy[1]>
#> a 
#> 1 
#> 
#> [[2]]
#> <foofy[1]>
#> b 
#> 2 
#> 
#> [[3]]
#> <foofy[1]>
#> c 
#> 3
as.list(z)
#> $a
#> [1] 1
#> 
#> $b
#> [1] 2
#> 
#> $c
#> [1] 3

Created on 2021-07-19 by the reprex package (v2.0.0.9000)

@jennybc
Copy link
Member Author

jennybc commented Jul 19, 2021

Interestingly (to me, at least), purrr::map_chr() does the right and expected thing (and that will be what I do):

library(vctrs)

x <- new_vctr(1:3, class = "foofy")
names(x) <- letters[1:3]
x
#> <foofy[3]>
#> a b c 
#> 1 2 3

f <- function(x) "blah"

sapply(x, f)
#> [1] "blah" "blah" "blah"
purrr::map_chr(x, f)
#>      a      b      c 
#> "blah" "blah" "blah"

Created on 2021-07-19 by the reprex package (v2.0.0.9000)

@DavisVaughan
Copy link
Member

The as.list() names handling behavior of the vctrs_vctr class is really inherited from vec_chop(). Because vec_chop() allows you to specify indices which can result in >1 observations per element of the resulting list, it doesn't make sense to always "promote" the inner names to outer names.

library(vctrs)

vctrs:::as.list.vctrs_vctr
#> function (x, ...) 
#> {
#>     out <- vec_chop(x)
#>     if (vec_is_list(x)) {
#>         out <- lapply(out, `[[`, 1)
#>     }
#>     out
#> }
#> <bytecode: 0x7f84270abe08>
#> <environment: namespace:vctrs>

x <- new_vctr(1:3, class = "foofy")
names(x) <- letters[1:3]
x
#> <foofy[3]>
#> a b c 
#> 1 2 3

# This is what `as.list()` calls
vec_chop(x)
#> [[1]]
#> <foofy[1]>
#> a 
#> 1 
#> 
#> [[2]]
#> <foofy[1]>
#> b 
#> 2 
#> 
#> [[3]]
#> <foofy[1]>
#> c 
#> 3

# But we could also specify indices, where it makes more sense to keep the
# inner names rather than "promote" them to outer names, which would not be defined
vec_chop(x, indices = list(1:2, 2:3, 3L))
#> [[1]]
#> <foofy[2]>
#> a b 
#> 1 2 
#> 
#> [[2]]
#> <foofy[2]>
#> b c 
#> 2 3 
#> 
#> [[3]]
#> <foofy[1]>
#> c 
#> 3

But maybe we should consider special casing the as.list.vctrs_vctr() method for compatibility with base as.list() behavior. Like:

as_list_vctrs_vctr <- function(x) {
  names <- vec_names(x)
  has_names <- !is.null(names)
  
  if (has_names) {
    x <- vec_set_names(x, NULL)
  }
  
  out <- vec_chop(x)
  
  if (has_names) {
    out <- vec_set_names(out, names)
  } 
  
  if (vec_is_list(x)) {
    out <- lapply(out, `[[`, 1)
  }
  
  out
}

as_list_vctrs_vctr(x)
#> $a
#> <foofy[1]>
#> [1] 1
#> 
#> $b
#> <foofy[1]>
#> [1] 2
#> 
#> $c
#> <foofy[1]>
#> [1] 3

@jennybc
Copy link
Member Author

jennybc commented Jul 20, 2021

But maybe we should consider special casing the as.list.vctrs_vctr() method for compatibility with base as.list() behavior.

This feels a bit morally similar to #1186 about rbind(), in terms of making vctrs-made objects work / work better with base functions. In that #1186, I'm not sure if rbind() can be made correct and, if not, whether it's better to error.

Just based on what I see here, though, this seems like an easier call? It is surprising when sapply() doesn't work the way it usually does re: names, so if there's no downside, it seem like adjusting as.list.vctrs_vctr() would be nice.

@lionel-
Copy link
Member

lionel- commented Jul 20, 2021

We plan to use vec_chop2() from #1226 once it is merged. This will fix the preservation of names.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants