Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement new_vctr() in C #1498

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

DavisVaughan
Copy link
Member

I'm hoping to move some of the list-of code to C, since I am using that more internally in tidyr, but that requires moving new_vctr() to C. So this PR does that.

A few notes:

  • I went the same route as new_data_frame() and used .External(), which seemed best for working with ...
  • Like new_data_frame(), supplying names = as an attribute now overrides existing names
  • It is currently impossible to provide class = in ... due to the class argument, but I check that in case we ever make these dots dynamic
  • This also required moving names_repair_missing() to C, but that was straightforward
  • The actual "setting" of attributes at the end is a little clunky because we don't have access to Rf_shallow_duplicate_attr(), which creates an ALTREP wrapper around an object if you plan on just updating its attributes. That makes setting attributes extremely lightweight and fast. That is available through attributes<-, which is what vec_set_attributes() does on R >=3.6, so we call back to R to be able to call that.

We have tried to keep <vctrs_vctr> out of the main codebase, and I think this still accomplishes that. There is nothing here that couldn't be moved to a standalone package - which I think is the right way to think about where C code can interact with the vctr type.

Here is a benchmark showing fairly minor improvements in performance. I think this will be helpful in the long run, since new_vctr() will be called a lot (like in a reduction of vec_ptype2() calls where the result of each call to vec_ptype2() calls new_vctr(), which is what list-of does).

library(vctrs)

x <- 1:1e6 + 0L

bench::mark(vctr = new_vctr(x), iterations = 1000000)
# CRAN
#> # A tibble: 1 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 vctr         4.33µs   4.97µs   189041.    29.8KB     15.9

# This PR
#> # A tibble: 1 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 vctr         2.72µs   3.18µs   301434.    6.41KB     24.4

bench::mark(vctr = new_vctr(x, class = "foo"), iterations = 1000000)
# CRAN
#> # A tibble: 1 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 vctr         4.09µs   4.61µs   206191.        0B     16.5

# This PR
#> # A tibble: 1 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 vctr         2.82µs   3.29µs   292022.        0B     26.0

bench::mark(vctr = new_vctr(x, class = "foo", inherit_base_type = TRUE), iterations = 1000000)
# CRAN
#> # A tibble: 1 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 vctr          4.2µs   4.77µs   198925.        0B     16.3

# This PR
#> # A tibble: 1 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 vctr         3.07µs   3.57µs   269074.        0B     25.0

bench::mark(vctr = new_vctr(x, a = 1, b = x, class = "foo", inherit_base_type = TRUE), iterations = 1000000)
# CRAN
#> # A tibble: 1 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 vctr         4.47µs   5.24µs   182313.        0B     17.5

# This PR
#> # A tibble: 1 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 vctr          3.6µs   4.11µs   234050.        0B     25.3

Created on 2021-11-17 by the reprex package (v2.0.1)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant