diff --git a/README.Rmd b/README.Rmd
index bdbc2650..ef624402 100644
--- a/README.Rmd
+++ b/README.Rmd
@@ -23,7 +23,7 @@ knitr::opts_chunk$set(
 
 ## Overview
 
-dtplyr provides a [data.table](http://r-datatable.com/) backend for dplyr. The goal of dtplyr is to allow you to write dplyr code that is automatically translated to the equivalent, but usually much faster, data.table code.
+data.table seal of approvaldtplyr provides a [data.table](http://r-datatable.com/) backend for dplyr. The goal of dtplyr is to allow you to write dplyr code that is automatically translated to the equivalent, but usually much faster, data.table code.
 
 See `vignette("translation")` for details of the current translations, and [table.express](https://github.com/asardaes/table.express) and [rqdatatable](https://github.com/WinVector/rqdatatable/) for related work.
 
@@ -52,7 +52,7 @@ library(dtplyr)
 library(dplyr, warn.conflicts = FALSE)
 ```
 
-Then use `lazy_dt()` to create a "lazy" data table that tracks the operations performed on it. 
+Then use `lazy_dt()` to create a "lazy" data table that tracks the operations performed on it.
 
 ```{r}
 mtcars2 <- lazy_dt(mtcars)
@@ -61,21 +61,21 @@ mtcars2 <- lazy_dt(mtcars)
 You can preview the transformation (including the generated data.table code) by printing the result:
 
 ```{r}
-mtcars2 %>% 
-  filter(wt < 5) %>% 
+mtcars2 %>%
+  filter(wt < 5) %>%
   mutate(l100k = 235.21 / mpg) %>% # liters / 100 km
-  group_by(cyl) %>% 
+  group_by(cyl) %>%
   summarise(l100k = mean(l100k))
 ```
 
 But generally you should reserve this only for debugging, and use `as.data.table()`, `as.data.frame()`, or `as_tibble()` to indicate that you're done with the transformation and want to access the results:
 
 ```{r}
-mtcars2 %>% 
-  filter(wt < 5) %>% 
+mtcars2 %>%
+  filter(wt < 5) %>%
   mutate(l100k = 235.21 / mpg) %>% # liters / 100 km
-  group_by(cyl) %>% 
-  summarise(l100k = mean(l100k)) %>% 
+  group_by(cyl) %>%
+  summarise(l100k = mean(l100k)) %>%
   as_tibble()
 ```
 
@@ -83,13 +83,13 @@ mtcars2 %>%
 
 There are two primary reasons that dtplyr will always be somewhat slower than data.table:
 
-* Each dplyr verb must do some work to convert dplyr syntax to data.table 
-  syntax. This takes time proportional to the complexity of the input code, 
+* Each dplyr verb must do some work to convert dplyr syntax to data.table
+  syntax. This takes time proportional to the complexity of the input code,
   not the input _data_, so should be a negligible overhead for large datasets.
-  [Initial benchmarks][benchmark] suggest that the overhead should be under 
+  [Initial benchmarks][benchmark] suggest that the overhead should be under
   1ms per dplyr call.
 
-* To match dplyr semantics, `mutate()` does not modify in place by default. 
+* To match dplyr semantics, `mutate()` does not modify in place by default.
   This means that most expressions involving `mutate()` must make a copy
   that would not be necessary if you were using data.table directly.
   (You can opt out of this behaviour in `lazy_dt()` with `immutable = FALSE`).
diff --git a/README.md b/README.md
index 7c7fb568..bbaa64fd 100644
--- a/README.md
+++ b/README.md
@@ -14,10 +14,10 @@ coverage](https://codecov.io/gh/tidyverse/dtplyr/branch/main/graph/badge.svg)](h
 
 ## Overview
 
-dtplyr provides a [data.table](http://r-datatable.com/) backend for
-dplyr. The goal of dtplyr is to allow you to write dplyr code that is
-automatically translated to the equivalent, but usually much faster,
-data.table code.
+data.table seal of approvaldtplyr
+provides a [data.table](http://r-datatable.com/) backend for dplyr. The
+goal of dtplyr is to allow you to write dplyr code that is automatically
+translated to the equivalent, but usually much faster, data.table code.
 
 See `vignette("translation")` for details of the current translations,
 and [table.express](https://github.com/asardaes/table.express) and
@@ -62,10 +62,10 @@ You can preview the transformation (including the generated data.table
 code) by printing the result:
 
 ``` r
-mtcars2 %>% 
-  filter(wt < 5) %>% 
+mtcars2 %>%
+  filter(wt < 5) %>%
   mutate(l100k = 235.21 / mpg) %>% # liters / 100 km
-  group_by(cyl) %>% 
+  group_by(cyl) %>%
   summarise(l100k = mean(l100k))
 #> Source: local data table [3 x 2]
 #> Call: `_DT1`[wt < 5][, `:=`(l100k = 235.21/mpg)][, .(l100k = mean(l100k)),
@@ -85,11 +85,11 @@ But generally you should reserve this only for debugging, and use
 you’re done with the transformation and want to access the results:
 
 ``` r
-mtcars2 %>% 
-  filter(wt < 5) %>% 
+mtcars2 %>%
+  filter(wt < 5) %>%
   mutate(l100k = 235.21 / mpg) %>% # liters / 100 km
-  group_by(cyl) %>% 
-  summarise(l100k = mean(l100k)) %>% 
+  group_by(cyl) %>%
+  summarise(l100k = mean(l100k)) %>%
   as_tibble()
 #> # A tibble: 3 × 2
 #> cyl l100k
diff --git a/man/figures/dt-seal.png b/man/figures/dt-seal.png
new file mode 100644
index 00000000..1a1aacdb
Binary files /dev/null and b/man/figures/dt-seal.png differ