Add data.table seal of approval
hadley committed Sep 5, 2024
1 parent 82aee6b commit de65774
Showing 3 changed files with 24 additions and 24 deletions.
26 changes: 13 additions & 13 deletions README.Rmd
@@ -23,7 +23,7 @@ knitr::opts_chunk$set(

 ## Overview

-dtplyr provides a [data.table](http://r-datatable.com/) backend for dplyr. The goal of dtplyr is to allow you to write dplyr code that is automatically translated to the equivalent, but usually much faster, data.table code.
+<a href="https://rdatatable-community.github.io/The-Raft/posts/2024-08-01-seal_of_approval-dtplyr/"><img src='man/figures/dt-seal.png' align="right" width="200" height="157" alt="data.table seal of approval"/></a>dtplyr provides a [data.table](http://r-datatable.com/) backend for dplyr. The goal of dtplyr is to allow you to write dplyr code that is automatically translated to the equivalent, but usually much faster, data.table code.

 See `vignette("translation")` for details of the current translations, and [table.express](https://github.com/asardaes/table.express) and [rqdatatable](https://github.com/WinVector/rqdatatable/) for related work.

@@ -52,7 +52,7 @@ library(dtplyr)
 library(dplyr, warn.conflicts = FALSE)
 ```

-Then use `lazy_dt()` to create a "lazy" data table that tracks the operations performed on it.
+Then use `lazy_dt()` to create a "lazy" data table that tracks the operations performed on it.

 ```{r}
 mtcars2 <- lazy_dt(mtcars)
@@ -61,35 +61,35 @@ mtcars2 <- lazy_dt(mtcars)
 You can preview the transformation (including the generated data.table code) by printing the result:

 ```{r}
-mtcars2 %>%
-  filter(wt < 5) %>%
+mtcars2 %>%
+  filter(wt < 5) %>%
   mutate(l100k = 235.21 / mpg) %>% # liters / 100 km
-  group_by(cyl) %>%
+  group_by(cyl) %>%
   summarise(l100k = mean(l100k))
 ```

 But generally you should reserve this only for debugging, and use `as.data.table()`, `as.data.frame()`, or `as_tibble()` to indicate that you're done with the transformation and want to access the results:

 ```{r}
-mtcars2 %>%
-  filter(wt < 5) %>%
+mtcars2 %>%
+  filter(wt < 5) %>%
   mutate(l100k = 235.21 / mpg) %>% # liters / 100 km
-  group_by(cyl) %>%
-  summarise(l100k = mean(l100k)) %>%
+  group_by(cyl) %>%
+  summarise(l100k = mean(l100k)) %>%
   as_tibble()
 ```

 ## Why is dtplyr slower than data.table?

 There are two primary reasons that dtplyr will always be somewhat slower than data.table:

-* Each dplyr verb must do some work to convert dplyr syntax to data.table
-  syntax. This takes time proportional to the complexity of the input code,
+* Each dplyr verb must do some work to convert dplyr syntax to data.table
+  syntax. This takes time proportional to the complexity of the input code,
   not the input _data_, so should be a negligible overhead for large datasets.
-  [Initial benchmarks][benchmark] suggest that the overhead should be under
+  [Initial benchmarks][benchmark] suggest that the overhead should be under
   1ms per dplyr call.

-* To match dplyr semantics, `mutate()` does not modify in place by default.
+* To match dplyr semantics, `mutate()` does not modify in place by default.
   This means that most expressions involving `mutate()` must make a copy
   that would not be necessary if you were using data.table directly.
   (You can opt out of this behaviour in `lazy_dt()` with `immutable = FALSE`).
22 changes: 11 additions & 11 deletions README.md
@@ -14,10 +14,10 @@ coverage](https://codecov.io/gh/tidyverse/dtplyr/branch/main/graph/badge.svg)](h

 ## Overview

-dtplyr provides a [data.table](http://r-datatable.com/) backend for
-dplyr. The goal of dtplyr is to allow you to write dplyr code that is
-automatically translated to the equivalent, but usually much faster,
-data.table code.
+<a href="https://rdatatable-community.github.io/The-Raft/posts/2024-08-01-seal_of_approval-dtplyr/"><img src='man/figures/dt-seal.png' align="right" width="200" height="157" alt="data.table seal of approval"/></a>dtplyr
+provides a [data.table](http://r-datatable.com/) backend for dplyr. The
+goal of dtplyr is to allow you to write dplyr code that is automatically
+translated to the equivalent, but usually much faster, data.table code.

 See `vignette("translation")` for details of the current translations,
 and [table.express](https://github.com/asardaes/table.express) and
@@ -62,10 +62,10 @@ You can preview the transformation (including the generated data.table
 code) by printing the result:

 ``` r
-mtcars2 %>%
-  filter(wt < 5) %>%
+mtcars2 %>%
+  filter(wt < 5) %>%
   mutate(l100k = 235.21 / mpg) %>% # liters / 100 km
-  group_by(cyl) %>%
+  group_by(cyl) %>%
   summarise(l100k = mean(l100k))
 #> Source: local data table [3 x 2]
 #> Call: `_DT1`[wt < 5][, `:=`(l100k = 235.21/mpg)][, .(l100k = mean(l100k)),
@@ -85,11 +85,11 @@ But generally you should reserve this only for debugging, and use
 you’re done with the transformation and want to access the results:

 ``` r
-mtcars2 %>%
-  filter(wt < 5) %>%
+mtcars2 %>%
+  filter(wt < 5) %>%
   mutate(l100k = 235.21 / mpg) %>% # liters / 100 km
-  group_by(cyl) %>%
-  summarise(l100k = mean(l100k)) %>%
+  group_by(cyl) %>%
+  summarise(l100k = mean(l100k)) %>%
   as_tibble()
 #> # A tibble: 3 × 2
 #> cyl l100k
Binary file added man/figures/dt-seal.png
