Commit

Upkeep 2024-09 (#479)
* Update snapshots
* `use_tidy_github_actions()`
* Bump required R version
* Re-document & fix S3 method exports
* Eliminate dplyr warning
hadley committed Sep 5, 2024
1 parent 82aee6b commit 564fccc
Showing 13 changed files with 80 additions and 56 deletions.
27 changes: 14 additions & 13 deletions .github/workflows/R-CMD-check.yaml
@@ -10,7 +10,9 @@ on:
pull_request:
branches: [main, master]

name: R-CMD-check
name: R-CMD-check.yaml

permissions: read-all

jobs:
R-CMD-check:
@@ -25,24 +27,22 @@ jobs:
- {os: macos-latest, r: 'release'}

- {os: windows-latest, r: 'release'}
# Use 3.6 to trigger usage of RTools35
- {os: windows-latest, r: '3.6'}
# use 4.1 to check with rtools40's older compiler
- {os: windows-latest, r: '4.1'}

- {os: ubuntu-latest, r: 'devel', http-user-agent: 'release'}
- {os: ubuntu-latest, r: 'release'}
- {os: ubuntu-latest, r: 'oldrel-1'}
- {os: ubuntu-latest, r: 'oldrel-2'}
- {os: ubuntu-latest, r: 'oldrel-3'}
- {os: ubuntu-latest, r: 'oldrel-4'}
# use 4.0 or 4.1 to check with rtools40's older compiler
- {os: windows-latest, r: 'oldrel-4'}

- {os: ubuntu-latest, r: 'devel', http-user-agent: 'release'}
- {os: ubuntu-latest, r: 'release'}
- {os: ubuntu-latest, r: 'oldrel-1'}
- {os: ubuntu-latest, r: 'oldrel-2'}
- {os: ubuntu-latest, r: 'oldrel-3'}
- {os: ubuntu-latest, r: 'oldrel-4'}

env:
GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
R_KEEP_PKG_SOURCE: yes

steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4

- uses: r-lib/actions/setup-pandoc@v2

@@ -60,3 +60,4 @@ jobs:
- uses: r-lib/actions/check-r-package@v2
with:
upload-snapshots: true
build_args: 'c("--no-manual","--compact-vignettes=gs+qpdf")'
8 changes: 5 additions & 3 deletions .github/workflows/pkgdown.yaml
@@ -9,7 +9,9 @@ on:
types: [published]
workflow_dispatch:

name: pkgdown
name: pkgdown.yaml

permissions: read-all

jobs:
pkgdown:
@@ -22,7 +24,7 @@ jobs:
permissions:
contents: write
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4

- uses: r-lib/actions/setup-pandoc@v2

@@ -41,7 +43,7 @@ jobs:

- name: Deploy to GitHub pages 🚀
if: github.event_name != 'pull_request'
uses: JamesIves/github-pages-deploy-action@v4.4.1
uses: JamesIves/github-pages-deploy-action@v4.5.0
with:
clean: false
branch: gh-pages
12 changes: 9 additions & 3 deletions .github/workflows/pr-commands.yaml
@@ -4,7 +4,9 @@ on:
issue_comment:
types: [created]

name: Commands
name: pr-commands.yaml

permissions: read-all

jobs:
document:
@@ -13,8 +15,10 @@ jobs:
runs-on: ubuntu-latest
env:
GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
permissions:
contents: write
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4

- uses: r-lib/actions/pr-fetch@v2
with:
@@ -50,8 +54,10 @@ jobs:
runs-on: ubuntu-latest
env:
GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
permissions:
contents: write
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4

- uses: r-lib/actions/pr-fetch@v2
with:
23 changes: 17 additions & 6 deletions .github/workflows/test-coverage.yaml
@@ -6,7 +6,9 @@ on:
pull_request:
branches: [main, master]

name: test-coverage
name: test-coverage.yaml

permissions: read-all

jobs:
test-coverage:
@@ -15,36 +17,45 @@ jobs:
GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}

steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4

- uses: r-lib/actions/setup-r@v2
with:
use-public-rspm: true

- uses: r-lib/actions/setup-r-dependencies@v2
with:
extra-packages: any::covr
extra-packages: any::covr, any::xml2
needs: coverage

- name: Test coverage
run: |
covr::codecov(
cov <- covr::package_coverage(
quiet = FALSE,
clean = FALSE,
install_path = file.path(normalizePath(Sys.getenv("RUNNER_TEMP"), winslash = "/"), "package")
)
covr::to_cobertura(cov)
shell: Rscript {0}

- uses: codecov/codecov-action@v4
with:
fail_ci_if_error: ${{ github.event_name != 'pull_request' && true || false }}
file: ./cobertura.xml
plugin: noop
disable_search: true
token: ${{ secrets.CODECOV_TOKEN }}

- name: Show testthat output
if: always()
run: |
## --------------------------------------------------------------------
find ${{ runner.temp }}/package -name 'testthat.Rout*' -exec cat '{}' \; || true
find '${{ runner.temp }}/package' -name 'testthat.Rout*' -exec cat '{}' \; || true
shell: bash

- name: Upload test results
if: failure()
uses: actions/upload-artifact@v3
uses: actions/upload-artifact@v4
with:
name: coverage-test-failures
path: ${{ runner.temp }}/package
8 changes: 4 additions & 4 deletions DESCRIPTION
@@ -15,8 +15,8 @@ Description: Provides a data.table backend for 'dplyr'. The goal of
License: MIT + file LICENSE
URL: https://dtplyr.tidyverse.org, https://github.com/tidyverse/dtplyr
BugReports: https://github.com/tidyverse/dtplyr/issues
Depends:
R (>= 3.6)
Depends:
R (>= 4.0)
Imports:
cli (>= 3.4.0),
data.table (>= 1.13.0),
@@ -35,10 +35,10 @@ Suggests:
testthat (>= 3.1.2),
tidyr (>= 1.1.0),
waldo (>= 0.3.1)
VignetteBuilder:
VignetteBuilder:
knitr
Config/Needs/website: tidyverse/tidytemplate
Config/testthat/edition: 3
Encoding: UTF-8
Roxygen: {library(tidyr); list(markdown = TRUE)}
RoxygenNote: 7.3.1
RoxygenNote: 7.3.2
2 changes: 2 additions & 0 deletions NAMESPACE
@@ -17,9 +17,11 @@ S3method(distinct,dtplyr_step)
S3method(do,dtplyr_step)
S3method(dt_call,dtplyr_step)
S3method(dt_call,dtplyr_step_assign)
S3method(dt_call,dtplyr_step_call)
S3method(dt_call,dtplyr_step_first)
S3method(dt_call,dtplyr_step_join)
S3method(dt_call,dtplyr_step_modify)
S3method(dt_call,dtplyr_step_mutate)
S3method(dt_call,dtplyr_step_set)
S3method(dt_call,dtplyr_step_subset)
S3method(dt_has_computation,dtplyr_step)
1 change: 1 addition & 0 deletions R/step-call.R
@@ -16,6 +16,7 @@ step_call <- function(parent, fun, args = list(), vars = parent$vars, in_place =
)
}

#' @export
dt_call.dtplyr_step_call <- function(x, needs_copy = x$needs_copy) {
call2(x$fun, dt_call(x$parent, needs_copy), !!!x$args)
}
3 changes: 1 addition & 2 deletions R/step-join.R
@@ -82,7 +82,7 @@ dt_call.dtplyr_step_join <- function(x, needs_copy = x$needs_copy) {
anti = call2("[", lhs, call2("!", rhs), on = on),
semi = call2("[", lhs, call2("unique", call2("[", lhs, rhs, which = TRUE, nomatch = NULL, on = on)))
)

if (x$style == "full") {
default_suffix <- c(".x", ".y")
if (!identical(x$suffix, default_suffix)) {
@@ -133,7 +133,6 @@ right_join.dtplyr_step <- function(x, y, ..., by = NULL, copy = FALSE, suffix =
step_join(x, y, by, style = "right", copy = copy, suffix = suffix)
}


#' @importFrom dplyr inner_join
#' @export
inner_join.dtplyr_step <- function(x, y, ..., by = NULL, copy = FALSE, suffix = c(".x", ".y")) {
1 change: 1 addition & 0 deletions R/step-mutate.R
@@ -26,6 +26,7 @@ step_mutate <- function(parent, new_vars = list(), use_braces = FALSE, by = new_
out
}

#' @export
dt_call.dtplyr_step_mutate <- function(x, needs_copy = x$needs_copy) {
# i is always empty because we never mutate a subset
if (is_empty(x$new_vars)) {
28 changes: 14 additions & 14 deletions README.Rmd
@@ -17,8 +17,8 @@ knitr::opts_chunk$set(

<!-- badges: start -->
[![CRAN status](https://www.r-pkg.org/badges/version/dtplyr)](https://cran.r-project.org/package=dtplyr)
[![R-CMD-check](https://github.com/tidyverse/dtplyr/workflows/R-CMD-check/badge.svg)](https://github.com/tidyverse/dtplyr/actions)
[![Codecov test coverage](https://codecov.io/gh/tidyverse/dtplyr/branch/main/graph/badge.svg)](https://app.codecov.io/gh/tidyverse/dtplyr?branch=main)
[![R-CMD-check](https://github.com/tidyverse/dtplyr/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/tidyverse/dtplyr/actions/workflows/R-CMD-check.yaml)
[![Codecov test coverage](https://codecov.io/gh/tidyverse/dtplyr/graph/badge.svg)](https://app.codecov.io/gh/tidyverse/dtplyr)
<!-- badges: end -->

## Overview
@@ -52,7 +52,7 @@ library(dtplyr)
library(dplyr, warn.conflicts = FALSE)
```

Then use `lazy_dt()` to create a "lazy" data table that tracks the operations performed on it.
Then use `lazy_dt()` to create a "lazy" data table that tracks the operations performed on it.

```{r}
mtcars2 <- lazy_dt(mtcars)
@@ -61,35 +61,35 @@ mtcars2 <- lazy_dt(mtcars)
You can preview the transformation (including the generated data.table code) by printing the result:

```{r}
mtcars2 %>%
filter(wt < 5) %>%
mtcars2 %>%
filter(wt < 5) %>%
mutate(l100k = 235.21 / mpg) %>% # liters / 100 km
group_by(cyl) %>%
group_by(cyl) %>%
summarise(l100k = mean(l100k))
```

But generally you should reserve this only for debugging, and use `as.data.table()`, `as.data.frame()`, or `as_tibble()` to indicate that you're done with the transformation and want to access the results:

```{r}
mtcars2 %>%
filter(wt < 5) %>%
mtcars2 %>%
filter(wt < 5) %>%
mutate(l100k = 235.21 / mpg) %>% # liters / 100 km
group_by(cyl) %>%
summarise(l100k = mean(l100k)) %>%
group_by(cyl) %>%
summarise(l100k = mean(l100k)) %>%
as_tibble()
```

## Why is dtplyr slower than data.table?

There are two primary reasons that dtplyr will always be somewhat slower than data.table:

* Each dplyr verb must do some work to convert dplyr syntax to data.table
syntax. This takes time proportional to the complexity of the input code,
* Each dplyr verb must do some work to convert dplyr syntax to data.table
syntax. This takes time proportional to the complexity of the input code,
not the input _data_, so should be a negligible overhead for large datasets.
[Initial benchmarks][benchmark] suggest that the overhead should be under
[Initial benchmarks][benchmark] suggest that the overhead should be under
1ms per dplyr call.

* To match dplyr semantics, `mutate()` does not modify in place by default.
* To match dplyr semantics, `mutate()` does not modify in place by default.
This means that most expressions involving `mutate()` must make a copy
that would not be necessary if you were using data.table directly.
(You can opt out of this behaviour in `lazy_dt()` with `immutable = FALSE`).
19 changes: 10 additions & 9 deletions README.md
@@ -7,9 +7,9 @@

[![CRAN
status](https://www.r-pkg.org/badges/version/dtplyr)](https://cran.r-project.org/package=dtplyr)
[![R-CMD-check](https://github.com/tidyverse/dtplyr/workflows/R-CMD-check/badge.svg)](https://github.com/tidyverse/dtplyr/actions)
[![R-CMD-check](https://github.com/tidyverse/dtplyr/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/tidyverse/dtplyr/actions/workflows/R-CMD-check.yaml)
[![Codecov test
coverage](https://codecov.io/gh/tidyverse/dtplyr/branch/main/graph/badge.svg)](https://app.codecov.io/gh/tidyverse/dtplyr?branch=main)
coverage](https://codecov.io/gh/tidyverse/dtplyr/graph/badge.svg)](https://app.codecov.io/gh/tidyverse/dtplyr)
<!-- badges: end -->

## Overview
@@ -47,6 +47,7 @@ other goodies that it provides:

``` r
library(data.table)
#> Warning: package 'data.table' was built under R version 4.4.1
library(dtplyr)
library(dplyr, warn.conflicts = FALSE)
```
@@ -62,10 +63,10 @@ You can preview the transformation (including the generated data.table
code) by printing the result:

``` r
mtcars2 %>%
filter(wt < 5) %>%
mtcars2 %>%
filter(wt < 5) %>%
mutate(l100k = 235.21 / mpg) %>% # liters / 100 km
group_by(cyl) %>%
group_by(cyl) %>%
summarise(l100k = mean(l100k))
#> Source: local data table [3 x 2]
#> Call: `_DT1`[wt < 5][, `:=`(l100k = 235.21/mpg)][, .(l100k = mean(l100k)),
@@ -85,11 +86,11 @@ But generally you should reserve this only for debugging, and use
you’re done with the transformation and want to access the results:

``` r
mtcars2 %>%
filter(wt < 5) %>%
mtcars2 %>%
filter(wt < 5) %>%
mutate(l100k = 235.21 / mpg) %>% # liters / 100 km
group_by(cyl) %>%
summarise(l100k = mean(l100k)) %>%
group_by(cyl) %>%
summarise(l100k = mean(l100k)) %>%
as_tibble()
#> # A tibble: 3 × 2
#> cyl l100k
2 changes: 1 addition & 1 deletion tests/testthat/_snaps/step-call.md
@@ -23,6 +23,6 @@
collect(drop_na(dt, "z"))
Condition
Error in `drop_na()`:
! Can't subset columns that don't exist.
! Can't select columns that don't exist.
x Column `z` doesn't exist.

2 changes: 1 addition & 1 deletion tests/testthat/test-step-join.R
@@ -346,10 +346,10 @@ test_that("performs cartesian joins as needed", {
test_that("performs cross join", {
df1 <- data.frame(x = 1:2, y = "a", stringsAsFactors = FALSE)
df2 <- data.frame(x = 3:4)
expected <- dplyr::cross_join(df1, df2) %>% as_tibble()

dt1 <- lazy_dt(df1, "dt1")
dt2 <- lazy_dt(df2, "dt2")
expected <- left_join(df1, df2, by = character()) %>% as_tibble()

expect_snapshot(left_join(dt1, dt2, by = character()))
expect_equal(left_join(dt1, dt2, by = character()) %>% collect(), expected)