data frame splicing in j #459

Draft pull request: wants to merge 5 commits into main

@eutwt (Collaborator) commented Dec 2, 2023

closes #454

This appears to work, but I'm not sure how I feel about adding it. I'm not sure whether there's a better option for j when there's more than one ... argument (second example below). I'm also assuming R doesn't copy the elements of the inner list when it is flattened, but I haven't tested that.

So far I've only made the change for summarise().

devtools::load_all('~/Documents/dtplyr')
#> ℹ Loading dtplyr

mtcars |> 
  dtplyr::lazy_dt() |> 
  dplyr::summarise(
    stats::quantile(mpg, probs = c(0.5, 0.9)) |> 
      tibble::as_tibble_row(),
    .by = cyl
  )
#> Source: local data table [3 x 3]
#> Call:   `_DT1`[, tibble::as_tibble_row(stats::quantile(mpg, probs = c(0.5, 
#>     0.9))), keyby = .(cyl)]
#> 
#>     cyl `50%` `90%`
#>   <dbl> <dbl> <dbl>
#> 1     4  26    32.4
#> 2     6  19.7  21.2
#> 3     8  15.2  18.3
#> 
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results

mtcars |> 
  dtplyr::lazy_dt() |> 
  dplyr::summarise(
    stats::quantile(mpg, probs = c(0.5, 0.9)) |> 
      tibble::as_tibble_row(),
    z = 1,
    .by = cyl
  )
#> Source: local data table [3 x 4]
#> Call:   `_DT2`[, unlist(.(tibble::as_tibble_row(stats::quantile(mpg, 
#>     probs = c(0.5, 0.9))), z = list(1)), recursive = FALSE), 
#>     keyby = .(cyl)]
#> 
#>     cyl `50%` `90%`     z
#>   <dbl> <dbl> <dbl> <dbl>
#> 1     4  26    32.4     1
#> 2     6  19.7  21.2     1
#> 3     8  15.2  18.3     1
#> 
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results

Created on 2023-12-01 with reprex v2.0.2
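
The list() + unlist(recursive = FALSE) shape in the second call above can be tried in plain base R, outside of data.table. A minimal sketch; stats_row and spliced are illustrative names, not part of the PR:

```r
# Stand-in for the one-row data frame returned by as_tibble_row(quantile(...)).
stats_row <- data.frame(`50%` = 26, `90%` = 32.4, check.names = FALSE)

# Flatten one level: the data frame contributes its columns, and the scalar is
# wrapped in list() so it survives as a single element named "z".
spliced <- unlist(list(stats_row, z = list(1)), recursive = FALSE)

names(spliced)
str(spliced)
```

This only illustrates the flattening and the resulting names; it says nothing about whether the inner elements are copied.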

@eutwt (Collaborator, Author) commented Dec 2, 2023

I didn't realize this when I made the commits above, but when there is no by, we don't need list() + unlist(), because data.table already splices the output. We just need to handle the names, since data.table prefixes them (df.a, df.b below).

library(data.table)

df <- data.frame(a = 1, b = 2)
lst <- as.list(df)
dt <- data.table(x = 'x')
# spliced
dt[, .(df)]
#>     df.a  df.b
#>    <num> <num>
#> 1:     1     2
dt[, .(df, y = 'y')]
#>     df.a  df.b      y
#>    <num> <num> <char>
#> 1:     1     2      y

Created on 2023-12-02 with reprex v2.0.2

But this doesn't work with by:

df <- data.frame(a = 1, b = 2)
lst <- as.list(df)
dt <- data.table(x = 'x')

dt[, .(df), by = x]
#> Error in `[.data.table`(dt, , .(df), by = x): All items in j=list(...) should be atomic vectors or lists. If you are trying something like j=list(.SD,newcol=mean(colA)) then use := by group instead (much quicker), or cbind or merge afterwards.
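
For comparison, the flattening from the first comment should go through with by, because j then returns a plain list of columns rather than a list containing a data frame. A sketch, assuming data.table is installed and not run against the PR branch; res is an illustrative name:

```r
library(data.table)

df <- data.frame(a = 1, b = 2)
dt <- data.table(x = 'x')

# Flatten the data frame into its columns before data.table checks the result;
# the extra scalar is wrapped in list() so it stays a single column named "y".
res <- dt[, unlist(list(df, y = list('y')), recursive = FALSE), by = x]

names(res)
```

Here the column names come from the flattened list itself, so there is no df. prefix to clean up.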

@markfairbanks (Collaborator) commented

Somehow this is now possible in dbplyr, including in cases more complicated than calls to data-frame creation functions. I've never had time to investigate how, so it may rely on a mechanism that wouldn't work in dtplyr.

pacman::p_load(dbplyr, dplyr)

new_cols <- tibble(c = 3, d = 4)

query <- memdb_frame(a = 1, b = 2)

query %>%
  mutate(new_cols) %>%
  show_query()
#> <SQL>
#> SELECT `dbplyr_001`.*, 3.0 AS `c`, 4.0 AS `d`
#> FROM `dbplyr_001`

Successfully merging this pull request may close these issues.

Incorrect translation of data-frame-format dots in summarise()