rle() returns NULL #434

Closed
camnesia opened this issue Apr 21, 2023 · 2 comments · Fixed by #449
Labels
bug an unexpected problem or unintended behavior

Comments

@camnesia

After I recently updated the dtplyr package, the code snippet below, which uses rle(), no longer works. val should contain 19 and 4, but both values are NULL instead.

library(dplyr)
library(tidyr)
library(dtplyr)

data <- tibble(name = c('a','b'),
             string = c('0000000000000000000','0000hu000000')) %>%
  lazy_dt() %>%
  mutate(val = sapply(string, function(x) rle(strsplit(x, '')[[1]])$lengths[1])) %>%
  collect()
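For reference, the expected values can be checked in base R alone, without dtplyr in the pipeline. This is only a sketch of the intended computation; the helper name first_run_length is hypothetical and does not come from the report.

```r
# Base R only: no lazy_dt() is involved, so nothing rewrites the expression.
strings <- c("0000000000000000000", "0000hu000000")

# Hypothetical helper: length of the first run of identical characters.
first_run_length <- function(x) {
  rle(strsplit(x, "")[[1]])$lengths[1]
}

# vapply() returns a plain integer vector, one value per input string.
expected <- vapply(strings, first_run_length, integer(1), USE.NAMES = FALSE)
print(expected)
```

The report states that these values should be 19 and 4.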


markfairbanks added the bug label Apr 21, 2023
@markfairbanks
Collaborator

markfairbanks commented May 1, 2023

Root cause: rle() returns an rle object (more or less a list) with the elements lengths and values. Because base::lengths() is a function in the base environment, dt_squash() prepends .. to the name (giving ..lengths). It assumes lengths is a variable in the global environment instead of something to be extracted from the rle object.
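A small base R sketch of the two facts involved: the structure of an rle object, and the existence of a base function that shares a name with one of its elements. It does not touch dtplyr.

```r
r <- rle(c("0", "0", "0", "0", "h", "u"))

class(r)            # "rle"
names(unclass(r))   # "lengths" "values"
r$lengths           # 4 1 1

# `lengths` is also a function in base, which is the name clash in question.
is.function(base::lengths)  # TRUE
```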

Another example: say we're trying to add a column from another data frame to our lazy_dt. Here the column is named length, which clashes with base::length():

library(dplyr)
library(dtplyr)

df <- tibble(length = 1)

tibble(x = 1:3, y = c("a", "a", "b")) %>%
  lazy_dt() %>%
  mutate(new = df$length)
#> Warning: Unknown or uninitialised column: `..length`.
#> Warning in `[.data.table`(copy(`_DT1`), , `:=`(new = ..df$..length)): Column
#> 'new' does not exist to remove
#> Source: local data table [3 x 2]
#> Call:   copy(`_DT1`)[, `:=`(new = ..df$..length)]
#> 
#>       x y    
#>   <int> <chr>
#> 1     1 a    
#> 2     2 a    
#> 3     3 b    
#> 
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results
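The NULL result is consistent with what base R does when `$` is asked for a name that does not exist. The sketch below uses a plain data.frame rather than a tibble (an assumption made to keep it dependency-free), so it returns NULL silently instead of emitting the "Unknown or uninitialised column" warning shown above.

```r
df <- data.frame(length = 1)

df$length      # 1
df$..length    # NULL: there is no column named "..length"

# The same holds for the rle object from the original report.
r <- rle(c("a", "a", "b"))
r$..lengths    # NULL
```

Assigning NULL to a column that does not exist yet would also fit the data.table warning "Column 'new' does not exist to remove" in the output above.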

@markfairbanks
Collaborator

The problem gets even more complicated when the same code is wrapped in a function, because dt_squash_call() tries to run eval() on it:

library(dtplyr)
library(dplyr, warn.conflicts = FALSE)

fn <- function() {
  df <- tibble(length = 1)

  tibble(x = 1:3, y = c("a", "a", "b")) %>%
    lazy_dt() %>%
    mutate(new = df$length)
}

fn()
#> Error in `$`(structure(list(length = 1), class = c("tbl_df", "tbl", "data.frame": invalid subscript type 'builtin'
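One plausible reading of this error, shown as a base R sketch only: if evaluation replaces the symbol length with the function object base::length before `$` runs, then `$` receives a builtin instead of a name. The hand-built call below is an assumption about the mechanism, not dtplyr's actual code path.

```r
# Build a `$` call whose second argument is the function object itself,
# not the symbol `length`.
bad_call <- call("$", list(length = 1), base::length)

# Capture the error message instead of stopping the script.
msg <- tryCatch(
  {
    eval(bad_call)
    NA_character_
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```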
