Using TRUE in case_when causes error if output vector length does not equal group size .N #339

Open
eutwt opened this issue Feb 11, 2022 · 1 comment
Labels
bug an unexpected problem or unintended behavior

eutwt commented Feb 11, 2022

Originally posted by @KesterJ in #300 (comment)

I've encountered a version of this issue that doesn't involve &&, and where group_by() is called after lazy_dt(). Reprex below:

library(dplyr, warn.conflicts = FALSE)
library(dtplyr, warn.conflicts = FALSE)

options(dplyr.summarise.inform = FALSE)

loans <- tibble(
  borrower_id = c(1,1,1,1,2,2),
  loan_id = c("A", "A", "B", "B", "C", "C"),
  year = c(2020, 2021, 2020, 2021, 2020, 2021),
  repayments = c(0, 0, 0, 200, 150, 50)
)

#In dplyr (works)
loans %>%
  group_by(borrower_id, year) %>%
  summarise(
    status = case_when(any(repayments > 0) ~ "Made repayments",
                       TRUE ~ "Did not make any repayments")
  ) %>%
  ungroup()
#> # A tibble: 4 x 3
#>   borrower_id  year status                     
#>         <dbl> <dbl> <chr>                      
#> 1           1  2020 Did not make any repayments
#> 2           1  2021 Made repayments            
#> 3           2  2020 Made repayments            
#> 4           2  2021 Made repayments

#In dtplyr (does not work)
loans %>%
  lazy_dt() %>%
  group_by(borrower_id, year) %>%
  summarise(
    status = case_when(any(repayments > 0) ~ "Made repayments",
                       TRUE ~ "Did not make any repayments")
  ) %>%
  ungroup() %>%
  as_tibble()
#> Error in fcase(any(repayments > 0), "Made repayments", rep(TRUE, .N), : Argument #3 has a different length than argument #1. Please make sure all logical conditions have the same length.

#In dtplyr with different grouping that includes only one row per group (works)
loans %>%
  lazy_dt() %>%
  group_by(loan_id, year) %>%
  summarise(
    status = case_when(any(repayments > 0) ~ "Made repayments",
                       TRUE ~ "Did not make any repayments")
  ) %>%
  ungroup() %>%
  as_tibble()
#> # A tibble: 6 x 3
#>   loan_id  year status                     
#>   <chr>   <dbl> <chr>                      
#> 1 A        2020 Did not make any repayments
#> 2 A        2021 Did not make any repayments
#> 3 B        2020 Did not make any repayments
#> 4 B        2021 Made repayments            
#> 5 C        2020 Made repayments            
#> 6 C        2021 Made repayments

Created on 2022-02-11 by the reprex package (v2.0.1)
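
The error message points at a length mismatch between the first condition and the replicated TRUE. A minimal sketch in plain data.table (no dtplyr) that only measures the two lengths per group may make this easier to see. The names loans_dt, lens, cond_len and default_len are illustrative and not part of dtplyr.

```r
library(data.table)

# Same data as the reprex above, built directly as a data.table
loans_dt <- data.table(
  borrower_id = c(1, 1, 1, 1, 2, 2),
  year = c(2020, 2021, 2020, 2021, 2020, 2021),
  repayments = c(0, 0, 0, 200, 150, 50)
)

# any() collapses each group to a single value, while rep(TRUE, .N)
# has one element per row in the group
lens <- loans_dt[, .(
  cond_len = length(any(repayments > 0)),
  default_len = length(rep(TRUE, .N))
), keyby = .(borrower_id, year)]

print(lens)
```

The two lengths agree only in groups with a single row, which would be consistent with the loan_id grouping working while the borrower_id grouping fails.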

eutwt commented Feb 13, 2022

I think the only way to address this would be to assign the first argument and its length to variables, then pass them to fcase() with each TRUE/T condition replicated to that length. But that seems like a bad idea.

library(dplyr, warn.conflicts = FALSE)
library(dtplyr, warn.conflicts = FALSE)
options(dplyr.summarise.inform = FALSE)

loans <- tibble(
  borrower_id = c(1,1,1,1,2,2),
  loan_id = c("A", "A", "B", "B", "C", "C"),
  year = c(2020, 2021, 2020, 2021, 2020, 2021),
  repayments = c(0, 0, 0, 200, 150, 50)
)

dtp_out <- 
  loans %>%
    lazy_dt() %>%
    group_by(borrower_id, year) %>%
    summarise(
      status = case_when(any(repayments > 0) ~ "Made repayments",
                         TRUE ~ "Did not make any repayments")
    ) 


dtp_out %>%
  ungroup() %>%
  as_tibble()
#> # A tibble: 4 × 3
#>   borrower_id  year status                     
#>         <dbl> <dbl> <chr>                      
#> 1           1  2020 Did not make any repayments
#> 2           1  2021 Made repayments            
#> 3           2  2020 Made repayments            
#> 4           2  2021 Made repayments

dtp_out %>% 
  show_query()
#> `_DT1`[, .(status = local({
#>     .dtp_case_arg1 <- any(repayments > 0)
#>     .dtp_case_len <- length(.dtp_case_arg1)
#>     fcase(.dtp_case_arg1, "Made repayments", rep(TRUE, .dtp_case_len), 
#>         "Did not make any repayments")
#> })), keyby = .(borrower_id, year)]

Created on 2022-02-12 by the reprex package (v2.0.1)
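
For reference, the generated query above can be tried by hand in plain data.table. This is only a sketch of the idea described in this comment, not dtplyr's actual translation; loans_dt, out, cond and n are illustrative names.

```r
library(data.table)

# Same data as the reprex, built directly as a data.table
loans_dt <- data.table(
  borrower_id = c(1, 1, 1, 1, 2, 2),
  year = c(2020, 2021, 2020, 2021, 2020, 2021),
  repayments = c(0, 0, 0, 200, 150, 50)
)

out <- loans_dt[, .(status = local({
  cond <- any(repayments > 0)  # first condition, evaluated once per group
  n <- length(cond)            # its length (1 here, because of any())
  # Replicate TRUE to the length of the first condition instead of .N
  fcase(cond, "Made repayments",
        rep(TRUE, n), "Did not make any repayments")
})), keyby = .(borrower_id, year)]

print(out)
```

Because TRUE is replicated to the length of the first condition rather than to .N, every condition passed to fcase() has the same length regardless of group size.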

@markfairbanks markfairbanks added the bug an unexpected problem or unintended behavior label Jun 21, 2022