`if_any` on zero-column selection yields `TRUE` #7059

brookslogan · 2024-07-26T23:57:05Z

Please briefly describe your problem and what output you expect. If you have a question, please don't use this form. Instead, ask on https://stackoverflow.com/ or https://community.rstudio.com/.

if_any(<zero-column selection>, .fns) will produce TRUE. This contradicts expectations from, e..g, any().

Please include a minimal reproducible example (AKA a reprex). If you've never heard of a reprex before, start by reading https://www.tidyverse.org/help/#reprex.

Brief description of the problem

suppressPackageStartupMessages({
  library(dplyr)
})

tbl <- tibble(
  x1 = 1:5,
  x2 = c(-1, 4, 5, 4, 1),
  y = c(1, 4, 2, 4, 9),
)

# Basic example: this seems surprising:
tbl %>% filter(if_any(c(), ~ is.na(.x)))
#> # A tibble: 5 × 3
#>      x1    x2     y
#>   <int> <dbl> <dbl>
#> 1     1    -1     1
#> 2     2     4     4
#> 3     3     5     2
#> 4     4     4     4
#> 5     5     1     9

# It seems more natural to return zero rows, like this workaround:
tbl %>% filter(!if_all(c(), ~ !is.na(.x)))
#> # A tibble: 0 × 3
#> # ℹ 3 variables: x1 <int>, x2 <dbl>, y <dbl>

# Debugging: checking what is directly output from `if_any`:
tbl %>% filter({tmp <- if_any(c(), ~ is.na(.x)); print(tmp); tmp}) %>% invisible()
#> [1] TRUE

# Less minimal, motivating example: I don't recall the original context where I
# originally encountered this. But we might imagine models where the user can
# manually specify certain variables for special treatment, or when variables
# are selected for special treatment automatically.

predict1 <- function(tbl, regular_predictors, rate_predictors, outcomes) {
  problematic_rows <- tbl %>%
    filter(if_any({{rate_predictors}}, ~ .x < 0))
  if (nrow(problematic_rows) != 0L) {
    stop(paste0(
      "Rate predictors should always be < 0; some problematic rows shown below:",
      paste(collapse = "\n", paste0("  ", capture.output(print(problematic_rows))))
    ))
  }
  return (42L)
}

# Output here is good:
predict1(tbl, x1, x2, y)
#> Error in predict1(tbl, x1, x2, y): Rate predictors should always be < 0; some problematic rows shown below:  # A tibble: 1 × 3
#>        x1    x2     y
#>     <int> <dbl> <dbl>
#>   1     1    -1     1

# Output here is bad; there shouldn't be a problem flagged:
predict1(tbl, c(x1, x2), c(), y)
#> Error in predict1(tbl, c(x1, x2), c(), y): Rate predictors should always be < 0; some problematic rows shown below:  # A tibble: 5 × 3
#>        x1    x2     y
#>     <int> <dbl> <dbl>
#>   1     1    -1     1
#>   2     2     4     4
#>   3     3     5     2
#>   4     4     4     4
#>   5     5     1     9

^{Created on 2024-07-26 with reprex v2.1.1}

The text was updated successfully, but these errors were encountered:

DavisVaughan · 2024-07-27T13:12:08Z

At first glance, I'm fairly certain I agree for this reasoning alone

> any()
[1] FALSE
> all()
[1] TRUE

DavisVaughan · 2024-07-27T14:06:44Z

For TDD - I think the problem is here

dplyr/R/across.R

Lines 637 to 640 in 0005f67

    
           # Select all rows if there are no inputs 
        
           if (!length(quos)) { 
        
             return(list(quo(TRUE))) 
        
           }

We unconditionally return TRUE when there are no inputs, but I think this should only be for if_all(). For if_any() I think this should return FALSE, in line with any() and all() as shown above.

You could create a new variable called empty up here where we branch based on the type of if function being used (i.e. you'd assign it TRUE or FALSE)

dplyr/R/across.R

Lines 618 to 624 in 0005f67

    
           if (is_call(call, "if_any")) { 
        
             op <- "|" 
        
             if_fn <- "if_any" 
        
           } else { 
        
             op <- "&" 
        
             if_fn <- "if_all" 
        
           }

And then return that instead of just always returning TRUE

Don't forget to update the comment here

dplyr/R/across.R

Line 637 in 0005f67

# Select all rows if there are no inputs

And add tests with both if_any() and if_all() with no inputs

Fixes tidyverse#7059 This commit addresses the unexpected behavior of `if_any()` and `if_all()` when no columns are selected. Specifically, `if_any()` now returns `FALSE`, and `if_all()` returns `TRUE` in cases where no columns match the selection criteria. Additionally, tests have been added to ensure the correct behavior of these functions in both `filter()` and `mutate()` contexts, including the scenarios highlighted by the `reprex` example. This ensures that empty selections behave consistently with the logical expectations.

… reference issue tidyverse#7059

DavisVaughan added bug an unexpected problem or unintended behavior rows ↕️ Operations on rows: filter(), slice(), arrange() tidy-dev-day 🤓 Tidyverse Developer Day rstd.io/tidy-dev-day labels Jul 27, 2024

jrwinget mentioned this issue Aug 15, 2024

Fix if_any() and if_all() behavior with zero-column selections #7072

Merged

jrwinget added a commit to jrwinget/dplyr that referenced this issue Aug 15, 2024

Simplified expected values in tests and adjusted test descriptions to…

eca8a14

… reference issue tidyverse#7059

DavisVaughan closed this as completed in b2c7e04 Aug 15, 2024

DavisVaughan closed this as completed in #7072 Aug 15, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`if_any` on zero-column selection yields `TRUE` #7059

`if_any` on zero-column selection yields `TRUE` #7059

brookslogan commented Jul 26, 2024

DavisVaughan commented Jul 27, 2024

DavisVaughan commented Jul 27, 2024

if_any on zero-column selection yields TRUE #7059

if_any on zero-column selection yields TRUE #7059

Comments

brookslogan commented Jul 26, 2024

DavisVaughan commented Jul 27, 2024

DavisVaughan commented Jul 27, 2024

`if_any` on zero-column selection yields `TRUE` #7059

`if_any` on zero-column selection yields `TRUE` #7059