Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

if_any on zero-column selection yields TRUE #7059

Closed
brookslogan opened this issue Jul 26, 2024 · 2 comments · Fixed by #7072
Closed

if_any on zero-column selection yields TRUE #7059

brookslogan opened this issue Jul 26, 2024 · 2 comments · Fixed by #7072
Labels
bug an unexpected problem or unintended behavior rows ↕️ Operations on rows: filter(), slice(), arrange() tidy-dev-day 🤓 Tidyverse Developer Day rstd.io/tidy-dev-day

Comments

@brookslogan
Copy link

Please briefly describe your problem and what output you expect. If you have a question, please don't use this form. Instead, ask on https://stackoverflow.com/ or https://community.rstudio.com/.

if_any(<zero-column selection>, .fns) will produce TRUE. This contradicts expectations from, e..g, any().

Please include a minimal reproducible example (AKA a reprex). If you've never heard of a reprex before, start by reading https://www.tidyverse.org/help/#reprex.


Brief description of the problem

suppressPackageStartupMessages({
  library(dplyr)
})

tbl <- tibble(
  x1 = 1:5,
  x2 = c(-1, 4, 5, 4, 1),
  y = c(1, 4, 2, 4, 9),
)

# Basic example: this seems surprising:
tbl %>% filter(if_any(c(), ~ is.na(.x)))
#> # A tibble: 5 × 3
#>      x1    x2     y
#>   <int> <dbl> <dbl>
#> 1     1    -1     1
#> 2     2     4     4
#> 3     3     5     2
#> 4     4     4     4
#> 5     5     1     9

# It seems more natural to return zero rows, like this workaround:
tbl %>% filter(!if_all(c(), ~ !is.na(.x)))
#> # A tibble: 0 × 3
#> # ℹ 3 variables: x1 <int>, x2 <dbl>, y <dbl>

# Debugging: checking what is directly output from `if_any`:
tbl %>% filter({tmp <- if_any(c(), ~ is.na(.x)); print(tmp); tmp}) %>% invisible()
#> [1] TRUE

# Less minimal, motivating example: I don't recall the original context where I
# originally encountered this. But we might imagine models where the user can
# manually specify certain variables for special treatment, or when variables
# are selected for special treatment automatically.

predict1 <- function(tbl, regular_predictors, rate_predictors, outcomes) {
  problematic_rows <- tbl %>%
    filter(if_any({{rate_predictors}}, ~ .x < 0))
  if (nrow(problematic_rows) != 0L) {
    stop(paste0(
      "Rate predictors should always be < 0; some problematic rows shown below:",
      paste(collapse = "\n", paste0("  ", capture.output(print(problematic_rows))))
    ))
  }
  return (42L)
}

# Output here is good:
predict1(tbl, x1, x2, y)
#> Error in predict1(tbl, x1, x2, y): Rate predictors should always be < 0; some problematic rows shown below:  # A tibble: 1 × 3
#>        x1    x2     y
#>     <int> <dbl> <dbl>
#>   1     1    -1     1

# Output here is bad; there shouldn't be a problem flagged:
predict1(tbl, c(x1, x2), c(), y)
#> Error in predict1(tbl, c(x1, x2), c(), y): Rate predictors should always be < 0; some problematic rows shown below:  # A tibble: 5 × 3
#>        x1    x2     y
#>     <int> <dbl> <dbl>
#>   1     1    -1     1
#>   2     2     4     4
#>   3     3     5     2
#>   4     4     4     4
#>   5     5     1     9

Created on 2024-07-26 with reprex v2.1.1

@DavisVaughan
Copy link
Member

At first glance, I'm fairly certain I agree for this reasoning alone

> any()
[1] FALSE
> all()
[1] TRUE

@DavisVaughan DavisVaughan added bug an unexpected problem or unintended behavior rows ↕️ Operations on rows: filter(), slice(), arrange() tidy-dev-day 🤓 Tidyverse Developer Day rstd.io/tidy-dev-day labels Jul 27, 2024
@DavisVaughan
Copy link
Member

For TDD - I think the problem is here

dplyr/R/across.R

Lines 637 to 640 in 0005f67

# Select all rows if there are no inputs
if (!length(quos)) {
return(list(quo(TRUE)))
}

We unconditionally return TRUE when there are no inputs, but I think this should only be for if_all(). For if_any() I think this should return FALSE, in line with any() and all() as shown above.

You could create a new variable called empty up here where we branch based on the type of if function being used (i.e. you'd assign it TRUE or FALSE)

dplyr/R/across.R

Lines 618 to 624 in 0005f67

if (is_call(call, "if_any")) {
op <- "|"
if_fn <- "if_any"
} else {
op <- "&"
if_fn <- "if_all"
}

And then return that instead of just always returning TRUE

Don't forget to update the comment here

# Select all rows if there are no inputs

And add tests with both if_any() and if_all() with no inputs

jrwinget added a commit to jrwinget/dplyr that referenced this issue Aug 15, 2024
Fixes tidyverse#7059

This commit addresses the unexpected behavior of `if_any()` and `if_all()` when no columns are selected. Specifically, `if_any()` now returns `FALSE`, and `if_all()` returns `TRUE` in cases where no columns match the selection criteria.

Additionally, tests have been added to ensure the correct behavior of these functions in both `filter()` and `mutate()` contexts, including the scenarios highlighted by the `reprex` example. This ensures that empty selections behave consistently with the logical expectations.
jrwinget added a commit to jrwinget/dplyr that referenced this issue Aug 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug an unexpected problem or unintended behavior rows ↕️ Operations on rows: filter(), slice(), arrange() tidy-dev-day 🤓 Tidyverse Developer Day rstd.io/tidy-dev-day
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants