Shouldn't fcase() recycle? #4258

sindribaldur · 2020-02-25T12:52:54Z

Would be nice if either of these would work

fcase(
  iris$Sepal.Length > 5, ">5",
  iris$Sepal.Length < 4, "<4",
  TRUE, as.character(iris$Sepal.Length)
)
# Results in an error


fcase(
  iris$Sepal.Length > 5, ">5",
  iris$Sepal.Length < 4, "<4",
  default = as.character(iris$Sepal.Length) # Length of 'default' must be 1.
)

My current solution (maybe I am missing some neater solution):

fcase(
  iris$Sepal.Length > 5, ">5",
  iris$Sepal.Length < 4, "<4",
  rep(TRUE, nrow(iris)), as.character(iris$Sepal.Length)
)

dplyr::case_when(
  iris$Sepal.Length > 5 ~ ">5",
  iris$Sepal.Length < 4 ~ "<4",
  TRUE ~  as.character(iris$Sepal.Length)
)

ColeMiller1 · 2020-02-27T00:10:04Z

Just a few SQL like alternatives:

## the uneven argument is the ELSE somewhat similar to Oracle DECODE()
fcase(
  iris$Sepal.Length > 5, ">5",
  iris$Sepal.Length < 4, "<4",
  as.character(iris$Sepal.Length)
)

## new special symbol similar to at the end of an actual case when statement
fcase(
  iris$Sepal.Length > 5, ">5",
  iris$Sepal.Length < 4, "<4",
  .ELSE, as.character(iris$Sepal.Length)
)

When I first started using dplyr from a SQL background, I kept finding it surprising the else was implemented as TRUE.

2005m · 2020-02-28T11:02:06Z

I can understand the need for a default vector, however in the example above i would have written the code as follows:

x = iris$Sepal.Length
fcase(
  x > 5, ">5",
  x < 4, "<4",
  x <= 5, as.character(x)
)

That avoids the overhead that is currently being implemented in the PR and mentionned by @jangorecki

2005m · 2020-02-28T11:10:04Z

It also raises the question. Do we want the same behaviour in fifelse ?

sindribaldur · 2020-02-28T11:28:15Z

@2005m my example was bad, the default or .ELSE condition can't always be stated so easily.

2005m · 2020-02-28T11:29:40Z

A good example would be great.

shrektan · 2020-02-28T11:32:31Z

I've already made the implementations that supports scalar condition and lazy-eval defaults in the PR above. Please take a look there.

sindribaldur · 2020-02-28T11:48:14Z

Here is a different example. fcase() by no means necessary, but makes for concise and clean-looking code.

DTmtcars[, rn := fcase(
                   rn %like% "^Merc", sub("^Merc", "Mercedes", rn),
                   rn == "Toyota Corona", paste0(rn, "s"),
                   ...
                   rep(TRUE, length(rn)), rn # else just the original vector
                 )]

knapply · 2020-04-26T16:49:51Z

Here's a simple example that may be helpful.

I have a vector of country names...

countries <- c("USA", "Britain", "Russian Federation", "Trinidad-Tobago", 
               "Bahamas", "Congo", "UAE", "Sao Tome", "Timor-Leste",
               "Canada", "Mexico")

... that I'd like to standardize for downstream tasks.

Let's say that I need to modify everything except for Canada and Mexico. Using dplyr::case_when(), I would do so like this...

dplyr::case_when(
  countries == "USA" ~ "United States of America",
  countries == "Britain" ~ "United Kingdom",
  countries == "Russian Federation" ~ "Russia",
  countries == "Trinidad-Tobago" ~ "Trinidad and Tobago",
  countries == "Bahamas" ~ "The Bahamas",
  countries == "Timor-Leste" ~ "East Timor",
  countries == "UAE" ~ "United Arab Emirates",
  countries == "Congo" ~ "Democratic Republic of the Congo",
  countries == "Sao Tome" ~ "Sao Tome and Principe",
  TRUE ~ countries
)
#>  [1] "United States of America"         "United Kingdom"                  
#>  [3] "Russia"                           "Trinidad and Tobago"             
#>  [5] "The Bahamas"                      "Democratic Republic of the Congo"
#>  [7] "United Arab Emirates"             "Sao Tome and Principe"           
#>  [9] "East Timor"                       "Canada"                          
#> [11] "Mexico"

but data.table::fcase() seems to require finding the values you want to leave as-is separately.

conditions <- c("USA", "Britain", "Russian Federation", "Trinidad-Tobago", 
                "Bahamas", "Congo", "UAE", "Sao Tome", "Timor-Leste")

(dont_modify <- !countries %in% conditions)
#>  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE


data.table::fcase(
  countries == "USA", "United States of America",
  countries == "Britain", "United Kingdom",
  countries == "Russian Federation", "Russia",
  countries == "Trinidad-Tobago", "Trinidad and Tobago",
  countries == "Bahamas", "The Bahamas",
  countries == "Timor-Leste", "East Timor",
  countries == "UAE", "United Arab Emirates",
  countries == "Congo", "Democratic Republic of the Congo",
  countries == "Sao Tome", "Sao Tome and Principe",
  dont_modify, countries
)
#>  [1] "United States of America"         "United Kingdom"                  
#>  [3] "Russia"                           "Trinidad and Tobago"             
#>  [5] "The Bahamas"                      "Democratic Republic of the Congo"
#>  [7] "United Arab Emirates"             "Sao Tome and Principe"           
#>  [9] "East Timor"                       "Canada"                          
#> [11] "Mexico"

installed.packages()["data.table", "Version"]
#> [1] "1.12.9"

jangorecki · 2020-04-26T20:16:59Z

@knapply example you presented is perfect example where lookup table should be prefered, it is much easier to maintain and cleaner to use. Then you do countries[countries_new_name, name := i.new_name, on="name"].

knapply · 2020-04-26T23:23:23Z

I think I must've oversimplified, so here's an expanded example that sticks with the country theme...

library(data.table)

countries <- data.table(
  name = c("Czech Republic", "Czecho-Slovakia", "Mexico", "Czech Republic", 
           "Canada", "Czechoslovakia", "USA", "Britain"),
  year = c(1918, 1990:1996)
); countries
#>               name year
#> 1:  Czech Republic 1918
#> 2: Czecho-Slovakia 1990
#> 3:          Mexico 1991
#> 4:  Czech Republic 1992
#> 5:          Canada 1993
#> 6:  Czechoslovakia 1994
#> 7:             USA 1995
#> 8:         Britain 1996

is_czech_name <- function(x) {
  x %chin% c("Czechoslovak Republic", "Czechoslovakia",
             "Czech Republic", "Czecho-Slovakia")
}

@jangorecki this kind of pattern-matching flexibility is more what I'm getting at.

countries[, name := dplyr::case_when(
  is_czech_name(name) & year <= 1938 ~ "Czechoslovak Republic",
  is_czech_name(name) & year %between% c(1939, 1992) ~ "Czechoslovakia",
  is_czech_name(name) & year >= 1993 ~ "Czech Republic",
  name == "USA" ~ "United States of America",
  name == "Britain" ~ "United Kingdom",
  TRUE ~ name
)]
#>                        name year
#> 1:    Czechoslovak Republic 1918
#> 2:           Czechoslovakia 1990
#> 3:                   Mexico 1991
#> 4:           Czechoslovakia 1992
#> 5:                   Canada 1993
#> 6:           Czech Republic 1994
#> 7: United States of America 1995
#> 8:           United Kingdom 1996

2005m · 2020-04-28T11:15:12Z

@jangorecki , countries[countries_new_name, name := i.new_name, on="name"] is nice, but :
-do you agree that fcase is much faster ? (there is actually a faster solution than fcase)
-how do you deal with missing values? (i.e. the default argument or TRUE) ?
-does your solution also work when the output values are vectors?
Thanks.

jangorecki · 2020-04-28T15:07:05Z

I have not idea if it is faster, but I would expect fcase to be slower. Unless we look at microseconds, then anything can be faster than [.data.table call. Update on join will use binary merge to match new values to rows, so it will scale well. fcase is lazy so can potentially avoid a lot of computation. There are many factors, but I think scaling up will be in favor of update-on-join.
If there is missing value, then the name is not being updated at all, so stays as it was.
counties_new_name needs to be a list/data.table

2005m · 2020-04-28T16:53:32Z

Yes you are right for large vector [.data.table is around 30% faster than fcase.
The fastest solution is a vectorised switch, which cut the time by half on a single thread.
Odd for missing values I get NA, so it does not stay as it was.

iago-pssjd · 2022-08-22T14:49:46Z

Is there any update on this? Also, the solution proposed in first comment

fcase(
  iris$Sepal.Length > 5, ">5",
  iris$Sepal.Length < 4, "<4",
  rep(TRUE, nrow(iris)), as.character(iris$Sepal.Length)
)

does not work to me since I am using fcase inside [.data.table and by = .() group with different group sizes, so I obtain

Argument #9 has a different length than argument #1. Please make sure all logical conditions have the same length.

Instead, it works replacing rep(TRUE, nrow(iris)) with rep(TRUE, .N)

davidbudzynski · 2023-06-01T21:15:33Z

Has there been any progress made on this issue? I can confirm that @2005m's nif works exactly as expected.

Fablepongiste · 2023-08-01T12:12:40Z

This should really be looked into, it is a classical issue and for once dplyr way with case_when so much nicer.

shrektan mentioned this issue Feb 27, 2020

fcase support scalar condition, vectorize default and lazy-eval default #4264

Merged

This comment was marked as off-topic.

Sign in to view

MichaelChirico mentioned this issue May 14, 2020

Master list of most-requested issues #3189

Open

76 tasks

MichaelChirico added the High label May 30, 2020

jangorecki removed the High label Jun 3, 2020

This comment has been minimized.

Sign in to view

MichaelChirico mentioned this issue Jul 4, 2020

Question regarding fcase #4593

Closed

r2evans mentioned this issue Aug 12, 2020

data.table::fcase with vectorized default markfairbanks/tidytable#110

Closed

jangorecki added this to the 1.13.5 milestone Nov 23, 2020

markfairbanks mentioned this issue Feb 17, 2021

Enable case_when(TRUE ~ default) syntax tidyverse/dtplyr#191

Merged

jangorecki modified the milestones: 1.14.3, 1.14.5 Jul 19, 2022

davidbudzynski mentioned this issue Jun 1, 2023

fix typo in funique.Rd 2005m/kit#37

Merged

jangorecki modified the milestones: 1.14.11, 1.15.1 Oct 29, 2023

MichaelChirico added the top request One of our most-requested issues label Apr 14, 2024

MichaelChirico closed this as completed in #4264 Jul 29, 2024

DavisVaughan mentioned this issue Aug 5, 2024

Consider reverting fcase recycled conditions #6352

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Shouldn't fcase() recycle? #4258

Shouldn't fcase() recycle? #4258

sindribaldur commented Feb 25, 2020

ColeMiller1 commented Feb 27, 2020

2005m commented Feb 28, 2020

2005m commented Feb 28, 2020 •

edited

Loading

sindribaldur commented Feb 28, 2020

2005m commented Feb 28, 2020

shrektan commented Feb 28, 2020

sindribaldur commented Feb 28, 2020

knapply commented Apr 26, 2020 •

edited

Loading

jangorecki commented Apr 26, 2020

knapply commented Apr 26, 2020

2005m commented Apr 28, 2020

jangorecki commented Apr 28, 2020

2005m commented Apr 28, 2020 •

edited

Loading

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

iago-pssjd commented Aug 22, 2022

davidbudzynski commented Jun 1, 2023

Fablepongiste commented Aug 1, 2023

Shouldn't fcase() recycle? #4258

Shouldn't fcase() recycle? #4258

Comments

sindribaldur commented Feb 25, 2020

ColeMiller1 commented Feb 27, 2020

2005m commented Feb 28, 2020

2005m commented Feb 28, 2020 • edited Loading

sindribaldur commented Feb 28, 2020

2005m commented Feb 28, 2020

shrektan commented Feb 28, 2020

sindribaldur commented Feb 28, 2020

knapply commented Apr 26, 2020 • edited Loading

jangorecki commented Apr 26, 2020

knapply commented Apr 26, 2020

2005m commented Apr 28, 2020

jangorecki commented Apr 28, 2020

2005m commented Apr 28, 2020 • edited Loading

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

iago-pssjd commented Aug 22, 2022

davidbudzynski commented Jun 1, 2023

Fablepongiste commented Aug 1, 2023

2005m commented Feb 28, 2020 •

edited

Loading

knapply commented Apr 26, 2020 •

edited

Loading

2005m commented Apr 28, 2020 •

edited

Loading