Shouldn't fcase() recycle? #4258
Just a few SQL-like alternatives:
## the uneven argument is the ELSE, somewhat similar to Oracle DECODE()
fcase(
iris$Sepal.Length > 5, ">5",
iris$Sepal.Length < 4, "<4",
as.character(iris$Sepal.Length)
)
## new special symbol, similar to ELSE at the end of an actual CASE WHEN statement
fcase(
iris$Sepal.Length > 5, ">5",
iris$Sepal.Length < 4, "<4",
.ELSE, as.character(iris$Sepal.Length)
)
When I first started using dplyr from a SQL background, I kept finding it surprising that the else was implemented as TRUE.
I can understand the need for a default vector; however, in the example above I would have written the code as follows:
x = iris$Sepal.Length
fcase(
x > 5, ">5",
x < 4, "<4",
x <= 5, as.character(x)
)
That avoids the overhead that is currently being implemented in the PR and mentioned by @jangorecki.
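The approach above depends on the explicit conditions jointly covering every element, because fcase() returns its `default` (NA unless set otherwise) wherever no condition is TRUE. A minimal sketch of checking that coverage on the same data, where the name `res` is only illustrative:

```r
library(data.table)

x <- iris$Sepal.Length

# Explicit final condition instead of a vector default
res <- fcase(
  x > 5,  ">5",
  x < 4,  "<4",
  x <= 5, as.character(x)
)

# If some element matched no condition, it would show up here as NA
anyNA(res)
```

If `anyNA(res)` were TRUE, the conditions would be leaving some elements uncovered.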
It also raises the question. Do we want the same behaviour in |
@2005m my example was bad, the default or |
A good example would be great. |
I've already implemented support for scalar conditions and lazily evaluated defaults in the PR above. Please take a look there. |
Here is a different example.
Here's a simple example that may be helpful. I have a vector of country names...
countries <- c("USA", "Britain", "Russian Federation", "Trinidad-Tobago",
"Bahamas", "Congo", "UAE", "Sao Tome", "Timor-Leste",
"Canada", "Mexico")
... that I'd like to standardize for downstream tasks. Let's say that I need to modify everything except for Canada and Mexico. Using dplyr::case_when():
dplyr::case_when(
countries == "USA" ~ "United States of America",
countries == "Britain" ~ "United Kingdom",
countries == "Russian Federation" ~ "Russia",
countries == "Trinidad-Tobago" ~ "Trinidad and Tobago",
countries == "Bahamas" ~ "The Bahamas",
countries == "Timor-Leste" ~ "East Timor",
countries == "UAE" ~ "United Arab Emirates",
countries == "Congo" ~ "Democratic Republic of the Congo",
countries == "Sao Tome" ~ "Sao Tome and Principe",
TRUE ~ countries
)
#> [1] "United States of America" "United Kingdom"
#> [3] "Russia" "Trinidad and Tobago"
#> [5] "The Bahamas" "Democratic Republic of the Congo"
#> [7] "United Arab Emirates" "Sao Tome and Principe"
#> [9] "East Timor" "Canada"
#> [11] "Mexico"
but with data.table::fcase():
conditions <- c("USA", "Britain", "Russian Federation", "Trinidad-Tobago",
"Bahamas", "Congo", "UAE", "Sao Tome", "Timor-Leste")
(dont_modify <- !countries %in% conditions)
#> [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE
data.table::fcase(
countries == "USA", "United States of America",
countries == "Britain", "United Kingdom",
countries == "Russian Federation", "Russia",
countries == "Trinidad-Tobago", "Trinidad and Tobago",
countries == "Bahamas", "The Bahamas",
countries == "Timor-Leste", "East Timor",
countries == "UAE", "United Arab Emirates",
countries == "Congo", "Democratic Republic of the Congo",
countries == "Sao Tome", "Sao Tome and Principe",
dont_modify, countries
)
#> [1] "United States of America" "United Kingdom"
#> [3] "Russia" "Trinidad and Tobago"
#> [5] "The Bahamas" "Democratic Republic of the Congo"
#> [7] "United Arab Emirates" "Sao Tome and Principe"
#> [9] "East Timor" "Canada"
#> [11] "Mexico"
installed.packages()["data.table", "Version"]
#> [1] "1.12.9"
@knapply the example you presented is a perfect case where a lookup table should be preferred: it is much easier to maintain and cleaner to use. Then you do |
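The code that followed "Then you do" is not preserved here. As a rough sketch of the lookup-table idea, assuming a data.table update join is what was meant (the names `lookup`, `old`, `new`, and `dt` are hypothetical), the mapping could live in its own table and be applied in one step:

```r
library(data.table)

countries <- c("USA", "Britain", "Russian Federation", "Trinidad-Tobago",
               "Bahamas", "Congo", "UAE", "Sao Tome", "Timor-Leste",
               "Canada", "Mexico")

# Hypothetical lookup table: one row per name that needs standardizing
lookup <- data.table(
  old = c("USA", "Britain", "Russian Federation", "Trinidad-Tobago",
          "Bahamas", "Timor-Leste", "UAE", "Congo", "Sao Tome"),
  new = c("United States of America", "United Kingdom", "Russia",
          "Trinidad and Tobago", "The Bahamas", "East Timor",
          "United Arab Emirates", "Democratic Republic of the Congo",
          "Sao Tome and Principe")
)

dt <- data.table(name = countries)

# Update join: overwrite `name` only for rows that match the lookup;
# unmatched rows (Canada, Mexico) are left untouched
dt[lookup, on = .(name = old), name := i.new]
dt$name
```

Adding a new mapping then means adding a row to `lookup` rather than another condition.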
I think I must've oversimplified, so here's an expanded example that sticks with the country theme...
library(data.table)
countries <- data.table(
name = c("Czech Republic", "Czecho-Slovakia", "Mexico", "Czech Republic",
"Canada", "Czechoslovakia", "USA", "Britain"),
year = c(1918, 1990:1996)
); countries
#> name year
#> 1: Czech Republic 1918
#> 2: Czecho-Slovakia 1990
#> 3: Mexico 1991
#> 4: Czech Republic 1992
#> 5: Canada 1993
#> 6: Czechoslovakia 1994
#> 7: USA 1995
#> 8: Britain 1996
is_czech_name <- function(x) {
x %chin% c("Czechoslovak Republic", "Czechoslovakia",
"Czech Republic", "Czecho-Slovakia")
}
@jangorecki this kind of pattern-matching flexibility is more what I'm getting at.
countries[, name := dplyr::case_when(
is_czech_name(name) & year <= 1938 ~ "Czechoslovak Republic",
is_czech_name(name) & year %between% c(1939, 1992) ~ "Czechoslovakia",
is_czech_name(name) & year >= 1993 ~ "Czech Republic",
name == "USA" ~ "United States of America",
name == "Britain" ~ "United Kingdom",
TRUE ~ name
)]
#> name year
#> 1: Czechoslovak Republic 1918
#> 2: Czechoslovakia 1990
#> 3: Mexico 1991
#> 4: Czechoslovakia 1992
#> 5: Canada 1993
#> 6: Czech Republic 1994
#> 7: United States of America 1995
#> 8: United Kingdom 1996
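For comparison, here is a sketch of the same recoding written with fcase(). It assumes that a final condition that is TRUE for every row, paired with the original column, can stand in for `TRUE ~ name`. The `rep(TRUE, .N)` idiom is an illustration of that idea, not something confirmed elsewhere in this thread:

```r
library(data.table)

countries <- data.table(
  name = c("Czech Republic", "Czecho-Slovakia", "Mexico", "Czech Republic",
           "Canada", "Czechoslovakia", "USA", "Britain"),
  year = c(1918, 1990:1996)
)

is_czech_name <- function(x) {
  x %chin% c("Czechoslovak Republic", "Czechoslovakia",
             "Czech Republic", "Czecho-Slovakia")
}

# The first matching condition wins; the last pair acts as a
# vector-valued "else" because its condition is TRUE for every row
countries[, name := fcase(
  is_czech_name(name) & year <= 1938,                 "Czechoslovak Republic",
  is_czech_name(name) & year %between% c(1939, 1992), "Czechoslovakia",
  is_czech_name(name) & year >= 1993,                 "Czech Republic",
  name == "USA",                                      "United States of America",
  name == "Britain",                                  "United Kingdom",
  rep(TRUE, .N),                                      name
)]
countries
```

The trade-off is that the all-TRUE vector has to be built with the right length by hand, which is the friction this issue is about.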
@jangorecki , |
Yes you are right for large vector |
Is there any update on this? Also, the solution proposed in the first comment
does not work for me since I am using
Instead, it works replacing |
Has there been any progress made on this issue? I can confirm that @2005m's |
This should really be looked into. It is a classic issue, and for once the dplyr way with case_when is so much nicer.
Would be nice if either of these would work
My current solution (maybe I am missing some neater solution):