pivot_longer() on unbalanced wide data doesn't properly populate NAs when using ".value" in names_to #229

Closed
markfairbanks opened this issue Mar 22, 2021 · 3 comments

@markfairbanks
Collaborator

markfairbanks commented Mar 22, 2021

library(dtplyr)
library(dplyr)
library(tidyr)

test_df <- tibble(x_5 = 5, x_6 = 6, y_7 = 7, y_8 = 8)

# Actual
lazy_dt(test_df) %>%
  pivot_longer(everything(), names_to = c(".value", "id"), names_sep = "_") %>%
  collect()
#> # A tibble: 2 x 3
#>   id        x     y
#>   <chr> <dbl> <dbl>
#> 1 1         5     7
#> 2 2         6     8

# Expected
test_df %>%
  pivot_longer(everything(), names_to = c(".value", "id"), names_sep = "_")
#> # A tibble: 4 x 3
#>   id        x     y
#>   <chr> <dbl> <dbl>
#> 1 5         5    NA
#> 2 6         6    NA
#> 3 7        NA     7
#> 4 8        NA     8

This one is a bit more difficult: there is an open data.table issue about it. The workaround outlined in that issue amounts to doing the data.table equivalents of pivot_longer() %>% separate() %>% pivot_wider(), which would require a much larger internal rewrite of pivot_longer.dtplyr_step().
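For reference, a minimal sketch of what that three-step workaround might look like in plain data.table. It is not the code from the linked data.table issue and not dtplyr's implementation. The intermediate column names (name, value, col) are made up for illustration, and it assumes a single input row, as in test_df; several rows would also need a row identifier on the left-hand side of the dcast() formula.

```r
library(data.table)

test_dt <- data.table(x_5 = 5, x_6 = 6, y_7 = 7, y_8 = 8)

# Step 1 (pivot_longer): melt every column into name/value pairs.
long <- melt(
  test_dt,
  measure.vars = names(test_dt),
  variable.name = "name",
  value.name = "value",
  variable.factor = FALSE
)

# Step 2 (separate): split e.g. "x_5" into col = "x" and id = "5".
long[, c("col", "id") := tstrsplit(name, "_", fixed = TRUE)]

# Step 3 (pivot_wider): spread col back into columns, one row per id.
# Combinations that do not exist (such as x for id 7) are filled with NA.
result <- dcast(long, id ~ col, value.var = "value")
print(result)
```

The result has one row per id, with NA wherever a value/id combination does not exist in the wide data, which matches the shape of the tidyr output above.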

@hadley Do you want me to code out this workaround? Or would you rather have this throw an error as "not possible" in data.table::melt()? In the short term I can add the "unbalanced data" error along with the fix for #228.
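A rough sketch of how such an "unbalanced data" check could work, in base R. The helper name is_balanced() is hypothetical and this is not what dtplyr actually does; it assumes every column name has exactly two pieces separated by sep, with the ".value" piece first.

```r
# Hypothetical helper: TRUE when every ".value" prefix is paired with the
# same set of ids, i.e. the wide data is balanced.
is_balanced <- function(nms, sep = "_") {
  parts <- strsplit(nms, sep, fixed = TRUE)
  value <- vapply(parts, `[`, character(1), 1)
  id <- vapply(parts, `[`, character(1), 2)

  all_ids <- sort(unique(id))
  ids_by_value <- split(id, value)

  all(vapply(
    ids_by_value,
    function(x) identical(sort(unique(x)), all_ids),
    logical(1)
  ))
}

unbalanced <- is_balanced(c("x_5", "x_6", "y_7", "y_8"))
balanced <- is_balanced(c("x_1", "x_2", "y_1", "y_2"))
print(c(unbalanced = unbalanced, balanced = balanced))
```

With the column names from test_df the check returns FALSE, which is the case where an error would be thrown instead of returning the wrong result.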

@hadley
Member

hadley commented Mar 22, 2021

Yeah, I think it's fine for this to error.

@tdhock

tdhock commented Jan 30, 2024

Hi! This feature is supported in the new version of data.table that was released to CRAN today:

> melt(data.table(test_df), measure.vars=measure(value.name, id, sep="_"))
       id     x     y
   <char> <num> <num>
1:      5     5    NA
2:      6     6    NA
3:      7    NA     7
4:      8    NA     8
> melt(data.table(test_df), measure.vars=measurev(list(value.name=NULL, id=NULL), sep="_"))
       id     x     y
   <char> <num> <num>
1:      5     5    NA
2:      6     6    NA
3:      7    NA     7
4:      8    NA     8
> packageVersion("data.table")
[1] '1.15.0'

@markfairbanks
Collaborator Author

Thanks @tdhock, I'll revisit this.
