Error The provided value was not a valid string reference. when missing values in long character vector #760

Open
etiennebacher opened this issue Sep 10, 2024 · 1 comment

etiennebacher commented Sep 10, 2024

#584 added strL usage when a string is longer than 2045 characters:

  • write_dta() now uses strL when strings are too long to be stored in an str# variable (#437). strL is used when strings are longer than 2045 characters by default, which matches Stata's behaviour, but this can be reduced with the strl_threshold argument.

This works for a character vector without missing values:

library(haven)
dest <- tempfile(fileext = ".dta")
dat <- data.frame(x = c("a", strrep("a", 2046)))
write_dta(dat, dest)

However, it errors when the character vector also contains a missing value:

library(haven)
packageVersion("haven")
#> [1] '2.5.4.9000'

dest <- tempfile(fileext = ".dta")
dat <- data.frame(x = c("a", NA, strrep("a", 2046)))  # NA added here
write_dta(dat, dest)
#> Error: Failed to insert value [2, 1]: The provided value was not a valid string reference.
Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.4.1 RC (2024-06-06 r86715 ucrt)
#>  os       Windows 10 x64 (build 19045)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  English_Europe.utf8
#>  ctype    English_Europe.utf8
#>  tz       Europe/Paris
#>  date     2024-09-18
#>  pandoc   3.1.11 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version    date (UTC) lib source
#>  cli           3.6.3      2024-06-21 [1] CRAN (R 4.4.1)
#>  digest        0.6.37     2024-08-19 [1] CRAN (R 4.4.1)
#>  evaluate      0.24.0     2024-06-10 [1] CRAN (R 4.4.1)
#>  fansi         1.0.6      2023-12-08 [1] CRAN (R 4.4.0)
#>  fastmap       1.2.0      2024-05-15 [1] CRAN (R 4.4.0)
#>  forcats       1.0.0      2023-01-29 [1] CRAN (R 4.4.0)
#>  fs            1.6.4      2024-04-25 [1] CRAN (R 4.4.0)
#>  glue          1.7.0      2024-01-09 [1] CRAN (R 4.4.0)
#>  haven       * 2.5.4.9000 2024-09-18 [1] Github (tidyverse/haven@73c6d46)
#>  hms           1.1.3      2023-03-21 [1] CRAN (R 4.4.0)
#>  htmltools     0.5.8.1    2024-04-04 [1] CRAN (R 4.4.0)
#>  knitr         1.48       2024-07-07 [1] CRAN (R 4.4.1)
#>  lifecycle     1.0.4      2023-11-07 [1] CRAN (R 4.4.0)
#>  magrittr      2.0.3      2022-03-30 [1] CRAN (R 4.4.0)
#>  pillar        1.9.0      2023-03-22 [1] CRAN (R 4.4.0)
#>  pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 4.4.0)
#>  reprex        2.1.1      2024-07-06 [1] CRAN (R 4.4.1)
#>  rlang         1.1.4      2024-06-14 [1] Github (r-lib/rlang@ae699d1)
#>  rmarkdown     2.28       2024-08-17 [1] CRAN (R 4.4.1)
#>  rstudioapi    0.16.0     2024-03-24 [1] CRAN (R 4.4.0)
#>  sessioninfo   1.2.2      2021-12-06 [1] CRAN (R 4.4.0)
#>  tibble        3.2.1      2023-03-20 [1] CRAN (R 4.4.0)
#>  utf8          1.2.4      2023-10-22 [1] CRAN (R 4.4.0)
#>  vctrs         0.6.5      2023-12-01 [1] CRAN (R 4.4.0)
#>  withr         3.0.1      2024-07-31 [1] CRAN (R 4.4.1)
#>  xfun          0.47       2024-08-17 [1] CRAN (R 4.4.1)
#>  yaml          2.3.10     2024-07-26 [1] CRAN (R 4.4.1)
#> 
#>  [1] C:/Users/etienne/AppData/Local/Programs/R/R-4.4.1rc/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────
@aaronrudkin

This is happening for me as well. As an interim workaround, since NAs are not significant in my data, I convert them to empty strings with a variant of mutate(dat, across(where(is.character), ~ replace_na(.x, ""))) (using dplyr and tidyr).
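
For anyone who prefers not to pull in dplyr and tidyr, the same workaround can be sketched in base R. This is only an illustration of the idea in the comment above: `na_to_empty()` is a hypothetical helper name, not part of haven, and the extra numeric column `n` is added here just to show that non-character columns are left alone.

```r
# Hypothetical helper (not part of haven): replace NA with "" in character
# columns only, leaving every other column type untouched.
na_to_empty <- function(df) {
  is_chr <- vapply(df, is.character, logical(1))
  df[is_chr] <- lapply(df[is_chr], function(col) {
    col[is.na(col)] <- ""
    col
  })
  df
}

# Same shape as the failing example, plus a numeric column with its own NA.
dat <- data.frame(x = c("a", NA, strrep("a", 2046)), n = c(1, NA, 3))
dat_clean <- na_to_empty(dat)
```

Going by the report above, passing `dat_clean` to `write_dta()` instead of `dat` should avoid the error. The trade-off is that missing strings and empty strings can no longer be told apart in the written file, so this only fits cases where that distinction does not matter.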
