gbff files cannot be imported #118

Open
Rahimlou opened this issue Mar 29, 2022 · 5 comments
Labels
bug (Something isn't working), Windows

Comments

@Rahimlou

ecoli_genes <- read_feats("GCA_009738455.1_ASM973845v1_genomic.gbff")
ecoli_genes <- read_gbk("GCA_009738455.1_ASM973845v1_genomic.gbff")

Neither of the above functions works. Both return the following error:
Harmonizing attribute names
Error in left_join():
! Can't join on x$feat_id x y$feat_id because of incompatible types.
i x$feat_id is of type logical.
i y$feat_id is of type character.

@thackl
Owner

thackl commented Mar 30, 2022

I cannot recreate your error on my system (see below). Can you install the latest version of gggenomes? And what OS are you on? Could you send me the output of your sessionInfo()?

> read_gbk("~/Downloads/ncbi-genomes-2022-03-30/GCF_009738455.1_ASM973845v1_genomic.gbff")
writing directives
writing features
Harmonizing attribute names                                                   
• ID -> feat_id
• Dbxref -> dbxref
• Parent -> parent_ids
• Name -> name
• EC_number -> ec_number
• Alias -> alias
• ncRNA_class -> nc_rna_class
• seq:cat) -> seq_cat
• seq:gag) -> seq_gag
• seq:ggt) -> seq_ggt
• seq:tgc) -> seq_tgc
• seq:gat) -> seq_gat
• seq:cgg) -> seq_cgg
• seq:gaa) -> seq_gaa
• seq:cag) -> seq_cag
• seq:ctg) -> seq_ctg
• seq:ttg) -> seq_ttg
• seq:tag) -> seq_tag
• seq:gga) -> seq_gga
• seq:tga) -> seq_tga
• seq:gta) -> seq_gta
• seq:tct) -> seq_tct
• seq:tcg) -> seq_tcg
• seq:taa) -> seq_taa
• seq:gca) -> seq_gca
• seq:gcc) -> seq_gcc
• seq:cga) -> seq_cga
• seq:gtt) -> seq_gtt
Features read
# A tibble: 10 × 3
   source type              n
   <chr>  <chr>         <int>
 1 NA     CDS            5453
 2 NA     gene           5589
 3 NA     misc_feature     30
 4 NA     ncRNA             7
 5 NA     region            2
 6 NA     regulatory        6
 7 NA     repeat_region     1
 8 NA     rRNA             22
 9 NA     tmRNA             1
10 NA     tRNA            106
# A tibble: 11,217 × 65
   seq_id      start    end strand type  feat_id introns parent_ids source score
   <chr>       <int>  <int> <chr>  <chr> <chr>   <list>  <list>     <chr>  <chr>
 1 NZ_CP046527     1 5.51e6 +      regi… region… <NULL>  <chr [1]>  NA     NA   
 2 NZ_CP046527     1 5.51e6 -      gene  gene-G… <int>   <chr [1]>  NA     NA   
 3 NZ_CP046527     1 5.51e6 -      CDS   cds-GN… <dbl>   <chr [1]>  NA     NA   
 4 NZ_CP046527   382 5.28e2 -      gene  gene-G… <int>   <chr [1]>  NA     NA   
 5 NZ_CP046527   382 5.28e2 -      CDS   cds-GN… <NULL>  <chr [1]>  NA     NA   
 6 NZ_CP046527   528 1.10e3 -      gene  gene-G… <int>   <chr [1]>  NA     NA   
 7 NZ_CP046527   528 1.10e3 -      CDS   cds-GN… <NULL>  <chr [1]>  NA     NA   
 8 NZ_CP046527  1368 1.90e3 -      gene  gene-G… <int>   <chr [1]>  NA     NA   
 9 NZ_CP046527  1368 1.90e3 -      CDS   cds-GN… <NULL>  <chr [1]>  NA     NA   
10 NZ_CP046527  1906 2.12e3 -      gene  gene-G… <int>   <chr [1]>  NA     NA   
# … with 11,207 more rows, and 55 more variables: phase <chr>, name <chr>,
#   dbxref <chr>, collection_date <chr>, mol_type <chr>, serotype <chr>,
#   strain <chr>, organism <chr>, country <chr>, isolation_source <chr>,
#   collected_by <chr>, pseudo <chr>, locus_tag <chr>, inference <chr>,
#   transl_table <chr>, product <chr>, note <chr>, old_locus_tag <chr>,
#   protein_id <chr>, anticodon <chr>, ribosomal_slippage <chr>,
#   ec_number <chr>, alias <chr>, rpt_family <chr>, rpt_type <chr>, …

@Rahimlou
Author

I installed gggenomes v. 0.9.5.9000.
I'm using Windows 10 x64.

The output of sessionInfo():

R version 4.1.2 (2021-11-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252 LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] TreeTools_1.7.1 tidytree_0.3.9 castor_1.7.2 Rcpp_1.0.8 phytools_1.0-1
[6] maps_3.4.0 S4Vectors_0.32.3 BiocGenerics_0.40.0 picante_1.8.2 nlme_3.1-153
[11] vegan_2.5-7 lattice_0.20-45 permute_0.9-7 ape_5.6-2 forcats_0.5.1
[16] tidyverse_1.3.1 gggenomes_0.9.5.9000 snakecase_0.11.0 jsonlite_1.8.0 tibble_3.1.6
[21] thacklr_0.0.0.9000 tidyr_1.2.0 stringr_1.4.0 readr_2.1.2 purrr_0.3.4
[26] gggenes_0.4.1 ggplot2_3.3.5 dplyr_1.0.8

loaded via a namespace (and not attached):
[1] colorspace_2.0-3 ellipsis_0.3.2 rprojroot_2.0.2 fs_1.5.2
[5] rstudioapi_0.13 farver_2.1.0 remotes_2.4.2 ggfittext_0.9.1
[9] bit64_4.0.5 RSpectra_0.16-0 fansi_1.0.2 lubridate_1.8.0
[13] xml2_1.3.3 R.methodsS3_1.8.1 codetools_0.2-18 splines_4.1.2
[17] mnormt_2.0.2 cachem_1.0.6 pkgload_1.2.4 broom_0.7.12
[21] cluster_2.1.2 dbplyr_2.1.1 R.oo_1.24.0 compiler_4.1.2
[25] httr_1.4.2 backports_1.4.1 lazyeval_0.2.2 assertthat_0.2.1
[29] Matrix_1.3-4 fastmap_1.1.0 cli_3.2.0 prettyunits_1.1.1
[33] tools_4.1.2 igraph_1.2.11 coda_0.19-4 gtable_0.3.0
[37] glue_1.6.2 clusterGeneration_1.3.7 fastmatch_1.1-3 cellranger_1.1.0
[41] vctrs_0.3.8 rbibutils_2.2.7 ps_1.6.0 brio_1.1.3
[45] testthat_3.1.2 rvest_1.0.2 lifecycle_1.0.1 phangorn_2.8.1
[49] devtools_2.4.3 MASS_7.3-54 scales_1.1.1 vroom_1.5.7
[53] hms_1.1.1 parallel_4.1.2 expm_0.999-6 RColorBrewer_1.1-2
[57] curl_4.3.2 memoise_2.0.1 yulab.utils_0.0.4 naturalsort_0.1.3
[61] stringi_1.7.6 desc_1.4.1 plotrix_3.8-2 pkgbuild_1.3.1
[65] Rdpack_2.3 rlang_1.0.2 pkgconfig_2.0.3 labeling_0.4.2
[69] bit_4.0.4 processx_3.5.2 tidyselect_1.1.2 magrittr_2.0.2
[73] R6_2.5.1 IRanges_2.28.0 generics_0.1.2 combinat_0.0-8
[77] DBI_1.1.2 pillar_1.7.0 haven_2.4.3 withr_2.5.0
[81] mgcv_1.8-38 scatterplot3d_0.3-41 modelr_0.1.8 crayon_1.5.0
[85] utf8_1.2.2 tmvnsim_1.0-2 tzdb_0.2.0 usethis_2.1.5
[89] grid_4.1.2 readxl_1.3.1 callr_3.7.0 reprex_2.0.1
[93] digest_0.6.29 R.cache_0.15.0 numDeriv_2016.8-1.1 R.utils_2.11.0
[97] munsell_0.5.0 sessioninfo_1.2.2 quadprog_1.5-8

@Rahimlou
Author

I also get 10 warning messages when loading "gggenomes":

Warning messages:
1: replacing previous import ‘purrr::invoke’ by ‘rlang::invoke’ when loading ‘gggenomes’
2: replacing previous import ‘purrr::flatten_raw’ by ‘rlang::flatten_raw’ when loading ‘gggenomes’
3: replacing previous import ‘purrr::as_function’ by ‘rlang::as_function’ when loading ‘gggenomes’
4: replacing previous import ‘purrr::flatten_dbl’ by ‘rlang::flatten_dbl’ when loading ‘gggenomes’
5: replacing previous import ‘purrr::flatten_lgl’ by ‘rlang::flatten_lgl’ when loading ‘gggenomes’
6: replacing previous import ‘purrr::flatten_int’ by ‘rlang::flatten_int’ when loading ‘gggenomes’
7: replacing previous import ‘purrr::%@%’ by ‘rlang::%@%’ when loading ‘gggenomes’
8: replacing previous import ‘purrr::flatten_chr’ by ‘rlang::flatten_chr’ when loading ‘gggenomes’
9: replacing previous import ‘purrr::splice’ by ‘rlang::splice’ when loading ‘gggenomes’
10: replacing previous import ‘purrr::flatten’ by ‘rlang::flatten’ when loading ‘gggenomes’

@Rahimlou
Author

Rahimlou commented Apr 1, 2022

The problem is specific to running R on Windows. When I ran R on Linux using the Docker container, read_gbk() worked well.

@thackl
Owner

thackl commented Apr 1, 2022

Thanks for the info. I don't have enough time right now to test the package on Windows...

@thackl added the bug (Something isn't working) and Windows labels on Oct 15, 2022