Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add euclidean option and missing correction to bitwise.dist #176

Merged
merged 20 commits into from
Apr 3, 2018

Conversation

zkamvar
Copy link
Member

@zkamvar zkamvar commented Apr 1, 2018

To aid in the move towards getting genlight objects supported for AMOVA, I've implemented a Euclidean distance option as well as a correction for missing data, which mimics the default distance by dist().

The benefit here is that the speed and amount of memory used are greatly reduced with bitwise.dist() as compared to dist() due to the fact that it doesn't require conversion to an integer matrix:

suppressPackageStartupMessages(library("poppr"))
library("lineprof")
options(width = 100)

# Creating 50 samples with 100k SNPs with ~6% missing data
x <- new("genlight", matrix(sample(c(0:2, NA), 5000000, replace = TRUE, prob = c(1, 1, 1, 0.2)), nrow = 50), ploidy = 2)
x
#>  /// GENLIGHT OBJECT /////////
#> 
#>  // 50 genotypes,  100,000 binary SNPs, size: 2.5 Mb
#>  312468 (6.25 %) missing data
#> 
#>  // Basic content
#>    @gen: list of 50 SNPbin
#>    @ploidy: ploidy of each individual  (range: 2-2)
#> 
#>  // Optional content
#>    @other: a list containing: elements without names

# Memory --------------------------
lineprof(bd <- bitwise.dist(x, euclidean=TRUE, scale_missing=TRUE))
#> Reducing depth to 2 (from 5)
#>    time alloc release dups                             ref                      src
#> 1 0.001 0.113       0   53                 internal.r#2669 fix_uneven_diploid/%>%  
#> 2 0.247 0.028       0   21                         ".Call" .Call                   
#> 3 0.001 0.027       0    3 c("NA.posi", "standardGeneric") NA.posi/standardGeneric 
#> 4 0.001 0.017       0    2               "lazyLoadDBfetch" lazyLoadDBfetch         
#> 5 0.722 0.060       0    1                   bitwise.r#229 missing_correction/.Call
#> 6 0.001 0.000       0    3 c("as.dist", "as.dist.default") as.dist/as.dist.default
lineprof(dd <- dist(x))
#> Reducing depth to 2 (from 58)
#>    time  alloc release dups                                  ref                          src
#> 1 0.223 18.285  13.639 1605 c("as.matrix", "as.matrix.genlight") as.matrix/as.matrix.genlight
#> 2 0.736  4.768   2.392    6                         character(0)

# Speed ---------------------------
system.time(bd <- bitwise.dist(x, euclidean=TRUE, scale_missing=TRUE))
#>    user  system elapsed 
#>   1.534   0.028   0.484
system.time(dd <- dist(x))
#>    user  system elapsed 
#>   2.297   0.082   2.399

Created on 2018-04-01 by the reprex package (v0.2.0).

Things that still need to be done

  • change warnings_are_errors back to true in Travis

This does not work at the moment with bitwise.ia.
For some reason, it keeps crashing. This has
something to do with passing SEXP objects to C code
internally, but I'm not sure what is wrong here.
This is not seeming to work. Every time I run it, I get either a
crash or a C stack is too close to limit warning. This happens
regardless if I pass the Euclid parameter to bitwise diploid or
if I attempt to create it within association_index_diploid.
This is documented at the very end of
http://adv-r.had.co.nz/C-interface.html#c-vectors

These all create R-level objects, so they need to be PROTECT()ed.
This no longer works :(
Something about the shared parallelization was
screwing up the code. Unfortunately, I can't for
the life of me figure out why since the same
process happens with haploids.
It's 10x faster than the old routine
@zkamvar zkamvar merged commit 5176741 into master Apr 3, 2018
@zkamvar zkamvar deleted the euclidean-bitwise branch April 3, 2018 21:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant