Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change win.ia algorithm #181

Merged
merged 15 commits into from
May 2, 2018
Merged
31 changes: 31 additions & 0 deletions NEWS
Original file line number Diff line number Diff line change
@@ -1,6 +1,33 @@
poppr 2.8.0
===========

BUG FIX
-------

* `win.ia()` now has more consistent behavior with chromosome structure and will
no longer result in an integer overflow
(see https://github.com/grunwaldlab/poppr/issues/179). Thanks to @MarisaMiller
for the detailed bug report.

ALGORITHMIC CHANGE
------------------

* `win.ia()` may result in slightly different results because of two changes:
1. The windows will now always start at position one on any given chromosome.
This will result in some windows at the beginning of chromosomes having a
value of `NA` if the first variant starts beyond the first window.
2. Windows are now calculated for each chromosome independently. The previous
version first concatenated chromosomes with at least a window-sized gap
between the chromosomes, but failed to ensure that the window always started
at the beginning of the chromosome. This version fixes that issue.
(see https://github.com/grunwaldlab/poppr/issues/179).

DEPRECATION
-----------

* The `chromosome_buffer` argument for `win.ia()` has been permanently set to
`TRUE` and deprecated as it is no longer used.

NEW FEATURES
------------

Expand All @@ -17,6 +44,10 @@ NEW FEATURES
* `genind2genalex()` now has an `overwrite` parameter set to `FALSE` to prevent
accidental overwriting of files.

* `win.ia()` has a new argument `name_window`, which will give each element in
the result the designation of the terminal position of that window. Thanks to
@MarisaMiller for the suggestion!

poppr 2.7.1
============

Expand Down
153 changes: 82 additions & 71 deletions R/bitwise.r
Original file line number Diff line number Diff line change
Expand Up @@ -318,41 +318,41 @@ bitwise.ia <- function(x, missing_match=TRUE, differences_only=FALSE, threads=0)
#' function will scan windows across the loci positions and calculate the index
#' of association.
#'
#' @param x a \code{\link{genlight}} or \code{\link{snpclone}} object.
#' @param x a [genlight][genlight-class] or [snpclone][snpclone-class] object.
#'
#' @param window an integer specifying the size of the window.
#'
#' @param min.snps an integer specifying the minimum number of snps allowed per
#' window. If a window does not meet this criteria, the value will return as
#' NA.
#' `NA`.
#'
#' @param threads The maximum number of parallel threads to be used within this
#' function. A value of 0 (default) will attempt to use as many threads as
#' there are available cores/CPUs. In most cases this is ideal. A value of 1
#' will force the function to run serially, which may increase stability on
#' some systems. Other values may be specified, but should be used with
#' caution.
#' function. Defaults to 1 thread, in which the function will run serially. A
#' value of 0 will attempt to use as many threads as there are available
#' cores/CPUs. In most cases this is ideal for speed. Note: this option is
#' passed to [bitwise.ia()] and does not parallelize the windowization process.
#'
#' @param quiet if \code{FALSE}, a progress bar will be printed to the screen.
#' @param quiet if `FALSE` (default), a progress bar will be printed to the screen.
#'
#' @param name_window if `TRUE` (default), the result vector will be named with
#' the terminal position of the window. In the case where several chromosomes
#' are represented, the position will be appended using a period/full stop.
#'
#' @param chromosome_buffer if \code{TRUE} (default), buffers will be placed
#' @param chromosome_buffer *DEPRECATED* if `TRUE` (default), buffers will be placed
#' between adjacent chromosomal positions to prevent windows from spanning two
#' chromosomes.
#'
#' @return Index of association representing the samples in this genlight
#' object.
#' @return A value of the standardized index of association for all windows in
#' each chromosome.
#'
#' @note this will calculate the standardized index of association from Agapow
#' 2001. See \code{\link{ia}} for details.
#' and Burt, 2001. See [ia()] for details.
#'
#' @author Zhian N. Kamvar, Jonah C. Brooks
#'
#'
#' @md
#' @export
#' @seealso \code{\link[adegenet]{genlight}},
#' \code{\link{snpclone}},
#' \code{\link{samp.ia}},
#' \code{\link{ia}},
#' \code{\link{bitwise.dist}}
#' @seealso [genlight][genlight-class], [snpclone][snpclone-class], [ia()], [samp.ia()], [bitwise.dist()]
#' @examples
#'
#' # with structured snps assuming 1e4 positions
Expand Down Expand Up @@ -381,93 +381,104 @@ bitwise.ia <- function(x, missing_match=TRUE, differences_only=FALSE, threads=0)
#'
#' # Converting chromosomal coordinates to tidy data
#' library("dplyr")
#' library("tidyr")
#' res_tidy <- res %>%
#' data_frame(rd = ., chromosome = names(.)) %>% # create two column data frame
#' filter(chromosome != "") %>% # filter out null chromosomes
#' group_by(chromosome) %>% # group data by chromosome
#' mutate(window = row_number()) %>% # windows by chromosome
#' ungroup(chromosome) %>% # ungroup and reorder
#' mutate(chromosome = factor(chromosome, unique(chromosome)))
#' separate(chromosome, into = c("chromosome", "position")) %>% # get the position info
#' mutate(position = as.integer(position)) %>% # force position as integers
#' mutate(chromosome = factor(chromosome, unique(chromosome))) # force order chromosomes
#' res_tidy
#'
#' # Plotting with ggplot2
#' library("ggplot2")
#' ggplot(res_tidy, aes(x = window, y = rd, color = chromosome)) +
#' ggplot(res_tidy, aes(x = position, y = rd, color = chromosome)) +
#' geom_line() +
#' facet_wrap(~chromosome, nrow = 1) +
#' ylab(expression(bar(r)[d])) +
#' xlab("window (100bp)") +
#' theme(legend.position = "bottom")
#' xlab("terminal position of sliding window") +
#' labs(caption = "window size: 100bp") +
#' theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) +
#' theme(legend.position = "top")
#'
#' }
#'
#==============================================================================#
win.ia <- function(x, window = 100L, min.snps = 3L, threads = 1L, quiet = FALSE,
chromosome_buffer = TRUE){
name_window = TRUE, chromosome_buffer = TRUE){
stopifnot(is(x, "genlight"))
if (is.null(position(x))){
if (is.null(position(x))) {
position(x) <- seq(nLoc(x))
}
if (!chromosome_buffer) {
msg <- paste("The argument `chromosome_buffer` has been deprecated as of",
"poppr version 1.8.0. All chromosomes are treated separately",
"by default.")
warning(msg, immediate. = TRUE)
}
chromos <- !is.null(chromosome(x))
if (chromos){
CHROM <- chromosome(x)
x <- adjust_position(x, chromosome_buffer, window)
xpos <- position(x)
quiet <- should_poppr_be_quiet(quiet)
winmat <- make_windows(maxp = max(xpos), minp = 1L, window = window)
if (chromos) {
# Converting to character is necessary to avoid empty chromosomes.
# See: https://twitter.com/ZKamvar/status/991114778415325184
CHROM <- as.character(chromosome(x))
chrom_names <- unique(CHROM)
pos_per_chrom <- split(xpos, CHROM)[chrom_names]
win_per_chrom <- ceiling(vapply(pos_per_chrom, max, integer(1))/window)
names(win_per_chrom) <- chrom_names
nwin <- sum(win_per_chrom)
nchrom <- length(win_per_chrom) -> chromosomes_left
} else {
if (any(duplicated(position(x)))){
if (any(duplicated(position(x)))) {
msg <- paste("There are duplicate positions in the data without any",
"chromosome structure. All positions must be unique.\n\n",
"Please the function chromosome() to add chromosome",
"coordinates or modify the positions.")
stop(msg, call. = FALSE)
}
nwin <- nrow(winmat)
chromosomes_left <- 1L
}
xpos <- position(x)
quiet <- should_poppr_be_quiet(quiet)
winmat <- make_windows(maxp = max(xpos), minp = min(xpos), window = window)
nwin <- nrow(winmat)
res_mat <- vector(mode = "numeric", length = nwin)
if (chromos) res_names <- vector(mode = "character", length = nwin)
res_counter <- 1L
if (name_window || chromos) res_names <- vector(mode = "character", length = nwin)
if (!quiet) progbar <- txtProgressBar(style = 3)
for (i in seq(nwin)){
posns <- which(xpos %in% winmat[i, 1]:winmat[i, 2])
last_pos <- posns[length(posns)]
if (length(posns) < min.snps){
res_mat[i] <- NA
} else {
res_mat[i] <- bitwise.ia(x[, posns], threads = threads)
}
if (chromos && !is.na(last_pos) && length(last_pos) > 0){
res_names[i] <- CHROM[last_pos]
}
if (!quiet){
setTxtProgressBar(progbar, i/nwin)
while (chromosomes_left > 0L) {
chrom_counter <- if (chromos) nchrom - chromosomes_left + 1L else 1L
current_windows <- if (chromos) win_per_chrom[chrom_counter] else nwin
for (i in seq(current_windows)) {
# Define the window
the_window <- winmat[i, 1]:winmat[i, 2]
the_chrom <- if (chromos) chrom_names[chrom_counter] else TRUE
posns <- xpos %in% the_window
# If there is chromosome structure, then add the current chromosome as an
# additional constraint to the snps analyzed
j <- if (chromos) posns & CHROM == the_chrom else posns

# Check to make sure the SNP threshold is met. If not, set to NA
if (sum(j) < min.snps) {
res_mat[res_counter] <- NA_real_
} else {
res_mat[res_counter] <- bitwise.ia(x[, j], threads = threads)
}
if (name_window || chromos) {
the_name <- if (chromos) paste(the_chrom, winmat[i, 2], sep = ".") else as.character(winmat[i, 2])
res_names[res_counter] <- the_name
}
if (!quiet) {
setTxtProgressBar(progbar, res_counter/nwin)
}
res_counter <- res_counter + 1L
}
# Decrement the number of chromosomes left to ensure the while loop can exit.
chromosomes_left <- chromosomes_left - 1L
}
if (!quiet) cat("\n")
if (chromos) names(res_mat) <- res_names
if (name_window || chromos) names(res_mat) <- res_names
return(res_mat)
}


adjust_position <- function(x, chromosome_buffer = TRUE, window){
xpos <- position(x)
# Each chromosome has it's own position.
# In this case, get large round number for each chromosome break.
maxp <- 10^ceiling(log(max(xpos), 10))
# The buffer prevents the window from crossing into the next chromosome
buffer <- chromosome_buffer*window
if (length(unique(pmin(xpos))) < nLoc(x)){
lpos <- split(xpos, chromosome(x))
for (p in seq(lpos)){
# adding the large round number plus a buffer (if applicable). This will
# ensure that chromosomes don't overlap.
lpos[[p]] <- lpos[[p]] + (maxp*(p - 1)) + (buffer*(p - 1)) + 1
}
xpos <- unlist(lpos, use.names = FALSE)
}
position(x) <- xpos
return(x)
}
#==============================================================================#
#' Calculate random samples of the index of association for genlight objects.
#'
Expand Down
4 changes: 3 additions & 1 deletion R/bruvo.r
Original file line number Diff line number Diff line change
Expand Up @@ -224,6 +224,7 @@ bruvo.dist <- function(pop, replen = 1, add = TRUE, loss = TRUE, by_locus = FALS
if (length(replen) < nLoc(pop)){
replen <- vapply(alleles(pop), function(x) guesslengths(as.numeric(x)), 1)
warning(repeat_length_warning(replen), immediate. = TRUE)
if (interactive()) sleep(2L)
}
bruvomat <- new('bruvomat', pop, replen)
funk_call <- match.call()
Expand Down Expand Up @@ -360,9 +361,10 @@ bruvo.boot <- function(pop, replen = 1, add = TRUE, loss = TRUE, sample = 100,
# Bruvo's distance depends on the knowledge of the repeat length. If the user
# does not provide the repeat length, it can be estimated by the smallest
# repeat difference greater than 1. This is not a preferred method.
if (length(replen) < length(locNames(pop))){
if (length(replen) < length(locNames(pop))) {
replen <- vapply(alleles(pop), function(x) guesslengths(as.numeric(x)), 1)
warning(repeat_length_warning(replen), immediate. = TRUE)
if (interactive()) sleep(2L)
}
bootgen <- new('bruvomat', pop, replen)
# Steps: Create initial tree and then use boot.phylo to perform bootstrap
Expand Down
58 changes: 29 additions & 29 deletions man/win.ia.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Loading