The current bitwise.dist() does not have an option to calculate Euclidean distance. This can currently be done by converting the data to a matrix and using R's dist() function, but that may be prohibitive on larger data sets because of memory use. It may be possible to implement this relatively easily. If there is no missing data, then for haploids we simply need to take the square root of the distance. For diploids we do the same, but we also have to ensure that the distances are squared at each locus, which means converting every per-locus distance of 2 to 4. This can be done in get_distance_custom():
ch_dist=Hor | S; // Force ones everywhere they are the same
dist=get_zeros(S); // Add one distance for every non-shared zygosity
dist+=get_zeros(ch_dist); // Add another one for every difference that has no heterozygotes
return dist;
}
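On line 2142, if we multiply the result of get_zeros(ch_dist) by 3, then it will be equivalent to squaring the distance: each locus with a distance of 2 already contributes 1 through get_zeros(S), so adding 3 more brings it to 4. Here's a small proof of this in R: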
Created on 2018-02-04 by the reprex package (v0.1.1.9000).
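The multiply-by-3 identity can also be checked in C. The following is a minimal sketch, not poppr's code: it assumes each diploid genotype is stored as an allele dosage (0, 1 or 2) in a plain int array rather than in the bit-packed form used by get_distance_custom(), and the function names are hypothetical. The two counters stand in for get_zeros(S) and get_zeros(ch_dist).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Per-locus diploid distance between two allele dosages (0, 1 or 2).
 * The dosage encoding is an assumption made for this sketch only. */
static int locus_distance(int a, int b)
{
    int d = abs(a - b);
    assert(d >= 0 && d <= 2);
    return d;
}

/* Sum of squared per-locus distances, computed the direct way. */
long squared_distance_direct(const int *a, const int *b, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        int d = locus_distance(a[i], b[i]);
        total += (long)d * d;
    }
    return total;
}

/* The same quantity from two counts, mirroring the proposed change:
 * every differing locus contributes 1, and every locus at distance 2
 * contributes 3 more, because 1 + 3 = 4 = 2 * 2. */
long squared_distance_counted(const int *a, const int *b, size_t n)
{
    long any_diff = 0; /* stands in for get_zeros(S) */
    long two_diff = 0; /* stands in for get_zeros(ch_dist) */
    for (size_t i = 0; i < n; i++) {
        int d = locus_distance(a[i], b[i]);
        if (d > 0) {
            any_diff++;
        }
        if (d == 2) {
            two_diff++;
        }
    }
    return any_diff + 3 * two_diff;
}
```

Both functions should agree for any pair of samples, since a per-locus distance can only be 0, 1 or 2 and the only value that changes when squared is 2.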
Missing data
The challenge is what happens with missing data. To match R's dist() function, comparisons that involve missing data have to be rescaled by n/(n - x), where n is the number of sites and x is the number of sites missing in that comparison. This means counting the missing sites as they are encountered while the distance is being constructed, which should only require one extra variable.
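The rescaling can be sketched in C as follows. This is an illustration under stated assumptions, not the planned implementation: genotypes are plain int dosages, MISSING is a hypothetical marker value, and the function name is made up. It returns the rescaled sum of squares and leaves the square root to the caller, in line with the task of taking the square root in bitwise.dist().

```c
#include <assert.h>
#include <stddef.h>

#define MISSING (-1) /* hypothetical marker for a missing genotype */

/* Sum of squared per-locus differences over the sites observed in both
 * samples, scaled up by n / (n - x), where x is the number of sites
 * missing in this comparison. Returns -1.0 when no site is comparable. */
double rescaled_squared_distance(const int *a, const int *b, size_t n)
{
    assert(a != NULL && b != NULL);

    double sum_sq = 0.0;
    size_t missing = 0; /* the one extra variable: count of skipped sites */

    for (size_t i = 0; i < n; i++) {
        if (a[i] == MISSING || b[i] == MISSING) {
            missing++;
            continue;
        }
        double d = (double)(a[i] - b[i]);
        sum_sq += d * d;
    }

    if (missing == n) {
        return -1.0; /* nothing to compare; the caller decides how to report it */
    }
    return sum_sq * (double)n / (double)(n - missing);
}
```

For example, with four sites of which one is missing and a squared sum of 8 over the remaining three, the rescaled value is 8 * 4 / 3, and the Euclidean distance is its square root.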
Tasks
add euclidean argument to bitwise.dist()
add euclidean argument to get_distance_custom()
take the square root of the result (in bitwise.dist())
add counter for missing data in bitwise_dist_haploid() and bitwise_dist_diploid()
Source of the snippet above: poppr/src/bitwise_distance.c, lines 2130 to 2145 in 3994ed3.