Hexp calculation is erroneous #47

Closed
zkamvar opened this issue Jul 16, 2015 · 4 comments

Comments

zkamvar commented Jul 16, 2015

The problem:

After re-reading Nei (1978), I realized that my implementation of Hexp is:

(N/(N - 1)) * (1 - sum(p^2))

Where N is the number of allelic states and p is the vector of allele frequencies. Nei's definition is:

(kn/(kn - 1)) * (1 - sum(p^2))

Where n is the number of observed samples at a locus and k is the ploidy (to account for dosage).
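
To make the difference concrete, here is a minimal Python sketch of the two corrections described above. It is an illustration only, not the poppr source (which is R); the function names `hexp_allelic_states` and `hexp_nei` are hypothetical.

```python
def hexp_allelic_states(p):
    """Correction using N = number of allelic states (the version reported as erroneous)."""
    N = len(p)
    return (N / (N - 1)) * (1 - sum(x * x for x in p))


def hexp_nei(p, n, k):
    """Correction using k*n, where n = samples observed at the locus and k = ploidy."""
    kn = k * n
    return (kn / (kn - 1)) * (1 - sum(x * x for x in p))


# Two equally frequent alleles, 10 diploid samples (made-up numbers).
p = [0.5, 0.5]
print(hexp_allelic_states(p))   # correction factor is 2/1
print(hexp_nei(p, n=10, k=2))   # correction factor is 20/19
```

With only two allelic states, the first version doubles the uncorrected value, while the sample-size correction barely changes it.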

User-facing impacts:

  • poppr()
  • locus_table()

What needs to be fixed:

  • internal function locus_table_pegas()

Impacts after fix:

  • Polyploids - Because the calculation depends on ploidy, it would be inappropriate to calculate this for polyploids due to ambiguous dosage.
  • locus_table(type = "genotype") - Hexp will change to the unbiased Simpson's index.

Unfortunately, I might have to wait until just before August to submit this patch, lest I anger our CRAN overlords.

zkamvar commented Jul 16, 2015

To ensure that things are working correctly, I will write a test based on the example presented in Kosman (2003).

zkamvar added a commit that referenced this issue Jul 17, 2015
For haploids and diploids, the calculation will return the size-corrected index.
For polyploids, locus_table will return a corrected Simpson's index while poppr will return Simpson's index.
It's all very confusing....
zkamvar added a commit that referenced this issue Jul 17, 2015
zkamvar added a commit that referenced this issue Jul 17, 2015
zkamvar added a commit that referenced this issue Jul 17, 2015
more adventures in #47!
zkamvar added a commit that referenced this issue Jul 17, 2015
This was why I was getting weird values. The clouds are beginning to clear on issue #47

zkamvar commented Jul 17, 2015

Currently, the strategy is:

If it's polyploid, change to unbiased Simpson's index over alleles:

(n/(n - 1)) * (1 - sum(p^2))

This way a measure can actually be reached instead of having complaints of missing data in the result.
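
As a rough sketch of the unbiased Simpson's index over alleles, the Python below works from per-allele counts at one locus. It assumes n is the total count of observed alleles, and it is not the poppr implementation; the helper name `unbiased_simpson` is hypothetical.

```python
import math


def unbiased_simpson(counts):
    """(n/(n - 1)) * (1 - sum(p^2)), with n taken as the total observed allele count."""
    n = sum(counts)
    if n < 2:
        # The correction factor is undefined for fewer than two observations.
        return math.nan
    p = [c / n for c in counts]
    return (n / (n - 1)) * (1 - sum(x * x for x in p))


# Allele "A" seen 3 times and allele "B" once (made-up counts).
print(unbiased_simpson([3, 1]))
```

Because the function needs only allele counts, it returns a value even when dosage is ambiguous.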

Currently, locus_table() will report a different column name, and poppr() probably should as well.

Now, I just need to fix the documentation:

  • adjust documentation in locus_table()
  • adjust documentation in poppr()
  • adjust documentation in vignette (that was wrong anyways 😩)

zkamvar added a commit that referenced this issue Jul 17, 2015
In another thrilling installment of addressing #47, we changed the output of locus_table to be Mu and not uSimpson because it's easier to type and I have a direct reference!
zkamvar added a commit that referenced this issue Jul 17, 2015
update documentation and tests for #47
zkamvar added a commit that referenced this issue Jul 17, 2015
ALL FOR THE GLORY OF #47!

zkamvar commented Jul 17, 2015

The new column name for poppr and locus_table is Mu.

zkamvar commented Jul 17, 2015

Scratch that, reverse it.

The calculation will be

(n/(n - 1)) * (1 - sum(p^2))

where n is the number of observed alleles. This will impact polyploids and mixed ploidy populations by increasing diversity, but it's better than using kN, which would increase it even more.
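
For a mixed-ploidy locus, counting observed alleles might look like the Python sketch below. The data layout (tuples of allele labels, with `None` marking an unobserved copy) and the name `hexp_observed_alleles` are assumptions made for illustration; poppr stores genotypes differently.

```python
from collections import Counter


def hexp_observed_alleles(genotypes):
    """(n/(n - 1)) * (1 - sum(p^2)), where n counts only the alleles actually observed."""
    # Pool alleles across samples, skipping unobserved copies (None).
    counts = Counter(a for g in genotypes for a in g if a is not None)
    n = sum(counts.values())
    if n < 2:
        return float("nan")
    return (n / (n - 1)) * (1 - sum((c / n) ** 2 for c in counts.values()))


# One diploid, one tetraploid with one unobserved copy, one haploid (made-up data).
genotypes = [("A", "B"), ("A", "A", "B", None), ("B",)]
print(hexp_observed_alleles(genotypes))  # n = 6 observed alleles
```

Each sample contributes only the alleles it shows, so no single ploidy value has to be assumed for the whole population.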

zkamvar added a commit that referenced this issue Jul 17, 2015
zkamvar mentioned this issue Jul 17, 2015