Handling missing genotype calls #139

Closed
schonbej opened this issue Apr 3, 2020 · 1 comment · Fixed by #174

schonbej commented Apr 3, 2020

I am working with Hail-based Variant Spark.
Missing genotype calls in the dataset passed to random_forest_model as x cause a NullPointerException when calling fit_trees on the model.
For now I'll use Hail to filter out rows with a call rate != 1, but it would be great if Variant Spark could handle this.

Code:

rf_model = vshl.random_forest_model(y=c2s_mt.sample_membership.sample, x=c2s_mt.GT.n_alt_alleles())
rf_model.fit_trees(500, 100)

Error message:
fit_trees_nullpointer_exception.txt

@piotrszul
Collaborator

I believe that, in the general case, imputation is better performed by specialised genomics tools.
However, basic support for imputation will be added in the following manner:
an extra parameter, imputation_type, will be added to methods.random_forest_model, with these allowed values:

  • None : no imputation - an error will be reported when the input VCF contains uncalled genotypes
  • "mode" : perform a basic imputation by replacing missing values with the mode (most frequent value) of the non-missing values. If the distribution is multi-modal, the smallest mode is used.

e.g.:

rf_model = vshl.random_forest_model(y=mt.hipster.x22_16050408,
                    x=mt.GT.n_alt_alleles(), seed=13, mtry_fraction=0.05,
                    min_node_size=5, max_depth=10, imputation_type="mode")
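
For illustration only, the "mode" rule described above can be sketched in plain Python for a single variant's calls. This is not Variant Spark's implementation: the function name impute_mode is hypothetical, missing calls are represented as None, and raising an error when every call is missing is an assumption that the description above does not cover.

```python
from collections import Counter
from typing import List, Optional


def impute_mode(calls: List[Optional[int]]) -> List[int]:
    """Replace missing calls (None) with the mode of the non-missing calls.

    Hypothetical helper, not part of Variant Spark. When several values
    tie for most frequent, the smallest one is used.
    """
    observed = [c for c in calls if c is not None]
    if not observed:
        # Assumption: with no observed calls there is nothing to impute from.
        raise ValueError("cannot impute: every call is missing")

    counts = Counter(observed)
    highest = max(counts.values())
    # Tie-break on the smallest value among the most frequent ones.
    mode = min(value for value, n in counts.items() if n == highest)

    return [mode if c is None else c for c in calls]


# Alternate-allele counts for one variant across five samples.
print(impute_mode([0, 1, None, 1, 2]))
# A tie between 0 and 2 resolves to the smaller value, 0.
print(impute_mode([2, 2, None, 0, 0]))
```

Taking the minimum over the tied values is what makes the result deterministic for multi-modal rows.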
