diff --git a/.readthedocs.yml b/.readthedocs.yml
new file mode 100644
index 00000000..7ff9e620
--- /dev/null
+++ b/.readthedocs.yml
@@ -0,0 +1,14 @@
+# .readthedocs.yaml
+# Read the Docs configuration file
+# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details
+
+# Required
+version: 2
+
+build:
+  os: ubuntu-22.04
+  tools:
+    python: "3.11"
+
+mkdocs:
+  configuration: mkdocs.yml
diff --git a/docs/pgs/faq.md b/docs/pgs/faq.md
new file mode 100644
index 00000000..ba6310c9
--- /dev/null
+++ b/docs/pgs/faq.md
@@ -0,0 +1,16 @@
+# Frequently Asked Questions
+
+## Can I use the Polygenic Score Calculation extension without an email address?
+Yes, the extension can also be used with a username without an email. However, without an email, notifications are not sent, and access to genotyped data may be limited.
+
+## Extending the expiration date or resetting the download counter
+Your data is available for 7 days. In case you need an extension, please let [us](/contact) know.
+
+## How can I improve the download speed?
+[aria2](https://aria2.github.io/) tries to utilize your maximum download bandwidth. Please remember to raise the minimum split size significantly (`-k`, `--min-split-size=SIZE`); otherwise, you will hit the Michigan Imputation Server download limit for each file (thanks to Anthony Marcketta for pointing this out).
+
+## Can I download all results at once?
+We provide a wget command for all results. Please open the results tab. The last column in each row includes direct links to all files.
+
+## Can I perform PGS calculation locally?
+Imputation Server uses a standalone tool called pgs-calc. It reads the imputed dosages from VCF files and uses them to calculate scores. It supports imputed genotypes from Michigan Imputation Server or TOPMed Imputation Server out of the box, as well as score files from PGS Catalog or PRSWeb instances. In addition, custom score files containing chromosomal positions, both alleles, and the effect size can be used easily. pgs-calc uses the chromosomal positions and alleles to find the corresponding dosages in genotype files, but it also provides tools to resolve rsIDs in score files using dbSNP. Therefore, it can be applied to genotype files with variants that were not annotated with rsIDs. Moreover, the standalone version provides options to improve the coverage by using the provided proxy mapping file for Europeans or a custom population-specific mapping file. pgs-calc is available at https://github.com/lukfor/pgs-calc.
\ No newline at end of file
diff --git a/docs/pgs/getting-started.md b/docs/pgs/getting-started.md
new file mode 100644
index 00000000..f8f81119
--- /dev/null
+++ b/docs/pgs/getting-started.md
@@ -0,0 +1,109 @@
+# Polygenic Score Calculation
+
+We provide an easy-to-use web interface to apply thousands of published polygenic risk scores to imputed genotypes in an efficient way.
+By extending the popular Michigan Imputation Server, the module integrates seamlessly into the existing imputation workflow and enables users without expertise in this field to take advantage of the method.
+The graphical report includes all metadata about the scores in a single place and helps users to understand and screen thousands of scores in an easy and intuitive way.
+
+![pipeline.png](images/pipeline.png)
+
+An extensive quality control pipeline is executed automatically to detect and fix possible strand-flips and to filter out missing SNPs to prevent systematic errors (e.g.
lower scores for individuals with missing or wrong aligned genetic data). + +## Getting started + +To utilize the Polygenic Score Calculation extension on ImputationServer, you must first [register](https://imputationserver.sph.umich.edu/index.html#!pages/register) for an account. +An activation email will be sent to the provided address. Once your email address is verified, you can access the service at no cost. + +**Please note that the extension can also be used with a username without an email. However, without an email, notifications are not sent, and access to genotyped data may be limited.** + +No dataset at hand? No problem, download our example dataset to test the PGS extension: [50-samples.zip](https://imputationserver.sph.umich.edu/resources/50-samples.zip). + + +When incorporating the Polygenic Score Calculation extension in your research, please cite the following papers: + +> Das S, Forer L, Schönherr S, Sidore C, Locke AE, Kwong A, Vrieze S, Chew EY, Levy S, McGue M, Schlessinger D, Stambolian D, Loh PR, Iacono WG, Swaroop A, Scott LJ, Cucca F, Kronenberg F, Boehnke M, Abecasis GR, Fuchsberger C. [Next-generation genotype imputation service and methods](https://www.ncbi.nlm.nih.gov/pubmed/27571263). Nature Genetics 48, 1284–1287 (2016). + +> Samuel A. Lambert, Laurent Gil, Simon Jupp, Scott C. Ritchie, Yu Xu, Annalisa Buniello, Aoife McMahon, Gad Abraham, Michael Chapman, Helen Parkinson, John Danesh, Jacqueline A. L. MacArthur and Michael Inouye. The Polygenic Score Catalog as an open database for reproducibility and systematic evaluation. Nature Genetics. doi: 10.1038/s41588-021-00783-5 (2021). + +## Setting up your first Polygenic Score Calculation job + +1. [Log in](https://imputationserver.sph.umich.edu/index.html#!pages/login) with your credentials and navigate to the **Run** tab to initiate a new Polygenic Score Calculation job. +2. Please click on **"Polygenic Score calculation"** and the submission dialog appears. +3. The submission dialog allows you to specify job properties. + +![](images/submit-job01.png) + +The following options are available: + + +### Reference Panel + +Our PGS extension offers genotype imputation from different reference panels. The most accurate and largest panel is **HRC (Version r1.1 2016)**. Please select one that fulfills your needs and supports the population of your input data: + +- HRC (Version r1.1 2016) +- 1000 Genomes Phase 3 (Version 5) +- 1000 Genomes Phase 1 (Version 3) +- HapMap 2 + +More details about all available reference panels can be found [here](/pgs/reference-panels/). + +### Upload VCF files from your computer + +When using the file upload, data is uploaded from your local file system to Michigan Imputation Server. By clicking on **Select Files** an open dialog appears where you can select your VCF files: + +![](images/upload-data01.png) + +Multiple files can be selected using the `ctrl`, `cmd` or `shift` keys, depending on your operating system. +After you have confirmed your choice, all selected files are listed in the submission dialog: + +![](images/upload-data02.png) + +Please make sure that all files fulfill the [requirements](/prepare-your-data). + + +!!! important +Since version 1.7.2 URL-based uploads (sftp and http) are no longer supported. Please use direct file uploads instead. + +### Build +Please select the build of your data. Currently the options **hg19** and **hg38** are supported. Michigan Imputation Server automatically updates the genome positions (liftOver) of your data. 
All reference panels are based on hg19 coordinates.
+
+### Scores and Trait Category
+
+Choose the precomputed Polygenic Score repository relevant to your study from the available options. Based on the selected repository, different trait categories appear and can be selected (e.g. Cancer scores):
+
+ ![](images/pgs-repository.png)
+
+More details about all available PGS repositories can be found [here](/pgs/scores/).
+
+### Ancestry Estimation
+
+You can enable ancestry estimation by selecting a reference population used to classify your uploaded samples. Currently, we support a worldwide panel based on HGDP.
+
+## Start Polygenic Score Calculation
+
+After agreeing to the *Terms of Service*, initiate the calculation by clicking on **Submit job**. The system will perform Input Validation and Quality Control immediately. If your data passes these steps, the job is added to the queue for processing.
+
+ ![](images/queue01.png)
+
+## Monitoring and Retrieving Results
+
+- **Input Validation**: Verify the validity of your uploaded files and review basic statistics.
+
+ ![](images/input-validation01.png)
+
+- **Quality Control**: Examine the QC report and download statistics after the system filters variants based on various criteria.
+
+ ![](images/quality-control02.png)
+
+- **Polygenic Score Calculation**: Monitor the progress of the imputation and polygenic score calculation in real time for each chromosome.
+
+ ![](images/imputation01.png)
+
+## Downloading Results
+
+Upon completion, you will be notified by email if you entered your email address during registration. A zip archive containing the results can be downloaded directly from the server.
+
+ ![](images/job-results.png)
+
+Click on the filename to download results directly via a web browser. For command line downloads, use the **share** symbol to obtain private links.
+
+**Important**: All data is automatically deleted after 7 days. Download the data you need within this timeframe. A reminder is sent 48 hours before data deletion.
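As an illustration of the command-line route mentioned above, the sketch below downloads one result archive with aria2 and unpacks it. This is a minimal example rather than part of the official documentation: the private link, output name, and password are placeholders obtained from the **share** dialog and the notification email, and the aria2 options simply follow the FAQ's advice to raise the minimum split size.

```bash
# Hypothetical private link copied from the "share" dialog; replace with your own.
RESULTS_URL="<private-link-to-chr_20.zip>"

# aria2 tries to saturate your bandwidth; raising --min-split-size (-k) avoids
# hitting the per-file download limit mentioned in the FAQ.
aria2c -x 8 -s 8 -k 20M -o chr_20.zip "$RESULTS_URL"

# Imputation result archives are encrypted with a one-time password sent by email
# (the PGS score archive may be a plain, unencrypted zip).
unzip -P "<one-time-password>" chr_20.zip
```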
diff --git a/docs/pgs/images/imputation01.png b/docs/pgs/images/imputation01.png new file mode 100644 index 00000000..e7e52ad1 Binary files /dev/null and b/docs/pgs/images/imputation01.png differ diff --git a/docs/pgs/images/input-validation01.png b/docs/pgs/images/input-validation01.png new file mode 100644 index 00000000..379673e8 Binary files /dev/null and b/docs/pgs/images/input-validation01.png differ diff --git a/docs/pgs/images/pgs-repository.png b/docs/pgs/images/pgs-repository.png new file mode 100644 index 00000000..76c12f7e Binary files /dev/null and b/docs/pgs/images/pgs-repository.png differ diff --git a/docs/pgs/images/pipeline.png b/docs/pgs/images/pipeline.png new file mode 100644 index 00000000..0a675747 Binary files /dev/null and b/docs/pgs/images/pipeline.png differ diff --git a/docs/pgs/images/quality-control02.png b/docs/pgs/images/quality-control02.png new file mode 100644 index 00000000..5495fb23 Binary files /dev/null and b/docs/pgs/images/quality-control02.png differ diff --git a/docs/pgs/images/report-01.png b/docs/pgs/images/report-01.png new file mode 100644 index 00000000..4f3ee91d Binary files /dev/null and b/docs/pgs/images/report-01.png differ diff --git a/docs/pgs/images/report-02.png b/docs/pgs/images/report-02.png new file mode 100644 index 00000000..53a41ca3 Binary files /dev/null and b/docs/pgs/images/report-02.png differ diff --git a/docs/pgs/images/submit-job01.png b/docs/pgs/images/submit-job01.png new file mode 100644 index 00000000..9a49433c Binary files /dev/null and b/docs/pgs/images/submit-job01.png differ diff --git a/docs/pgs/images/upload-data01.png b/docs/pgs/images/upload-data01.png new file mode 100644 index 00000000..5c7d7a8f Binary files /dev/null and b/docs/pgs/images/upload-data01.png differ diff --git a/docs/pgs/images/upload-data02.png b/docs/pgs/images/upload-data02.png new file mode 100644 index 00000000..0f9fa025 Binary files /dev/null and b/docs/pgs/images/upload-data02.png differ diff --git a/docs/pgs/output-files.md b/docs/pgs/output-files.md new file mode 100644 index 00000000..2b362c45 --- /dev/null +++ b/docs/pgs/output-files.md @@ -0,0 +1,38 @@ +# Output Files + +The Polygenic Score Calculation Results CSV file provides Polygenic Score (PGS) values for different samples and associated identifiers. +Users can leverage this CSV file to analyze and compare Polygenic Score values across different samples. The data facilitates the investigation of genetic associations and their impact on specific traits or conditions. + +## CSV Format + +The CSV file consists of a header row and data rows: + +### Header Row + +- **sample**: Represents the identifier for each sample. +- **PGS000001, PGS000002, PGS000003, ...**: Columns representing different Polygenic Score values associated with the respective identifiers. + +### Data Rows + +- Each row corresponds to a sample and provides the following information: + - **sample**: Identifier for the sample. + - **PGS000001, PGS000002, PGS000003, ...**: Polygenic Score values associated with the respective identifiers for the given sample. + +### Example + +Here's an example row: + +```csv +sample, PGS000001, PGS000002, PGS000003, ... +sample1, -4.485780284301654, 4.119604924228042, 0.0, -4.485780284301654 +``` + +- **sample1**: Sample identifier. + - **-4.485780284301654**: Polygenic Score value for `PGS000001`. + - **4.119604924228042**: Polygenic Score value for `PGS000002`. + - **0.0**: Polygenic Score value for `PGS000003`. 
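As a quick, hypothetical sanity check of such a results file, the sketch below summarizes one score column with standard shell tools. The filename `scores.csv` and the use of field 2 (`PGS000001`) are assumptions for illustration; adjust them to your actual download.

```bash
# Assumes a header row and comma-separated columns as in the example above.
# Prints the number of samples and the mean of the PGS000001 column (field 2).
awk -F',' 'NR > 1 { sum += $2; n++ } END { if (n > 0) printf "samples: %d, mean PGS000001: %.4f\n", n, sum / n }' scores.csv
```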
+ +**Note:** + +- Polygenic Score values are provided as floating-point numbers. +- The absence of values (e.g., `0.0`) indicates a lack of Polygenic Score information for a particular identifier in a given sample. diff --git a/docs/pgs/pipeline.md b/docs/pgs/pipeline.md new file mode 100644 index 00000000..133e6ea2 --- /dev/null +++ b/docs/pgs/pipeline.md @@ -0,0 +1,11 @@ +# Pipeline + +![pipeline.png](images%2Fpipeline.png) + + + + + + +## Ancestry estimation +We use LASER to perform principal components analysis (PCA) based on the genotypes of each sample and to place them into a reference PCA space which was constructed using a set of reference individuals [14]. We built reference coordinates based on 938 samples from the Human Genome Diversity Project (HGDP) [15] and labeled them by the ancestry categories proposed by the GWASCatalog [16] which are also used in PGS Catalog. \ No newline at end of file diff --git a/docs/pgs/reference-panels.md b/docs/pgs/reference-panels.md new file mode 100644 index 00000000..e2dbd3d6 --- /dev/null +++ b/docs/pgs/reference-panels.md @@ -0,0 +1,45 @@ +# Reference Panels for PGS Calculation + +Our server offers PGS calculation from the following reference panels: + + +## HRC (Version r1.1 2016) + +The HRC panel consists of 64,940 haplotypes of predominantly European ancestry. + +| || +| | | +| Number of Samples | 32,470 | +| Sites (chr1-22) | 39,635,008 | +| Chromosomes | 1-22, X| +| Website | [http://www.haplotype-reference-consortium.org](http://www.haplotype-reference-consortium.org); [HRC r1.1 Release Note](https://imputationserver.sph.umich.edu/start.html#!pages/hrc-r1.1) | + +## 1000 Genomes Phase 3 (Version 5) + +Phase 3 of the 1000 Genomes Project consists of 5,008 haplotypes from 26 populations across the world. + +| || +| | | +| Number of Samples | 2,504 | +| Sites (chr1-22) | 49,143,605 | +| Chromosomes | 1-22, X| +| Website | [http://www.internationalgenome.org](http://www.internationalgenome.org) | + + +## 1000 Genomes Phase 1 (Version 3) + +| || +| | | +| Number of Samples | 1,092 | +| Sites (chr1-22) | 28,975,367 | +| Chromosomes | 1-22, X| +| Website | [http://www.internationalgenome.org](http://www.internationalgenome.org) | + +## HapMap 2 + +| || +| | | +| Number of Samples | 60 | +| Sites (chr1-22) | 2,542,916 | +| Chromosomes | 1-22 | +| Website: | [http://www.hapmap.org](http://www.hapmap.org) | diff --git a/docs/pgs/report.md b/docs/pgs/report.md new file mode 100644 index 00000000..c899222c --- /dev/null +++ b/docs/pgs/report.md @@ -0,0 +1,14 @@ +# Interactive Report + +The created report contains a list of all scores, where each score has a different color based on its coverage. The color green indicates that the coverage is very high and nearly all SNPs from the score were also found in the imputed dataset. The color red indicates that very few SNPs were found and the coverage is therefore low. + +![report.png](images/report-01.png) + +In addition, the report includes detailed metadata for each score such as the number of variants, the number of well-imputed genotypes and the population used to construct the score. A direct link to PGS Catalog, Cancer PRSWeb or ExPRSWeb is also available for further investigation (e.g. for getting information about the method that was used to construct the score). Further, the report displays the distribution of the scores of all uploaded samples and can be interactively explored. This allows users to detect samples with either a high or low risk immediately. 
+
+Moreover, the report gives an overview of all estimated ancestries from the uploaded genotypes and compares them with the populations of the GWAS that was used to create the score.
+
+![report.png](images/report-02.png)
+
+
+If an uploaded sample with an unsupported population is detected, a warning message is provided and the sample is excluded from the summary statistics.
diff --git a/docs/pgs/scores.md b/docs/pgs/scores.md
new file mode 100644
index 00000000..0493c05b
--- /dev/null
+++ b/docs/pgs/scores.md
@@ -0,0 +1,21 @@
+# Scores
+
+We currently support the following PGS repositories out of the box:
+
+## PGS-Catalog
+
+We use the PGS Catalog as the source of scores for PGS Server (version 19 Jan 2023). The PGS Catalog is an online database that collects and annotates published scores and currently provides access to over 3,900 scores encompassing more than 580 traits.
+
+> Samuel A. Lambert, Laurent Gil, Simon Jupp, Scott C. Ritchie, Yu Xu, Annalisa Buniello, Aoife McMahon, Gad Abraham, Michael Chapman, Helen Parkinson, John Danesh, Jacqueline A. L. MacArthur and Michael Inouye. The Polygenic Score Catalog as an open database for reproducibility and systematic evaluation. Nature Genetics. doi: 10.1038/s41588-021-00783-5 (2021).
+
+## Cancer-PRSweb
+
+Collection of scores for major cancer traits.
+
+> Fritsche LG, Patil S, Beesley LJ, VandeHaar P, Salvatore M, Ma Y, Peng RB, Taliun D, Zhou X, Mukherjee B: Cancer PRSweb: An Online Repository with Polygenic Risk Scores for Major Cancer Traits and Their Evaluation in Two Independent Biobanks. Am J Hum Genet 2020, 107(5):815-836.
+
+## ExPRSweb
+
+Collection of scores for common health-related exposures like body mass index or alcohol consumption.
+
+> Ma Y, Patil S, Zhou X, Mukherjee B, Fritsche LG: ExPRSweb: An online repository with polygenic risk scores for common health-related exposures. Am J Hum Genet 2022, 109(10):1742-1760.
diff --git a/docs/pgs/tutorial.md b/docs/pgs/tutorial.md
new file mode 100644
index 00000000..f4969928
--- /dev/null
+++ b/docs/pgs/tutorial.md
@@ -0,0 +1,44 @@
+# Testing Imputationserver PGS: Step by Step
+
+
+To test Imputationserver PGS, please execute the following steps:
+
+**0. Sign up and create a login:**
+
+Imputationserver PGS requires a login to access the Polygenic Risk Score (PGS) calculation service.
+This login is crucial to maintain the security and privacy of any uploaded human genotype data.
+
+Users have the flexibility to create this login either with or **without providing an email address**. Please visit the [signup](https://imputationserver.sph.umich.edu/index.html#!pages/register) page and proceed to create a login.
+
+**1. Download the Example Dataset:**
+Start by downloading the example dataset provided for testing the PGS extension. You can obtain the dataset by clicking on the following link: [50-samples.zip](https://imputationserver.sph.umich.edu/resources/50-samples.zip).
+
+**2. Unpack the Data:**
+After downloading the zip file, unzip or extract its contents to a location of your choice on your computer.
+
+**3. Access Polygenic Risk Score Application:**
+Navigate to the "Run" menu and select "Polygenic Risk Score" from the options.
+
+**4. Choose 1000 Genomes Phase 3 Panel:**
+In the Polygenic Risk Score application, select the "1000 Genomes Phase 3" panel as the reference dataset for imputation.
+
+**5. Specify PGS Catalog and Trait:**
+Identify and specify the Polygenic Score (PGS) Catalog you want to use for scoring.
+Choose a relevant trait for the analysis, such as "Cancer".
+
+**6. Optional Ancestry Estimation:**
+Optionally, you can choose to include ancestry estimation in your analysis. This step may enhance the precision and interpretation of the results.
+
+**7. Agree to Terms of Service:**
+Before proceeding, make sure to read and agree to the Terms of Service provided by Imputationserver. It is essential to comply with the platform's terms and conditions.
+
+**8. Submit the Job:**
+After configuring all the necessary parameters, click on the "Submit" button to initiate the PGS calculation.
+
+**9. Monitor Progress:**
+Depending on the server load, the calculation may take some time (about 30 minutes). Allow the process to complete.
+
+**10. Download Results:**
+Once the calculation is finished, you can view the results provided by Imputationserver PGS and download a report and all calculated scores.
+
+Congratulations! You have successfully tested Imputationserver PGS using the provided example dataset and configuration settings. Now you are ready to use the service with your own dataset!
\ No newline at end of file
diff --git a/docs/workshops/ASHG2023.md b/docs/workshops/ASHG2023.md
new file mode 100644
index 00000000..ad3c35c8
--- /dev/null
+++ b/docs/workshops/ASHG2023.md
@@ -0,0 +1,27 @@
+**Workshop ASHG2023**
+
+# Welcome to the Michigan Imputation Server Workshop!
+
+## Workshop Title
+The Michigan Imputation Server: Data Preparation, Genotype Imputation, and Data Analysis
+
+## Topic
+Statistical Genetics and Genetic Epidemiology
+
+## Target Audience
+Attendees interested in learning how to perform genotype imputation and use imputed genotypes in their research, especially trainees. There are no prerequisites for this workshop. Attendees are expected to follow the materials on their personal laptops.
+
+## Workshop Slides
+You can download the slides of all workshop sessions [here](https://github.com/genepi/imputationserver-ashg/raw/main/slides/MIS_Workshop_2023.pdf). Please also have a look at the individual sessions below for additional training material.
+
+## Links
+- [Interactive Poll](http://pollev.com/ashg)
+- [Workshop Website](https://www.ashg.org/meetings/2023meeting/2023-ashg-invited-workshop-schedule/)
+
+## Workshop Facilitator(s)
+- Christian Fuchsberger, christian.fuchsberger@eurac.edu (Eurac Research)
+- Sebastian Schönherr, sebastian.schoenherr@i-med.ac.at (Medical University of Innsbruck)
+- Lukas Forer, lukas.forer@i-med.ac.at (Medical University of Innsbruck)
+- Xueling Sim, ephsx@nus.edu.sg (National University of Singapore)
+- Saori Sakaue, ssakaue@broadinstitute.org (Broad Institute)
+- Albert Smith, albertvs@umich.edu (University of Michigan)
\ No newline at end of file
diff --git a/docs/workshops/ASHG2023/Session1.md b/docs/workshops/ASHG2023/Session1.md
new file mode 100644
index 00000000..7e9a5c74
--- /dev/null
+++ b/docs/workshops/ASHG2023/Session1.md
@@ -0,0 +1,18 @@
+**Workshop ASHG2023 > Session 1: Imputation and the Server**
+
+# Server Links
+
+[Michigan Imputation Server](https://imputationserver.sph.umich.edu)
+
+[TOPMed Imputation Server](https://imputation.biodatacatalyst.nhlbi.nih.gov)
+
+
+# Selected Literature
+
+[Das S, Forer L, Schönherr S, Sidore C, Locke AE, Kwong A, Vrieze SI, Chew EY, Levy S, McGue M, Schlessinger D, Stambolian D, Loh PR, Iacono WG, Swaroop A, Scott LJ, Cucca F, Kronenberg F, Boehnke M, Abecasis GR, Fuchsberger C. Next-generation genotype imputation service and methods. Nat Genet. 2016 Oct;48(10):1284-1287.
doi: 10.1038/ng.3656.](https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/27571263/) + +[Das S, Abecasis GR, Browning BL. Genotype Imputation from Large Reference Panels. Annu Rev Genomics Hum Genet. 2018 Aug 31;19:73-96. doi: 10.1146/annurev-genom-083117-021602.](https://arjournals.annualreviews.org/doi/10.1146/annurev-genom-083117-021602?url_ver=Z39.88-2003&rfr_id=ori:rid:crossref.org&rfr_dat=cr_pub%20%200pubmed) + +[Fuchsberger C, Abecasis GR, Hinds DA. minimac2: faster genotype imputation. Bioinformatics. 2015 Mar 1;31(5):782-4. doi: 10.1093/bioinformatics/btu704. Epub 2014 Oct 22. PMID: 25338720; PMCID: PMC4341061.](https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/25338720/) + +[Howie B, Fuchsberger C, Stephens M, Marchini J, Abecasis GR. Fast and accurate genotype imputation in genome-wide association studies through pre-phasing. Nat Genet. 2012 Jul 22;44(8):955-9. doi: 10.1038/ng.2354. PMID: 22820512; PMCID: PMC3696580.](https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/22820512/) diff --git a/docs/workshops/ASHG2023/Session2.md b/docs/workshops/ASHG2023/Session2.md new file mode 100644 index 00000000..488d5c07 --- /dev/null +++ b/docs/workshops/ASHG2023/Session2.md @@ -0,0 +1,40 @@ +**Workshop ASHG2023 > Session 2: Run a job, Data Preparation and Data Download** + +# Welcome + +Welcome to Session 2! In this session you will learn how to submit a job on Michigan Imputation Server (MIS) and how to prepare your input data that they are passing the QC step. + +# Tutorial + +## Getting Started +As a quick start, the following video includes all required steps to submit and monitor a job using the graphical web interface. + + + + +## Run a job on your own +After you [registered](https://imputationserver.sph.umich.edu/start.html#!pages/register) successfully, the following URL will bring you directly to the job submission page: +[https://imputationserver.sph.umich.edu/index.html#!run/minimac4](https://imputationserver.sph.umich.edu/index.html#!run/minimac4) + +## Submission Page - Select parameters +The UI includes several parameters which need to be specified. Our [Getting Started guide](https://imputationserver.readthedocs.io/en/latest/getting-started/) describes all required parameters to do so. + +## Submission Page - Upload data + +We are providing two data datasets that can be downloaded from below. In case the unphased dataset is selected, an additional phasing step using Eagle is automatically performed. For this demo, we recommend selecting the **HapMap 2 panel** (Input parameter 'Reference panel') to get your results as quick as possible. Please also have a look at our [supported reference panels](https://imputationserver.readthedocs.io/en/latest/reference-panels/) when using MIS in a production setup. + +- [Phased dataset chr20 hg19](https://github.com/genepi/imputationserver-ashg/raw/main/files/chr20.R50.merged.1.330k.recode.small.vcf.gz) +- [Unphased dataset chr20 hg19](https://github.com/genepi/imputationserver-ashg/raw/main/files/chr20.R50.merged.1.330k.recode.unphased.small.vcf.gz) + +## Submission Page - Submit +After all parameters have been selected and you click 'submit', the job will be added to our Input Validation and QC queue. Please have a look at our [Data Preparation Guide](https://imputationserver.readthedocs.io/en/latest/prepare-your-data) to learn how to prepare your dataset using a pre-imputation tool. + +## Monitor Jobs, Download Data +If the job passes the QC step, it will be added to our long-time queue. 
As soon as your job is finished, you will receive an email with the password to download and decrypt your data. Also check out our [Pipeline Overview Guide](https://imputationserver.readthedocs.io/en/latest/pipeline/) to learn more about the different internal parameters. The complete source code can also be found on [GitHub](https://imputationserver.sph.umich.edu).
+
+## QC Results
+A simple QC report including the frequency plot is available [here](https://htmlpreview.github.io/?https://github.com/genepi/imputationserver-ashg/blob/main/files/qcreport.html).
+
+
+# Contact
+If you have any questions please write [me an email](mailto:sebastian.schoenherr@i-med.ac.at) or contact me on [Twitter](https://twitter.com/seppinho).
diff --git a/docs/workshops/ASHG2023/Session3.md b/docs/workshops/ASHG2023/Session3.md
new file mode 100644
index 00000000..ce7fb69c
--- /dev/null
+++ b/docs/workshops/ASHG2023/Session3.md
@@ -0,0 +1 @@
+**Workshop ASHG2023 > Session 3: Performing GWAS using imputed data**
diff --git a/docs/workshops/ASHG2023/Session4.md b/docs/workshops/ASHG2023/Session4.md
new file mode 100644
index 00000000..4c598fd8
--- /dev/null
+++ b/docs/workshops/ASHG2023/Session4.md
@@ -0,0 +1,121 @@
+**Workshop ASHG2023 > Session 4: nf-gwas, Imputation Bot and PGS Server**
+
+# nf-gwas Report
+You can download a nf-gwas test report from [here](https://github.com/genepi/imputationserver-ashg/raw/main/files/nf-gwas-example.zip). Please unzip the file and open the index.html file.
+
+# Imputation Bot Tutorial
+
+## Requirements
+
+You will need the following things properly installed on your computer.
+
+* [Java 8 or higher](http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html)
+
+
+## Download and Install
+
+Download and install the latest version from our download page using the following commands:
+
+```
+curl -sL imputationbot.now.sh | bash
+```
+
+
+Test the installation with the following command:
+
+```sh
+imputationbot version
+```
+
+The documentation is available at [http://imputationbot.readthedocs.io](http://imputationbot.readthedocs.io).
+
+
+## Get your API Token
+
+Enable API access from your Profile page.
+
+1. Log in and click on your **username** and then **profile**:
+
+![Image1](/workshops/ASHG2020/images/token1.png)
+
+2. Click on **Create API Token**
+
+![Image1](/workshops/ASHG2020/images/token2.png)
+
+3. Copy your API Token and paste it when `imputationbot add-instance` asks for it.
+
+![Image1](/workshops/ASHG2020/images/token3.png)
+
+API tokens are valid for 30 days. You can check the status in the web interface or with `imputationbot instances`
+
+![Image1](/workshops/ASHG2020/images/token4.png)
+
+4. Next, configure imputationbot with the following command:
+
+```
+imputationbot add-instance
+```
+
+```
+Imputation Bot 0.8.3 🤖
+https://imputationserver.sph.umich.edu
+(c) 2019-2020 Lukas Forer, Sebastian Schoenherr and Christian Fuchsberger
+Built by lukas on 2020-09-01T11:31:10Z
+
+Imputationserver Url [https://imputationserver.sph.umich.edu]:
+API Token [None]: eyJjdHkiOiJ0ZXh0XC9wbGFpbiIsImFsZyI6IkhTMjU2In0.eyJtYWlsIjoibHVrYXMuZm9yZXJAaS1tZWQuYWMuYXQiLCJleHBpcmUiOjE1NzMyMjkwNTY3NTEsIm5hbWUiOiJMdWthcyBGb3JlciIsImFwaSI6dHJ1ZSwidXNlcm5hbWUiOiJsdWtmb3IifQ.qY7iEM6ul-gJ0EuHmEUHRnoS5hZs7kD1HC95NFaxE9w
+```
+
+## Run imputation
+
+You can use the `impute` command to submit a job:
+
+- The `--files` parameter defines the location of our VCF file.
If we plan to impute more than one file we can enter the path to a folder or separate multiple filenames by `,`. +- We can use the `--refpanel` parameter to specify the reference panel. For the **1000 Geneoms Phase 3** panel we use `1000g-phase-3-v5`. If we are not sure what panels are provided by the server, we can use `imputationbot refpanels` to get a list of all reference panels and their supported populations. +- For `--population` we use `eur` which stands for **European** + +The complete command looks like this: + +```bash +imputationbot impute --files /path/to/your.vcf.gz --refpanel 1000g-phase-3-v5 --population eur +``` + +After submission we get the URL where we can monitor the progress of our job. + +## Monitor Jobs + +However, we can also use Imputation Bot to get a list all our jobs and their status: + +``` +imputationbot jobs +``` + +To get more details about our job, we can use the `jobs` command followed by the job ID: + +``` +imputationbot jobs job-XXXXXXXX-XXXXXX-XXX +``` + +## Download all Results + +We can use the `download` command to download all imputed genotypes and the QC report at once: + +``` +imputationbot download job-XXXXXXXX-XXXXXX-XXX +``` + +If the job is still running, Imputation Bot waits until the job is finished and starts automatically with the download. + +You can provide Imputation Bot the password we sent you via email and it decrypts all files for you: + +``` +imputationbot download job-XXXXXXXX-XXXXXX-XXX --password MYPASSWORD +``` + +## Documentation + +The documentation is available at [http://imputationbot.readthedocs.io](http://imputationbot.readthedocs.io). + +## Contact + +Feel free to contact [us](https://imputationserver.sph.umich.edu/index.html#!pages/contact) in case of any problems. diff --git a/docs/workshops/ASHG2023/Session5.md b/docs/workshops/ASHG2023/Session5.md new file mode 100644 index 00000000..1e2409cb --- /dev/null +++ b/docs/workshops/ASHG2023/Session5.md @@ -0,0 +1,3 @@ +**Workshop ASHG2023 > Session 5: HLA Imputation** + +- [Example output of the HLA imputation VCF](https://github.com/genepi/imputationserver-ashg/raw/main/files/chr6.dose.vcf.gz) \ No newline at end of file diff --git a/docs/workshops/ASHG2023/Session6.md b/docs/workshops/ASHG2023/Session6.md new file mode 100644 index 00000000..a64c7140 --- /dev/null +++ b/docs/workshops/ASHG2023/Session6.md @@ -0,0 +1 @@ +**Workshop ASHG2023 > Session 6: The TOPMed Imputation Server** diff --git a/files/bin/Minimac4 b/files/bin/Minimac4 deleted file mode 100755 index a9630714..00000000 Binary files a/files/bin/Minimac4 and /dev/null differ diff --git a/files/bin/minimac4 b/files/bin/minimac4 new file mode 100755 index 00000000..0ce2bec2 Binary files /dev/null and b/files/bin/minimac4 differ diff --git a/files/imputationserver-beagle.yaml b/files/imputationserver-beagle.yaml index fb2c263d..f12813b7 100644 --- a/files/imputationserver-beagle.yaml +++ b/files/imputationserver-beagle.yaml @@ -1,7 +1,7 @@ id: imputationserver-beagle name: Genotype Imputation supporting Beagle (Minimac4) description: This is the new Michigan Imputation Server Pipeline using Minimac4. Documentation can be found here.

If your input data is GRCh37/hg19 please ensure chromosomes are encoded without prefix (e.g. 20).
If your input data is GRCh38hg38 please ensure chromosomes are encoded with prefix 'chr' (e.g. chr20). -version: 1.7.4 +version: 2.0.0 website: https://imputationserver.readthedocs.io category: diff --git a/files/imputationserver-hla.yaml b/files/imputationserver-hla.yaml index e7f27d5e..6550a66c 100644 --- a/files/imputationserver-hla.yaml +++ b/files/imputationserver-hla.yaml @@ -1,7 +1,7 @@ id: imputationserver-hla name: Genotype Imputation HLA (Minimac4) description: This is the new Michigan Imputation Server Pipeline using Minimac4. Documentation can be found here.

If your input data is GRCh37/hg19 please ensure chromosomes are encoded without prefix (e.g. 20).
If your input data is GRCh38hg38 please ensure chromosomes are encoded with prefix 'chr' (e.g. chr20). -version: 1.7.4 +version: 2.0.0 website: https://imputationserver.readthedocs.io category: diff --git a/files/imputationserver-pgs.yaml b/files/imputationserver-pgs.yaml index ec8d0615..f8ffa4ff 100644 --- a/files/imputationserver-pgs.yaml +++ b/files/imputationserver-pgs.yaml @@ -1,8 +1,8 @@ id: imputationserver-pgs -name: Genotype Imputation (PGS Calc Integration) -description: This is the new Michigan Imputation Server Pipeline using Minimac4. Documentation can be found here.

If your input data is GRCh37/hg19 please ensure chromosomes are encoded without prefix (e.g. 20).
If your input data is GRCh38hg38 please ensure chromosomes are encoded with prefix 'chr' (e.g. chr20). -version: 1.7.4 -website: https://imputationserver.readthedocs.io +name: Polygenic Score Calculation +description: "You can upload genotyped data and the application imputes your genotypes, performs ancestry estimation and finally calculates Polygenic Risk Scores.

No dataset at hand? No problem, download our example dataset: 50-samples.zip

" +version: 2.0.0 +website: https://imputationserver.readthedocs.io/en/latest/pgs/getting-started category: installation: @@ -53,11 +53,13 @@ workflow: generates: $local $outputimputation $logfile $hadooplogs binaries: ${app_hdfs_folder}/bin +#if( $reference != "disabled") - name: Ancestry Estimation jar: imputationserver.jar classname: genepi.imputationserver.steps.ancestry.TraceStep binaries: ${app_hdfs_folder}/bin references: ${app_hdfs_folder}/references +#end - name: Data Compression and Encryption jar: imputationserver.jar @@ -95,6 +97,7 @@ workflow: 0.1: 0.1 0.2: 0.2 0.3: 0.3 + visible: false - id: phasing description: Phasing @@ -103,14 +106,13 @@ workflow: values: eagle: Eagle v2.4 (phased output) no_phasing: No phasing + visible: false - id: population description: Population - type: list - values: - bind: refpanel - property: populations - category: RefPanel + value: mixed + type: text + visible: false - id: mode description: Mode @@ -120,6 +122,7 @@ workflow: qconly: Quality Control Only imputation: Quality Control & Imputation phasing: Quality Control & Phasing Only + visible: false - id: aesEncryption description: AES 256 encryption @@ -129,7 +132,7 @@ workflow: values: true: yes false: no - visible: true + visible: false - id: meta description: Generate Meta-imputation file @@ -138,7 +141,7 @@ workflow: values: true: yes false: no - visible: true + visible: false - id: myseparator0 type: separator @@ -154,14 +157,24 @@ workflow: required: true category: PGSPanel + - id: pgsCategory + description: Trait Category + type: list + values: + bind: pgsPanel + property: categories + category: PGSPanel + - id: reference - description: Reference Populations + description: Ancestry Estimation type: list required: true - value: HGDP_938_genotyped + value: disabled values: + disabled: "Disabled" HGDP_938_genotyped: Worldwide (HGDP) - HGDP_938_imputed: Worldwide (imputed HGDP) + #HGDP_938_imputed: Worldwide (imputed HGDP) + visible: true - id: dim description: Number of principal components to compute diff --git a/files/minimac4.yaml b/files/minimac4.yaml index f470aa27..492ceadb 100644 --- a/files/minimac4.yaml +++ b/files/minimac4.yaml @@ -1,7 +1,7 @@ id: imputationserver name: Genotype Imputation (Minimac4) description: This is the new Imputation Server Pipeline using Minimac4. Documentation can be found here.

If your input data is GRCh37/hg19 please ensure chromosomes are encoded without prefix (e.g. 20).
If your input data is GRCh38/hg38 please ensure chromosomes are encoded with prefix 'chr' (e.g. chr20).

There is a limit of three concurrent jobs per person. The TOPMed imputation server is a free resource, and these limits allow us to provide service to a wide audience. We reserve the right to terminate users who violate this policy. -version: 1.8.0-beta4 +version: 2.0.0 website: https://topmedimpute.readthedocs.io/en/latest/ category: diff --git a/mkdocs.yml b/mkdocs.yml index 413ceab4..41af951e 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -12,26 +12,14 @@ nav: - Contact: contact.md - Developer Documentation: - API: api.md -- Workshops: - - ASHG2022: - - Overview: workshops/ASHG2022.md - - "Session 1: Imputation and the Server": workshops/ASHG2022/Session1.md - - "Session 2: Run a job, Data Preparation and Data Download": workshops/ASHG2022/Session2.md - - "Session 3: Performing GWAS using imputed data": workshops/ASHG2022/Session3.md - - "Session 4: Imputation Bot and PGS Server": workshops/ASHG2022/Session4.md - - "Session 5: HLA Imputation": workshops/ASHG2022/Session5.md - - "Session 6: TOPMed Imputation Server": workshops/ASHG2022/Session6.md - - ASHG2020: - - Overview: workshops/ASHG2020.md - - "Session 1: Imputation and the Server": workshops/ASHG2020/Session1.md - - "Session 2: Run a job, Quality Control and Data Preparation": workshops/ASHG2020/Session2.md - - "Session 3: Tracking runs and downloading data": workshops/ASHG2020/Session3.md - - "Session 4: Performing GWAS using imputed data": workshops/ASHG2020/Session4.md - - "Session 5: Imputation bot": workshops/ASHG2020/Session5.md - - "Session 6: TOPMed Imputation Server": workshops/ASHG2020/Session6.md - - "Session 7: Imputation panels": workshops/ASHG2020/Session7.md - - +- Workshop (ASHG2023): + - Overview: workshops/ASHG2023.md + - "Session 1: Imputation and the Server": workshops/ASHG2023/Session1.md + - "Session 2: Run a job, Data Preparation and Data Download": workshops/ASHG2023/Session2.md + - "Session 3: Performing GWAS using imputed data": workshops/ASHG2023/Session3.md + - "Session 4: nf-gwas, Imputation Bot and PGS Server": workshops/ASHG2023/Session4.md + - "Session 5: HLA Imputation": workshops/ASHG2023/Session5.md + - "Session 6: TOPMed Imputation Server": workshops/ASHG2023/Session6.md repo_name: statgen/imputationserver repo_url: https://github.com/statgen/imputationserver diff --git a/pages/home.stache b/pages/home.stache index 0ed79a82..4f4c3737 100755 --- a/pages/home.stache +++ b/pages/home.stache @@ -2,15 +2,15 @@

Michigan Imputation Server

- Free Next-Generation Genotype Imputation Service + Free Next-Generation Genotype Imputation Platform

- <% if(!loggedIn) {%> + {{#is(loggedIn, false)}}


- Sign up now  - Login + Sign up now  + Login

- <% } %> + {{/is}}
@@ -22,35 +22,123 @@

- <%= counter.attr('complete.chromosomes') ? (Math.round(counter.attr('complete.chromosomes') / 22.0 /1000.0/1000.0 * 10) / 10).toLocaleString() : '0'%> M
Imputed Genomes + {{div(counter.complete.chromosomes, 22000000)}}M
Imputed Genomes

- <%= Math.round(counter.attr('users')).toLocaleString() %>
Registerd Users + {{counter.users}}
Registered Users

- > 100
Published GWAS + {{#counter.running.runs}}{{.}}{{else}}0{{/counter.running.runs}}
Running Jobs

+ + +
+
+ +
+
+
+

Genotype Imputation

+
+

+ You can upload genotyping data and the application imputes your genotypes against different reference panels. +

+

+ {{#is(loggedIn, false)}} + Run + {{ else }} + Run + {{/is}} +   Learn more +

+ +
+
+
+
+

HLA Imputation

+
+

+ Enables accurate prediction of human leukocyte antigen (HLA) genotypes from your uploaded genotyping data using multi-ancestry reference panels. +

+

+ {{#is(loggedIn, false)}} + Run + {{ else }} + Run + {{/is}} +   Learn more +

+
+
+
+
+

Polygenic Score Calculation

+
+

+ You can upload genotyping data and the application imputes your genotypes, performs ancestry estimation and finally calculates Polygenic Risk Scores. +

+

+ {{#is(loggedIn, false)}} + Run + {{ else }} + Run + {{/is}} +   Learn more +

+
+
+
+ +
+
+
+ +

Latest News

+
-
Latest News
-
+

+ 21 May 2021
+ We have increased the max sample size to 110k. +

+ 15 April 2021
+ Update to new framework completed! Currently, max sample size will be limited to 25k, but we expect to lift this limitation in the next few weeks. +

+

+ 18 March 2020
+ Due to coronavirus-related impacts support may be slower than usual. If you haven't heard back from us after a week or so, feel free to e-mail again to check on the status of things. Take care! +

+ 07 November 2019
+ Updated MIS to v1.2.4! Major improvements: Minimac4 for imputation, improved chrX support, QC check right after upload, better documentation. Check out our GitHub repository for further information. +

+

+ 17 October 2019
+ Michigan Imputation Server at ASHG19. All information is available here. +

+ +

+ 27 November 2018
+ Redesigned user interface to improve user experience. +

+

27 June 2017
Updated pipeline to v1.0.2. Release notes can be found here.

@@ -58,17 +146,13 @@ 29 Aug 2016
Imputation server paper is out now: Das et al., Nature Genetics 2016

-

- 14 June 2016
- Supporting 23andMe input format. -

19 April 2016
Updated HRC Panel (r1.1) available.

12 January 2016
- New Reference Panel (CAAPA) available. + New Reference Panel (CAAPA) available.

24 April 2015
@@ -80,7 +164,9 @@

- <% !function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0],p=/^http:/.test(d.location)?'http':'https';if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src=p+"://platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs"); %> +
@@ -103,19 +189,19 @@
-

+

Upload your genotypes to our server located in Michigan.
All interactions with the server are secured.
-

+

Choose a reference panel. We will take care of pre-phasing and imputation.
-

+

Download the results.
All results are encrypted with a one-time password. After 7 days, all results are deleted from our server.
@@ -127,7 +213,7 @@
-

Up to date and accurate reference panels

+

Wide range of reference panels supported

@@ -183,17 +269,17 @@

- +

- Imputation Server is open source and easy to install on your own Hadoop cluster. + Imputation Server is open source and easy to install on your own Hadoop cluster or to run with Docker.

- +

Host your own confidential reference panels in a secure and private environment. @@ -203,7 +289,7 @@

- +

You have full control about the service. Write us to get more information. diff --git a/pages/images/github.png b/pages/images/github.png new file mode 100644 index 00000000..32d55610 Binary files /dev/null and b/pages/images/github.png differ diff --git a/pom.xml b/pom.xml index fa7f0908..34badd94 100644 --- a/pom.xml +++ b/pom.xml @@ -5,11 +5,8 @@ genepi imputationserver - - 1.7.4 - + 2.0.0 jar - University of Michigan Imputation Server http://maven.apache.org @@ -299,7 +296,7 @@ lukfor pgs-calc - 1.5.4 + 1.6.1 diff --git a/src/main/java/genepi/imputationserver/steps/CompressionEncryption.java b/src/main/java/genepi/imputationserver/steps/CompressionEncryption.java index a5020b9d..b4c6ddce 100644 --- a/src/main/java/genepi/imputationserver/steps/CompressionEncryption.java +++ b/src/main/java/genepi/imputationserver/steps/CompressionEncryption.java @@ -106,159 +106,160 @@ public boolean run(WorkflowContext context) { context.beginTask("Export data..."); - // get sorted directories - List folders = HdfsUtil.getDirectories(output); + if (pgsPanel == null) { - ImputationResults imputationResults = new ImputationResults(folders, phasingOnly); - Map imputedChromosomes = imputationResults.getChromosomes(); + // get sorted directories + List folders = HdfsUtil.getDirectories(output); - Set chromosomes = imputedChromosomes.keySet(); - boolean lastChromosome = false; - int index = 0; + ImputationResults imputationResults = new ImputationResults(folders, phasingOnly); + Map imputedChromosomes = imputationResults.getChromosomes(); - String checksumFilename = FileUtil.path(localOutput, "results.md5"); - LineWriter writer = new LineWriter(checksumFilename); + Set chromosomes = imputedChromosomes.keySet(); + boolean lastChromosome = false; + int index = 0; - for (String name : chromosomes) { + String checksumFilename = FileUtil.path(localOutput, "results.md5"); + LineWriter writer = new LineWriter(checksumFilename); - index++; + for (String name : chromosomes) { - if (index == chromosomes.size()) { - lastChromosome = true; - } + index++; - ImputedChromosome imputedChromosome = imputedChromosomes.get(name); + if (index == chromosomes.size()) { + lastChromosome = true; + } - context.println("Export and merge chromosome " + name); + ImputedChromosome imputedChromosome = imputedChromosomes.get(name); - // create temp dir - String temp = FileUtil.path(localOutput, "temp"); - FileUtil.createDirectory(temp); + context.println("Export and merge chromosome " + name); - // output files + // create temp dir + String temp = FileUtil.path(localOutput, "temp"); + FileUtil.createDirectory(temp); - ArrayList files = new ArrayList(); + // output files - // merge info files - if (!phasingOnly) { - String infoOutput = FileUtil.path(temp, "chr" + name + ".info.gz"); - FileMerger.mergeAndGzInfo(imputedChromosome.getInfoFiles(), infoOutput); - files.add(new File(infoOutput)); - } + ArrayList files = new ArrayList(); - // merge all dosage files + // merge info files + if (!phasingOnly) { + String infoOutput = FileUtil.path(temp, "chr" + name + ".info.gz"); + FileMerger.mergeAndGzInfo(imputedChromosome.getInfoFiles(), infoOutput); + files.add(new File(infoOutput)); + } - String dosageOutput; - if (phasingOnly) { - dosageOutput = FileUtil.path(temp, "chr" + name + ".phased.vcf.gz"); - } else { - dosageOutput = FileUtil.path(temp, "chr" + name + ".dose.vcf.gz"); - } - files.add(new File(dosageOutput)); + // merge all dosage files - MergedVcfFile vcfFile = new MergedVcfFile(dosageOutput); - vcfFile.addHeader(context, 
imputedChromosome.getHeaderFiles()); + String dosageOutput; + if (phasingOnly) { + dosageOutput = FileUtil.path(temp, "chr" + name + ".phased.vcf.gz"); + } else { + dosageOutput = FileUtil.path(temp, "chr" + name + ".dose.vcf.gz"); + } + files.add(new File(dosageOutput)); - for (String file : imputedChromosome.getDataFiles()) { - context.println("Read file " + file); - vcfFile.addFile(HdfsUtil.open(file)); - HdfsUtil.delete(file); - } + MergedVcfFile vcfFile = new MergedVcfFile(dosageOutput); + vcfFile.addHeader(context, imputedChromosome.getHeaderFiles()); - vcfFile.close(); + for (String file : imputedChromosome.getDataFiles()) { + context.println("Read file " + file); + vcfFile.addFile(HdfsUtil.open(file)); + HdfsUtil.delete(file); + } - // merge all meta files - if (mergeMetaFiles) { + vcfFile.close(); - context.println("Merging meta files..."); + // merge all meta files + if (mergeMetaFiles) { - String dosageMetaOutput = FileUtil.path(temp, "chr" + name + ".empiricalDose.vcf.gz"); - MergedVcfFile vcfFileMeta = new MergedVcfFile(dosageMetaOutput); + context.println("Merging meta files..."); - String headerMetaFile = imputedChromosome.getHeaderMetaFiles().get(0); - context.println("Use header from file " + headerMetaFile); + String dosageMetaOutput = FileUtil.path(temp, "chr" + name + ".empiricalDose.vcf.gz"); + MergedVcfFile vcfFileMeta = new MergedVcfFile(dosageMetaOutput); - vcfFileMeta.addFile(HdfsUtil.open(headerMetaFile)); + String headerMetaFile = imputedChromosome.getHeaderMetaFiles().get(0); + context.println("Use header from file " + headerMetaFile); - for (String file : imputedChromosome.getDataMetaFiles()) { - context.println("Read file " + file); - vcfFileMeta.addFile(HdfsUtil.open(file)); - HdfsUtil.delete(file); - } - vcfFileMeta.close(); + vcfFileMeta.addFile(HdfsUtil.open(headerMetaFile)); - context.println("Meta files merged."); + for (String file : imputedChromosome.getDataMetaFiles()) { + context.println("Read file " + file); + vcfFileMeta.addFile(HdfsUtil.open(file)); + HdfsUtil.delete(file); + } + vcfFileMeta.close(); - files.add(new File(dosageMetaOutput)); - } + context.println("Meta files merged."); - if (sanityCheck.equals("yes") && lastChromosome) { - context.println("Run tabix on chromosome " + name + "..."); - Command tabix = new Command(FileUtil.path(workingDirectory, "bin", "tabix")); - tabix.setSilent(false); - tabix.setParams("-f", dosageOutput); - if (tabix.execute() != 0) { - context.endTask("Error during index creation: " + tabix.getStdOut(), WorkflowContext.ERROR); - return false; + files.add(new File(dosageMetaOutput)); } - context.println("Tabix done."); - } - // create zip file - String fileName = "chr_" + name + ".zip"; - String filePath = FileUtil.path(localOutput, fileName); - File file = new File(filePath); - createEncryptedZipFile(file, files, password, aesEncryption); + if (sanityCheck.equals("yes") && lastChromosome) { + context.println("Run tabix on chromosome " + name + "..."); + Command tabix = new Command(FileUtil.path(workingDirectory, "bin", "tabix")); + tabix.setSilent(false); + tabix.setParams("-f", dosageOutput); + if (tabix.execute() != 0) { + context.endTask("Error during index creation: " + tabix.getStdOut(), WorkflowContext.ERROR); + return false; + } + context.println("Tabix done."); + } - // add checksum to hash file - context.println("Creating file checksum for " + filePath); - long checksumStart = System.currentTimeMillis(); - String checksum = FileChecksum.HashFile(new File(filePath), FileChecksum.Algorithm.MD5); - 
writer.write(checksum + " " + fileName); - long checksumEnd = (System.currentTimeMillis() - checksumStart) / 1000; - context.println("File checksum for " + filePath + " created in " + checksumEnd + " seconds."); + // create zip file + String fileName = "chr_" + name + ".zip"; + String filePath = FileUtil.path(localOutput, fileName); + File file = new File(filePath); + createEncryptedZipFile(file, files, password, aesEncryption); - // delete temp dir - FileUtil.deleteDirectory(temp); + // add checksum to hash file + context.println("Creating file checksum for " + filePath); + long checksumStart = System.currentTimeMillis(); + String checksum = FileChecksum.HashFile(new File(filePath), FileChecksum.Algorithm.MD5); + writer.write(checksum + " " + fileName); + long checksumEnd = (System.currentTimeMillis() - checksumStart) / 1000; + context.println("File checksum for " + filePath + " created in " + checksumEnd + " seconds."); - IExternalWorkspace externalWorkspace = context.getExternalWorkspace(); + // delete temp dir + FileUtil.deleteDirectory(temp); - if (externalWorkspace != null) { + IExternalWorkspace externalWorkspace = context.getExternalWorkspace(); - long start = System.currentTimeMillis(); + if (externalWorkspace != null) { - context.println("External Workspace '" + externalWorkspace.getName() + "' found"); + long start = System.currentTimeMillis(); - context.println("Start file upload: " + filePath); + context.println("External Workspace '" + externalWorkspace.getName() + "' found"); - String url = externalWorkspace.upload("local", file); + context.println("Start file upload: " + filePath); - long end = (System.currentTimeMillis() - start) / 1000; + String url = externalWorkspace.upload("local", file); - context.println("Upload finished in " + end + " sec. File Location: " + url); + long end = (System.currentTimeMillis() - start) / 1000; - context.println("Add " + localOutput + " to custom download"); + context.println("Upload finished in " + end + " sec. 
File Location: " + url); - String size = FileUtils.byteCountToDisplaySize(file.length()); + context.println("Add " + localOutput + " to custom download"); - context.addDownload("local", fileName, size, url); + String size = FileUtils.byteCountToDisplaySize(file.length()); - FileUtil.deleteFile(filePath); + context.addDownload("local", fileName, size, url); - context.println("File deleted: " + filePath); + FileUtil.deleteFile(filePath); - } else { - context.println("No external Workspace set."); - } - } + context.println("File deleted: " + filePath); - writer.close(); + } else { + context.println("No external Workspace set."); + } + } - // delete temporary files - HdfsUtil.delete(output); + writer.close(); - // Export calculated risk scores - if (pgsPanel != null) { + // delete temporary files + HdfsUtil.delete(output); + } else { + // Export calculated risk scores context.println("Exporting PGS scores..."); @@ -310,7 +311,7 @@ public boolean run(WorkflowContext context) { String fileName = "scores.zip"; String filePath = FileUtil.path(pgsOutput, fileName); File file = new File(filePath); - createEncryptedZipFile(file, new File(outputFileScores), password, aesEncryption); + createZipFile(file, new File(outputFileScores)); context.println("Exported PGS scores to " + fileName + "."); @@ -354,7 +355,7 @@ public boolean run(WorkflowContext context) { String fileNameReport = "scores.report.zip"; File fileReport = new File(FileUtil.path(pgsOutput, fileNameReport)); - createEncryptedZipFileFromFolder(fileReport, new File(extendedHtmlFolder), password, aesEncryption); + createZipFile(fileReport, new File(extendedHtmlFolder)); context.println("Created reports " + outputFileHtml + " and " + fileReport.getPath() + "."); @@ -385,26 +386,41 @@ public boolean run(WorkflowContext context) { Object mail = context.getData("cloudgene.user.mail"); Object name = context.getData("cloudgene.user.name"); - if (mail != null) { + if (mail != null && !mail.toString().isEmpty()) { String subject = "Job " + context.getJobId() + " is complete."; - String message = "Dear " + name + ",\nthe password for the imputation results is: " + password - + "\n\nThe results can be downloaded from " + serverUrl + "/start.html#!jobs/" - + context.getJobId() + "/results"; + String message = ""; + if (pgsPanel == null) { + message = "Dear " + name + ",\nthe password for the imputation results is: " + password + + "\n\nThe results can be downloaded from " + serverUrl + "/start.html#!jobs/" + + context.getJobId() + "/results"; + } else { + message = "Dear " + name + ",\nThe results can be downloaded from " + serverUrl + "/start.html#!jobs/" + + context.getJobId() + "/results"; + } try { context.sendMail(subject, message); - context.ok("We have sent an email to " + mail + " with the password."); + if (pgsPanel == null) { + context.ok("We have sent an email to " + mail + " with the password."); + } else { + context.ok("We have sent a notification email to " + mail + "."); + } return true; } catch (Exception e) { - context.println("Data compression failed: " + ExceptionUtils.getStackTrace(e)); - context.error("Data compression failed: " + e.getMessage()); + context.println("Sending notification email failed: " + ExceptionUtils.getStackTrace(e)); + context.error("Sending notification email failed: " + e.getMessage()); return false; } } else { - context.error("No email address found. Please enter your email address (Account -> Profile)."); - return false; + if (pgsPanel == null) { + context.error("No email address found. 
Please enter your email address (Account -> Profile)."); + return false; + } else { + context.ok("PGS report created successfully."); + return true; + } } } else { @@ -477,4 +493,15 @@ public void createEncryptedZipFileFromFolder(File file, File folder, String pass zipFile.close(); } + public void createZipFile(File file, File folder) throws IOException { + ZipFile zipFile = new ZipFile(file); + if (folder.isFile()){ + zipFile.addFile(folder); + } else { + zipFile.addFolder(folder); + } + zipFile.close(); + } + + } diff --git a/src/main/java/genepi/imputationserver/steps/FastQualityControl.java b/src/main/java/genepi/imputationserver/steps/FastQualityControl.java index 6acea130..5456093b 100644 --- a/src/main/java/genepi/imputationserver/steps/FastQualityControl.java +++ b/src/main/java/genepi/imputationserver/steps/FastQualityControl.java @@ -189,6 +189,7 @@ public boolean run(WorkflowContext context) { double sampleCallrate = panel.getQcFilterByKey("sampleCallrate"); double mixedGenotypesChrX = panel.getQcFilterByKey("mixedGenotypeschrX"); int strandFlips = (int) (panel.getQcFilterByKey("strandFlips")); + int alleleSwitches = (int) (panel.getQcFilterByKey("alleleSwitches")); String ranges = panel.getRange(); if (ranges != null) { @@ -336,6 +337,15 @@ else if (task.getStrandFlipSimple() + task.getStrandFlipAndAlleleSwitch() > stra return false; } + // Check if too many allele switches are detected + else if (task.getAlleleSwitch() > alleleSwitches) { + text.append("
Error: More than " + alleleSwitches + + " allele switches have been detected. Imputation cannot be started!"); + context.error(text.toString()); + + return false; + } + else if (task.isChrXMissingRate()) { text.append( "
Error: Chromosome X nonPAR region includes > 10 % mixed genotypes. Imputation cannot be started!"); diff --git a/src/main/java/genepi/imputationserver/steps/Imputation.java b/src/main/java/genepi/imputationserver/steps/Imputation.java index 173cbaa6..f5050c48 100644 --- a/src/main/java/genepi/imputationserver/steps/Imputation.java +++ b/src/main/java/genepi/imputationserver/steps/Imputation.java @@ -19,6 +19,7 @@ import genepi.imputationserver.util.RefPanelList; import genepi.io.FileUtil; import genepi.io.text.LineReader; +import genepi.riskscore.commands.FilterMetaCommand; public class Imputation extends ParallelHadoopJobStep { @@ -63,6 +64,7 @@ public boolean run(WorkflowContext context) { String mode = context.get("mode"); String phasing = context.get("phasing"); PgsPanel pgsPanel = PgsPanel.loadFromProperties(context.getData("pgsPanel")); + String pgsCategory = context.get("pgsCategory"); String r2Filter = context.get("r2Filter"); if (r2Filter == null) { @@ -123,10 +125,29 @@ public boolean run(WorkflowContext context) { context.println(" " + entry.getKey() + "/" + entry.getValue()); } } + + String includeScoreFilenameHdfs = null; if (pgsPanel != null) { - context.println(" PGS: " + pgsPanel.getScores().size() + " scores"); + context.println(" PGS: " + FileUtil.getFilename(pgsPanel.getScores())); + + if (pgsCategory != null && !pgsCategory.isEmpty() && !pgsCategory.equals("all")) { + String includeScoreFilename = FileUtil.path(context.getLocalTemp(), "include-scores.txt"); + FilterMetaCommand filter = new FilterMetaCommand(); + filter.setCategory(pgsCategory); + filter.setMeta(pgsPanel.getMeta()); + filter.setOut(includeScoreFilename); + int result = 0; + try { + result = filter.call(); + } catch (Exception e) { + throw new RuntimeException(e); + } + includeScoreFilenameHdfs = HdfsUtil.path(context.getHdfsTemp(), "include-scores.txt"); + HdfsUtil.put(includeScoreFilename, includeScoreFilenameHdfs); + } + } else { - context.println(" PGS: no scores selected"); + context.println(" PGS: no score file selected"); } // execute one job per chromosome @@ -229,6 +250,9 @@ protected void readConfigFile() { } if (pgsPanel != null) { + if (includeScoreFilenameHdfs != null) { + job.setIncludeScoreFilenameHDFS(includeScoreFilenameHdfs); + } job.setScores(pgsPanel.getScores()); } job.setRefPanel(reference); diff --git a/src/main/java/genepi/imputationserver/steps/InputValidation.java b/src/main/java/genepi/imputationserver/steps/InputValidation.java index 4db36e3a..11958a36 100644 --- a/src/main/java/genepi/imputationserver/steps/InputValidation.java +++ b/src/main/java/genepi/imputationserver/steps/InputValidation.java @@ -236,7 +236,7 @@ private boolean checkVcfFiles(WorkflowContext context) { + (phased ? "phased" : "unphased") + "\n" + "Build: " + (build == null ? "hg19" : build) + "\n" + "Reference Panel: " + reference + " (" + panel.getBuild() + ")" + "\n" + "Population: " + population + "\n" + "Phasing: " + phasing + "\n" + "Mode: " + mode - + (pgsPanel != null ? "\n" + "PGS-Calculation: " + pgsPanel.getScores().size() + " scores" + + (pgsPanel != null ? 
"\n" + "PGS-Calculation: " + context.get("pgsPanel") + " (" + context.get("pgsCategory") + ")" : ""); if (r2Filter != null && !r2Filter.isEmpty() && !r2Filter.equals("0")) { diff --git a/src/main/java/genepi/imputationserver/steps/ancestry/TraceStep.java b/src/main/java/genepi/imputationserver/steps/ancestry/TraceStep.java index 0935dca3..c2158afc 100644 --- a/src/main/java/genepi/imputationserver/steps/ancestry/TraceStep.java +++ b/src/main/java/genepi/imputationserver/steps/ancestry/TraceStep.java @@ -128,7 +128,7 @@ public boolean prepareTraceJobs(WorkflowContext context) { HdfsUtil.put(mergedFile, HdfsUtil.path(vcfHdfsDir, "study.merged.vcf.gz")); // read number of samples from first vcf file - VcfFile vcfFile = VcfFileUtil.load(mergedFile, 200000, false); + VcfFile vcfFile = VcfFileUtil.load(files[0], 200000, false); int nIndividuals = vcfFile.getNoSamples(); int batch = 0; @@ -168,8 +168,9 @@ public boolean prepareTraceJobs(WorkflowContext context) { return true; } catch (IOException e) { - context.error("An internal server error occured."); - e.printStackTrace(); + + context.error("An internal server error occurred.\n" + exceptionToString(e)); + } context.error("Execution failed. Please, contact administrator."); @@ -209,8 +210,7 @@ public boolean checkDataAndMerge(WorkflowContext context, String[] files, String return true; } catch (IOException e) { - context.error("Input Validation failed: " + e); - e.printStackTrace(); + context.error("Input Validation failed:\n" + exceptionToString(e)); return false; } } @@ -288,21 +288,11 @@ public boolean estimateAncestries(WorkflowContext context) { return true; - } catch (IOException e) { - context.error("An internal server error occured while launching Hadoop job."); - e.printStackTrace(); - } catch (InterruptedException e) { - context.error("An internal server error occured while launching Hadoop job."); - e.printStackTrace(); - } catch (ClassNotFoundException e) { - context.error("An internal server error occured while launching Hadoop job."); - e.printStackTrace(); - } catch (URISyntaxException e) { - context.error("An internal server error occured while launching Hadoop job."); - e.printStackTrace(); + } catch (IOException | InterruptedException | ClassNotFoundException | URISyntaxException e) { + context.error("An internal server error occurred while launching Hadoop job.\n" + exceptionToString(e)); } - context.error("Execution failed. Please, contact administrator."); + context.error("Execution failed. 
Please, contact administrator."); return false; } @@ -326,37 +316,43 @@ public void progress(String message) { } return results; } catch (InterruptedException e) { - e.printStackTrace(); TaskResults result = new TaskResults(); result.setSuccess(false); result.setMessage(e.getMessage()); - StringWriter s = new StringWriter(); - e.printStackTrace(new PrintWriter(s)); - context.println("Task '" + task.getName() + "' failed.\nException:" + s.toString()); + context.println("Task '" + task.getName() + "' failed.\nException:" + exceptionToString(e)); context.endTask(e.getMessage(), WorkflowContext.ERROR); return result; } catch (Exception e) { - e.printStackTrace(); TaskResults result = new TaskResults(); result.setSuccess(false); result.setMessage(e.getMessage()); - StringWriter s = new StringWriter(); - e.printStackTrace(new PrintWriter(s)); - context.println("Task '" + task.getName() + "' failed.\nException:" + s.toString()); + context.println("Task '" + task.getName() + "' failed.\nException:" + exceptionToString(e)); context.endTask(task.getName() + " failed.", WorkflowContext.ERROR); return result; } catch (Error e) { - e.printStackTrace(); TaskResults result = new TaskResults(); result.setSuccess(false); result.setMessage(e.getMessage()); - StringWriter s = new StringWriter(); - e.printStackTrace(new PrintWriter(s)); - context.println("Task '" + task.getName() + "' failed.\nException:" + s.toString()); + context.println("Task '" + task.getName() + "' failed.\nException:" + exceptionToString(e)); context.endTask(task.getName() + " failed.", WorkflowContext.ERROR); return result; } } + private static String exceptionToString(Exception e) { + StringWriter sw = new StringWriter(); + PrintWriter pw = new PrintWriter(sw); + e.printStackTrace(pw); + return sw.toString(); + } + + private static String exceptionToString(Error e) { + StringWriter sw = new StringWriter(); + PrintWriter pw = new PrintWriter(sw); + e.printStackTrace(pw); + return sw.toString(); + } + } + diff --git a/src/main/java/genepi/imputationserver/steps/imputation/ImputationJob.java b/src/main/java/genepi/imputationserver/steps/imputation/ImputationJob.java index 1c3ca88c..abbefb70 100644 --- a/src/main/java/genepi/imputationserver/steps/imputation/ImputationJob.java +++ b/src/main/java/genepi/imputationserver/steps/imputation/ImputationJob.java @@ -2,7 +2,6 @@ import java.io.IOException; import java.util.List; -import java.util.stream.Collectors; import org.apache.commons.logging.Log; import org.apache.hadoop.io.Text; @@ -44,7 +43,9 @@ public class ImputationJob extends HadoopJob { public static final String PHASING_ENGINE = "PHASING_ENGINE"; - public static final String SCORES = "SCORES"; + public static final String SCORE_FILE = "SCORES"; + + public static final String INCLUDE_SCORE_FILE = "INCLUDE_SCORE_FILE"; private String refPanelHdfs; @@ -62,7 +63,9 @@ public class ImputationJob extends HadoopJob { private String binariesHDFS; - private List scores; + private String scores; + + private String includeScoreFilenameHDFS; public ImputationJob(String name, Log log) { super(name, log); @@ -168,20 +171,33 @@ protected void setupDistributedCache(CacheStore cache) throws IOException { } } - // add scores to cache3 + // add scores to cache if (scores != null) { - log.info("Add " + scores.size() + " scores to distributed cache..."); - for (String score : scores) { - if (HdfsUtil.exists(score)) { - cache.addFile(score); - if (HdfsUtil.exists(score + ".format")) { - cache.addFile(score + ".format"); - } + log.info("Add " + scores + " 
scores to distributed cache..."); + if (HdfsUtil.exists(scores)) { + cache.addFile(scores); + if (HdfsUtil.exists(scores + ".info")) { + log.info("Add " + scores + ".info to distributed cache..."); + cache.addFile(scores + ".info"); + } + if (HdfsUtil.exists(scores + ".tbi")) { + log.info("Add " + scores + ".tbi to distributed cache..."); + cache.addFile(scores + ".tbi"); + } + } else { + log.info("PGS score file '" + scores + "' not found."); + throw new IOException("PGS score file '" + scores + "' not found."); + } + + if (includeScoreFilenameHDFS != null){ + if (HdfsUtil.exists(includeScoreFilenameHDFS)) { + cache.addFile(includeScoreFilenameHDFS); } else { - log.info("PGS score file '" + score + "' not found."); - throw new IOException("PGS score file '" + score + "' not found."); + log.info("Include score file '" + scores + "' not found."); + throw new IOException("Include score file '" + scores + "' not found."); } } + log.info("All scores added to distributed cache."); } @@ -283,10 +299,8 @@ public void setPhasingEngine(String phasing) { set(PHASING_ENGINE, phasing); } - public void setScores(List scores) { - - String scoresNames = scores.stream().collect(Collectors.joining(",")); - set(SCORES, scoresNames); + public void setScores(String scores) { + set(SCORE_FILE, scores); this.scores = scores; } @@ -294,4 +308,8 @@ public void setBinariesHDFS(String binariesHDFS) { this.binariesHDFS = binariesHDFS; } + public void setIncludeScoreFilenameHDFS(String includeScoreFilenameHDFS) { + set(INCLUDE_SCORE_FILE, includeScoreFilenameHDFS); + this.includeScoreFilenameHDFS = includeScoreFilenameHDFS; + } } \ No newline at end of file diff --git a/src/main/java/genepi/imputationserver/steps/imputation/ImputationMapper.java b/src/main/java/genepi/imputationserver/steps/imputation/ImputationMapper.java index 35c26be6..77a896be 100644 --- a/src/main/java/genepi/imputationserver/steps/imputation/ImputationMapper.java +++ b/src/main/java/genepi/imputationserver/steps/imputation/ImputationMapper.java @@ -33,7 +33,9 @@ public class ImputationMapper extends Mapper { private String outputScores; - private String[] scores; + private String scores; + + private String includeScoresFilename = null; private String refFilename = ""; @@ -117,7 +119,6 @@ protected void setup(Context context) throws IOException, InterruptedException { String referenceName = parameters.get(ImputationJob.REF_PANEL); imputationParameters.setPhasing(phasingEngine); imputationParameters.setReferencePanelName(referenceName); - imputationParameters.setMinR2(minR2); imputationParameters.setPhasingRequired(phasingRequired); // get cached files @@ -153,11 +154,11 @@ protected void setup(Context context) throws IOException, InterruptedException { mapBeagleFilename = cache.getFile(mapBeagle); } - String minimacCommand = cache.getFile("Minimac4"); + String minimacCommand = cache.getFile("minimac4"); String eagleCommand = cache.getFile("eagle"); String beagleCommand = cache.getFile("beagle.jar"); String tabixCommand = cache.getFile("tabix"); - + // create temp directory DefaultPreferenceStore store = new DefaultPreferenceStore(context.getConfiguration()); folder = store.getString("minimac.tmp"); @@ -169,28 +170,35 @@ protected void setup(Context context) throws IOException, InterruptedException { } // scores - String scoresFilenames = parameters.get(ImputationJob.SCORES); - if (scoresFilenames != null) { - String[] filenames = scoresFilenames.split(","); - scores = new String[filenames.length]; - for (int i = 0; i < scores.length; i++) { - 
String filename = filenames[i]; - String name = FileUtil.getFilename(filename); - String localFilename = cache.getFile(name); - scores[i] = localFilename; - // check if score file has format file - String formatFile = cache.getFile(name + ".format"); - if (formatFile != null) { - // create symbolic link to format file. they have to be in the same folder - Files.createSymbolicLink(Paths.get(FileUtil.path(folder,name)), Paths.get(localFilename)); - Files.createSymbolicLink(Paths.get(FileUtil.path(folder,name+".format")), Paths.get(formatFile)); - scores[i] = FileUtil.path(folder,name); - } + String scoresFilename = parameters.get(ImputationJob.SCORE_FILE); + if (scoresFilename != null) { + String name = FileUtil.getFilename(scoresFilename); + String localFilename = cache.getFile(name); + scores = localFilename; + // check if score file has info and tbi file + String infoFile = cache.getFile(name + ".info"); + String tbiFile = cache.getFile(name + ".tbi"); + if (infoFile != null && tbiFile != null) { + // create symbolic link to format file. they have to be in the same folder + Files.createSymbolicLink(Paths.get(FileUtil.path(folder, name)), Paths.get(localFilename)); + Files.createSymbolicLink(Paths.get(FileUtil.path(folder, name + ".info")), Paths.get(infoFile)); + Files.createSymbolicLink(Paths.get(FileUtil.path(folder, name + ".tbi")), Paths.get(tbiFile)); + scores = FileUtil.path(folder, name); + } else { + throw new IOException("*info or *tbi file not available"); } - System.out.println("Loaded " + scores.length + " score files from distributed cache"); + System.out.println("Loaded " + FileUtil.getFilename(scoresFilename) + " from distributed cache"); + + String hdfsIncludeScoresFilename = parameters.get(ImputationJob.INCLUDE_SCORE_FILE); + if (hdfsIncludeScoresFilename != null){ + String includeScoresName = FileUtil.getFilename(hdfsIncludeScoresFilename); + includeScoresFilename = cache.getFile(includeScoresName); + } + + } else { - System.out.println("No scores files et."); + System.out.println("No scores file set."); } // create symbolic link --> index file is in the same folder as data @@ -212,6 +220,7 @@ protected void setup(Context context) throws IOException, InterruptedException { int phasingWindow = Integer.parseInt(store.getString("phasing.window")); int window = Integer.parseInt(store.getString("minimac.window")); + int decay = Integer.parseInt(store.getString("minimac.decay")); String minimacParams = store.getString("minimac.command"); String eagleParams = store.getString("eagle.command"); @@ -226,6 +235,8 @@ protected void setup(Context context) throws IOException, InterruptedException { pipeline.setPhasingWindow(phasingWindow); pipeline.setBuild(build); pipeline.setMinimacWindow(window); + pipeline.setMinR2(minR2); + pipeline.setDecay(decay); } @@ -262,6 +273,7 @@ public void map(LongWritable key, Text value, Context context) throws IOExceptio pipeline.setPhasingEngine(phasingEngine); pipeline.setPhasingOnly(phasingOnly); pipeline.setScores(scores); + pipeline.setIncludeScoreFilename(includeScoresFilename); boolean succesful = pipeline.execute(chunk, outputChunk); ImputationStatistic statistics = pipeline.getStatistic(); @@ -288,17 +300,12 @@ public void map(LongWritable key, Text value, Context context) throws IOExceptio statistics.setImportTime((end - start) / 1000); - } else { - if (imputationParameters.getMinR2() > 0) { - // filter by r2 - String filteredInfoFilename = outputChunk.getInfoFilename() + "_filtered"; - filterInfoFileByR2(outputChunk.getInfoFilename(), 
filteredInfoFilename, - imputationParameters.getMinR2()); - HdfsUtil.put(filteredInfoFilename, HdfsUtil.path(output, chunk + ".info")); + } - } else { - HdfsUtil.put(outputChunk.getInfoFilename(), HdfsUtil.path(output, chunk + ".info")); - } + // push results only if not in PGS mode + else if (scores == null) { + + HdfsUtil.put(outputChunk.getInfoFilename(), HdfsUtil.path(output, chunk + ".info")); long start = System.currentTimeMillis(); @@ -328,9 +335,7 @@ public void map(LongWritable key, Text value, Context context) throws IOExceptio System.out.println("Time filter and put: " + (end - start) + " ms"); - } - - if (scores != null) { + } else { HdfsUtil.put(outputChunk.getScoreFilename(), HdfsUtil.path(outputScores, chunk + ".scores.txt")); HdfsUtil.put(outputChunk.getScoreFilename() + ".json", @@ -359,41 +364,4 @@ public void map(LongWritable key, Text value, Context context) throws IOExceptio } } - public void filterInfoFileByR2(String input, String output, double minR2) throws IOException { - - LineReader readerInfo = new LineReader(input); - LineWriter writerInfo = new LineWriter(output); - - readerInfo.next(); - String header = readerInfo.get(); - - // find index for Rsq - String[] headerTiles = header.split("\t"); - int index = -1; - for (int i = 0; i < headerTiles.length; i++) { - if (headerTiles[i].equals("Rsq")) { - index = i; - } - } - - writerInfo.write(header); - - while (readerInfo.next()) { - String line = readerInfo.get(); - String[] tiles = line.split("\t"); - String value = tiles[index]; - try { - double r2 = Double.parseDouble(value); - if (r2 > minR2) { - writerInfo.write(line); - } - } catch (NumberFormatException e) { - writerInfo.write(line); - } - } - - readerInfo.close(); - writerInfo.close(); - - } } diff --git a/src/main/java/genepi/imputationserver/steps/imputation/ImputationPipeline.java b/src/main/java/genepi/imputationserver/steps/imputation/ImputationPipeline.java index 428ead2f..61e9a505 100644 --- a/src/main/java/genepi/imputationserver/steps/imputation/ImputationPipeline.java +++ b/src/main/java/genepi/imputationserver/steps/imputation/ImputationPipeline.java @@ -24,9 +24,9 @@ public class ImputationPipeline { - public static final String PIPELINE_VERSION = "michigan-imputationserver-1.7.4"; + public static final String PIPELINE_VERSION = "michigan-imputationserver-2.0.0"; - public static final String IMPUTATION_VERSION = "minimac4-1.0.2"; + public static final String IMPUTATION_VERSION = "minimac-v4.1.6"; public static final String BEAGLE_VERSION = "beagle.18May20.d20.jar"; @@ -48,8 +48,12 @@ public class ImputationPipeline { private int minimacWindow; + private int minimacDecay; + private int phasingWindow; + private double minR2; + private String refFilename; private String mapMinimac; @@ -62,13 +66,15 @@ public class ImputationPipeline { private String mapBeagleFilename = ""; + private String includeScoreFilename = null; + private String build = "hg19"; private boolean phasingOnly; private String phasingEngine = ""; - private String[] scores; + private String scores; private ImputationStatistic statistic = new ImputationStatistic(); @@ -172,7 +178,7 @@ public boolean execute(VcfChunk chunk, VcfChunkOutput output) throws Interrupted return false; } - if (scores != null && scores.length >= 0) { + if (scores != null) { System.out.println(" Starting PGS calculation '" + scores + "'..."); @@ -288,6 +294,16 @@ public boolean phaseWithBeagle(VcfChunk input, VcfChunkOutput output, String ref public boolean imputeVCF(VcfChunkOutput output) throws 
InterruptedException, IOException, CompilationFailedException { + // create tabix index + Command tabix = new Command(tabixCommand); + tabix.setSilent(false); + tabix.setParams(output.getPhasedVcfFilename()); + System.out.println("Command: " + tabix.getExecutedCommand()); + if (tabix.execute() != 0) { + System.out.println("Error during index creation: " + tabix.getStdOut()); + return false; + } + String chr = ""; if (build.equals("hg38")) { chr = "chr" + output.getChromosome(); @@ -306,6 +322,8 @@ public boolean imputeVCF(VcfChunkOutput output) binding.put("chr", chr); binding.put("unphased", false); binding.put("mapMinimac", mapMinimac); + binding.put("minR2", minR2); + binding.put("decay", minimacDecay); String[] params = createParams(minimacParams, binding); @@ -330,7 +348,7 @@ private boolean runPgsCalc(VcfChunkOutput output) { String cacheDir = new File(output.getScoreFilename()).getParent(); PGSCatalog.CACHE_DIR = cacheDir; - if (scores == null || scores.length == 0) { + if (scores == null) { System.out.println("PGS calcuation failed. No score files set. "); return false; } @@ -344,25 +362,20 @@ private boolean runPgsCalc(VcfChunkOutput output) { ApplyScoreTask task = new ApplyScoreTask(); task.setVcfFilename(output.getImputedVcfFilename()); task.setChunk(scoreChunk); - task.setRiskScoreFilenames(scores); - - //TODO: enable fix-strand-flips - //task.setFixStrandFlips(true); - //task.setRemoveAmbiguous(true); - - for (String file : scores) { - String autoFormat = file + ".format"; - if (new File(autoFormat).exists()) { - task.setRiskScoreFormat(file, RiskScoreFormat.MAPPING_FILE); - } + task.setRiskScoreFilenames(new String[] { scores }); + if (includeScoreFilename != null && !includeScoreFilename.isEmpty()){ + task.setIncludeScoreFilename(includeScoreFilename); } + // TODO: enable fix-strand-flips + // task.setFixStrandFlips(true); + // task.setRemoveAmbiguous(true); + task.setOutputReportFilename(output.getScoreFilename() + ".json"); task.setOutput(output.getScoreFilename()); TaskService.setAnsiSupport(false); List runningTasks = TaskService.run(task); - for (Task runningTask : runningTasks) { if (!runningTask.getStatus().isSuccess()) { System.out.println("PGS-Calc failed: " + runningTask.getStatus().getThrowable()); @@ -407,6 +420,10 @@ public void setRefBeagleFilename(String refBeagleFilename) { this.refBeagleFilename = refBeagleFilename; } + public void setIncludeScoreFilename(String includeScoreFilename) { + this.includeScoreFilename = includeScoreFilename; + } + public void setMinimacCommand(String minimacCommand, String minimacParams) { this.minimacCommand = minimacCommand; this.minimacParams = minimacParams; @@ -442,7 +459,7 @@ public void setPhasingOnly(boolean phasingOnly) { this.phasingOnly = phasingOnly; } - public void setScores(String[] scores) { + public void setScores(String scores) { this.scores = scores; } @@ -474,4 +491,13 @@ public void setMapBeagleFilename(String mapBeagleFilename) { this.mapBeagleFilename = mapBeagleFilename; } + public void setMinR2(double minR2) { + this.minR2 = minR2; + } + + public void setDecay(int decay) { + this.minimacDecay = decay; + + } + } diff --git a/src/main/java/genepi/imputationserver/util/DefaultPreferenceStore.java b/src/main/java/genepi/imputationserver/util/DefaultPreferenceStore.java index 01852193..5d086e91 100644 --- a/src/main/java/genepi/imputationserver/util/DefaultPreferenceStore.java +++ b/src/main/java/genepi/imputationserver/util/DefaultPreferenceStore.java @@ -71,11 +71,12 @@ public static Properties defaults() { 
defaults.setProperty("chunksize", "20000000"); defaults.setProperty("phasing.window", "5000000"); defaults.setProperty("minimac.window", "500000"); + defaults.setProperty("minimac.decay", "0"); defaults.setProperty("minimac.sendmail", "no"); defaults.setProperty("server.url", "https://imputationserver.sph.umich.edu"); defaults.setProperty("minimac.tmp", "/tmp"); defaults.setProperty("minimac.command", - "--refHaps ${ref} --haps ${vcf} --start ${start} --end ${end} --window ${window} --prefix ${prefix} --chr ${chr} --cpus 1 --noPhoneHome --format GT,DS,GP --allTypedSites --meta --minRatio 0.00001 ${chr =='MT' ? '--myChromosome ' + chr : ''} ${unphased ? '--unphasedOutput' : ''} ${mapMinimac != null ? '--referenceEstimates --map ' + mapMinimac : ''}"); + "--region ${chr}:${start}-${end} --overlap ${window} --output ${prefix}.dose.vcf.gz --output-format vcf.gz --format GT,DS,GP,HDS --min-ratio 0.00001 --decay ${decay} --all-typed-sites --sites ${prefix}.info --empirical-output ${prefix}.empiricalDose.vcf.gz ${minR2 != 0 ? '--min-r2 ' + minR2 : ''} ${mapMinimac != null ? '--map ' + mapMinimac : ''} ${ref} ${vcf}"); defaults.setProperty("eagle.command", "--vcfRef ${ref} --vcfTarget ${vcf} --geneticMapFile ${map} --outPrefix ${prefix} --bpStart ${start} --bpEnd ${end} --allowRefAltSwap --vcfOutFormat z --keepMissingPloidyX"); defaults.setProperty("beagle.command", diff --git a/src/main/java/genepi/imputationserver/util/FileMerger.java b/src/main/java/genepi/imputationserver/util/FileMerger.java index d78413af..7acdafe3 100644 --- a/src/main/java/genepi/imputationserver/util/FileMerger.java +++ b/src/main/java/genepi/imputationserver/util/FileMerger.java @@ -24,35 +24,18 @@ public static void splitIntoHeaderAndData(String input, OutputStream outHeader, while (reader.next()) { String line = reader.get(); + if (!line.startsWith("#")) { - if (parameters.getMinR2() > 0) { - // rsq set. parse line and check rsq - String info = parseInfo(line); - if (info != null) { - boolean keep = keepVcfLineByInfo(info, R2_FLAG, parameters.getMinR2()); - if (keep) { - outData.write(line.getBytes()); - outData.write("\n".getBytes()); - } - } else { - // no valid vcf line. keep line - outData.write(line.getBytes()); - outData.write("\n".getBytes()); - } - } else { - // no rsq set. 
keep all lines without parsing - outData.write(line.getBytes()); - outData.write("\n".getBytes()); - } + outData.write(line.getBytes()); + outData.write("\n".getBytes()); } else { // write filter command before ID List starting with #CHROM if (line.startsWith("#CHROM")) { - outHeader.write(("##pipeline=" + ImputationPipeline.PIPELINE_VERSION + "\n").getBytes()); - outHeader.write(("##imputation=" + ImputationPipeline.IMPUTATION_VERSION + "\n").getBytes()); - outHeader.write(("##phasing=" + parameters.getPhasingMethod() + "\n").getBytes()); - outHeader.write(("##panel=" + parameters.getReferencePanelName() + "\n").getBytes()); - outHeader.write(("##r2Filter=" + parameters.getMinR2() + "\n").getBytes()); + outHeader.write(("##mis_pipeline=" + ImputationPipeline.PIPELINE_VERSION + "\n").getBytes()); + outHeader.write(("##mis_imputation=" + ImputationPipeline.IMPUTATION_VERSION + "\n").getBytes()); + outHeader.write(("##mis_phasing=" + parameters.getPhasingMethod() + "\n").getBytes()); + outHeader.write(("##mis_panel=" + parameters.getReferencePanelName() + "\n").getBytes()); } // write all headers except minimac4 command @@ -85,9 +68,9 @@ public static void splitPhasedIntoHeaderAndData(String input, OutputStream outHe // write filter command before ID List starting with #CHROM if (line.startsWith("#CHROM")) { - outHeader.write(("##pipeline=" + ImputationPipeline.PIPELINE_VERSION + "\n").getBytes()); - outHeader.write(("##phasing=" + parameters.getPhasingMethod() + "\n").getBytes()); - outHeader.write(("##panel=" + parameters.getReferencePanelName() + "\n").getBytes()); + outHeader.write(("##mis_pipeline=" + ImputationPipeline.PIPELINE_VERSION + "\n").getBytes()); + outHeader.write(("##mis_phasing=" + parameters.getPhasingMethod() + "\n").getBytes()); + outHeader.write(("##mis_panel=" + parameters.getReferencePanelName() + "\n").getBytes()); } // write all headers except eagle command @@ -129,24 +112,30 @@ public static void mergeAndGzInfo(List hdfs, String local) throws IOExce LineReader reader = new LineReader(in); - boolean header = true; + boolean lineBreak = false; while (reader.next()) { String line = reader.get(); - if (header) { + if (line.startsWith("#")) { + if (firstFile) { + + if (lineBreak) { + out.write('\n'); + } out.write(line.toString().getBytes()); - firstFile = false; + lineBreak = true; } - header = false; } else { out.write('\n'); out.write(line.toString().getBytes()); } } + firstFile = false; + in.close(); } diff --git a/src/main/java/genepi/imputationserver/util/ImputationParameters.java b/src/main/java/genepi/imputationserver/util/ImputationParameters.java index 24116c5a..671efb6b 100644 --- a/src/main/java/genepi/imputationserver/util/ImputationParameters.java +++ b/src/main/java/genepi/imputationserver/util/ImputationParameters.java @@ -6,8 +6,6 @@ public class ImputationParameters { private String referencePanelName; - private double minR2; - private String phasing; private boolean phasingRequired; @@ -20,14 +18,6 @@ public void setReferencePanelName(String referencePanelName) { this.referencePanelName = referencePanelName; } - public double getMinR2() { - return minR2; - } - - public void setMinR2(double minR2) { - this.minR2 = minR2; - } - public String getPhasing() { return phasing; } diff --git a/src/main/java/genepi/imputationserver/util/PgsPanel.java b/src/main/java/genepi/imputationserver/util/PgsPanel.java index 73757ae8..65175c25 100644 --- a/src/main/java/genepi/imputationserver/util/PgsPanel.java +++ 
b/src/main/java/genepi/imputationserver/util/PgsPanel.java @@ -1,8 +1,6 @@ package genepi.imputationserver.util; -import java.util.List; import java.util.Map; -import java.util.Vector; import genepi.hadoop.HdfsUtil; @@ -14,7 +12,7 @@ public class PgsPanel { private String meta = null; - private List scores = new Vector<>(); + private String scores = null; private PgsPanel() { @@ -35,8 +33,7 @@ public static PgsPanel loadFromProperties(Object properties) { panel.meta = map.get("meta").toString(); } if (map.containsKey("scores")) { - List list = (List) map.get("scores"); - panel.scores = list; + panel.scores = map.get("scores").toString(); return panel; } else { return null; @@ -47,11 +44,8 @@ public static PgsPanel loadFromProperties(Object properties) { } - public List getScores() { - List scoresPath = new Vector(); - for (String score : scores) { - scoresPath.add(HdfsUtil.path(location, score)); - } + public String getScores() { + String scoresPath = HdfsUtil.path(scores); return scoresPath; } diff --git a/src/main/java/genepi/imputationserver/util/RefPanel.java b/src/main/java/genepi/imputationserver/util/RefPanel.java index 1f009f48..31f57692 100644 --- a/src/main/java/genepi/imputationserver/util/RefPanel.java +++ b/src/main/java/genepi/imputationserver/util/RefPanel.java @@ -13,6 +13,7 @@ public class RefPanel { + public static final String ALLELE_SWITCHES = String.valueOf(Integer.MAX_VALUE); public static final String STRAMD_FLIPS = "100"; public static final String SAMPLE_CALL_RATE = "0.5"; public static final String MIN_SNPS = "3"; @@ -57,6 +58,7 @@ public RefPanel() { defaultQcFilter.put("sampleCallrate", SAMPLE_CALL_RATE); defaultQcFilter.put("mixedGenotypeschrX", CHR_X_MIXED_GENOTYPES); defaultQcFilter.put("strandFlips", STRAMD_FLIPS); + defaultQcFilter.put("alleleSwitches", ALLELE_SWITCHES); } public String getId() { diff --git a/src/test/java/genepi/imputationserver/steps/FastQualityControlTest.java b/src/test/java/genepi/imputationserver/steps/FastQualityControlTest.java index ef798b40..435cd987 100644 --- a/src/test/java/genepi/imputationserver/steps/FastQualityControlTest.java +++ b/src/test/java/genepi/imputationserver/steps/FastQualityControlTest.java @@ -519,6 +519,47 @@ public void testQcStatisticsDontAllowStrandFlips() throws IOException { "Error: More than -1 obvious strand flips have been detected. Please check strand. 
Imputation cannot be started!")); } + + public void testQcStatisticsAllowAlleleSwitches() throws IOException { + + String configFolder = "test-data/configs/hapmap-3chr"; + String inputFolder = "test-data/data/simulated-chip-3chr-imputation-switches"; + + // create workflow context + WorkflowTestContext context = buildContext(inputFolder, "hapmap2"); + + // create step instance + FastQualityControlMock qcStats = new FastQualityControlMock(configFolder); + + // run and test + boolean result = run(context, qcStats); + + // check statistics + + assertTrue(context.hasInMemory("Excluded sites in total: 2,967")); + assertTrue(context.hasInMemory("Allele switch: 118,209")); + } + + public void testQcStatisticsDontAllowAlleleSwitches() throws IOException { + + String configFolder = "test-data/configs/hapmap-3chr"; + String inputFolder = "test-data/data/simulated-chip-3chr-imputation-switches"; + + // create workflow context + WorkflowTestContext context = buildContext(inputFolder, "hapmap2-qcfilter-alleleswitches"); + + // create step instance + FastQualityControlMock qcStats = new FastQualityControlMock(configFolder); + + // run and test + boolean result = run(context, qcStats); + + // check statistics + + assertTrue(context.hasInMemory("Excluded sites in total: 2,967")); + assertTrue(context.hasInMemory("Allele switch: 118,209")); + assertTrue(context.hasInMemory("Error: More than 33 allele switches have been detected. Imputation cannot be started!")); + } public void testQcStatisticsFilterOverlap() throws IOException { diff --git a/src/test/java/genepi/imputationserver/steps/ImputationTest.java b/src/test/java/genepi/imputationserver/steps/ImputationTest.java index 30972b90..00b1d440 100644 --- a/src/test/java/genepi/imputationserver/steps/ImputationTest.java +++ b/src/test/java/genepi/imputationserver/steps/ImputationTest.java @@ -99,7 +99,7 @@ public void testPipelineWithPhased() throws IOException, ZipException { assertEquals(true, file.isPhased()); assertEquals(TOTAL_REFPANEL_CHR20_B37 + ONLY_IN_INPUT, file.getNoSnps()); - // FileUtil.deleteDirectory("test-data/tmp"); + FileUtil.deleteDirectory("test-data/tmp"); } @@ -150,7 +150,7 @@ public void testPipelineWithPhasedAndMetaOption() throws IOException, ZipExcepti assertEquals(true, file.isPhased()); assertEquals(TOTAL_REFPANEL_CHR20_B37 + ONLY_IN_INPUT, file.getNoSnps()); - // FileUtil.deleteDirectory("test-data/tmp"); + FileUtil.deleteDirectory("test-data/tmp"); } @@ -315,7 +315,7 @@ public void testPipelineWithEagle() throws IOException, ZipException { assertEquals(true, file.isPhased()); assertEquals(TOTAL_REFPANEL_CHR20_B37, file.getNoSnps()); - int snpInInfo = getLineCount("test-data/tmp/chr20.info.gz") - 1; + int snpInInfo = getLineCount("test-data/tmp/chr20.info.gz"); assertEquals(snpInInfo, file.getNoSnps()); FileUtil.deleteDirectory("test-data/tmp"); @@ -358,10 +358,10 @@ public void testValidatePanelWithEagle() throws IOException, ZipException { VCFFileReader reader = new VCFFileReader(new File("test-data/tmp/chr20.dose.vcf.gz"), false); VCFHeader header = reader.getFileHeader(); - assertEquals("hapmap2", header.getOtherHeaderLine("panel").getValue()); - assertEquals(ImputationPipeline.EAGLE_VERSION, header.getOtherHeaderLine("phasing").getValue()); - assertEquals(ImputationPipeline.IMPUTATION_VERSION, header.getOtherHeaderLine("imputation").getValue()); - assertEquals(ImputationPipeline.PIPELINE_VERSION, header.getOtherHeaderLine("pipeline").getValue()); + assertEquals("hapmap2", header.getOtherHeaderLine("mis_panel").getValue()); 
+ assertEquals(ImputationPipeline.EAGLE_VERSION, header.getOtherHeaderLine("mis_phasing").getValue()); + assertEquals(ImputationPipeline.IMPUTATION_VERSION, header.getOtherHeaderLine("mis_imputation").getValue()); + assertEquals(ImputationPipeline.PIPELINE_VERSION, header.getOtherHeaderLine("mis_pipeline").getValue()); FileUtil.deleteDirectory("test-data/tmp"); @@ -404,10 +404,10 @@ public void testValidatePanelWithBeagle() throws IOException, ZipException { VCFFileReader reader = new VCFFileReader(new File("test-data/tmp/chr20.dose.vcf.gz"), false); VCFHeader header = reader.getFileHeader(); - assertEquals("hapmap2", header.getOtherHeaderLine("panel").getValue()); - assertEquals(ImputationPipeline.BEAGLE_VERSION, header.getOtherHeaderLine("phasing").getValue()); - assertEquals(ImputationPipeline.IMPUTATION_VERSION, header.getOtherHeaderLine("imputation").getValue()); - assertEquals(ImputationPipeline.PIPELINE_VERSION, header.getOtherHeaderLine("pipeline").getValue()); + assertEquals("hapmap2", header.getOtherHeaderLine("mis_panel").getValue()); + assertEquals(ImputationPipeline.BEAGLE_VERSION, header.getOtherHeaderLine("mis_phasing").getValue()); + assertEquals(ImputationPipeline.IMPUTATION_VERSION, header.getOtherHeaderLine("mis_imputation").getValue()); + assertEquals(ImputationPipeline.PIPELINE_VERSION, header.getOtherHeaderLine("mis_pipeline").getValue()); FileUtil.deleteDirectory("test-data/tmp"); @@ -451,9 +451,9 @@ public void testValidatePanelPhasingOnly() throws IOException, ZipException { VCFFileReader reader = new VCFFileReader(new File("test-data/tmp/chr20.phased.vcf.gz"), false); VCFHeader header = reader.getFileHeader(); - assertEquals("hapmap2", header.getOtherHeaderLine("panel").getValue()); - assertEquals(ImputationPipeline.EAGLE_VERSION, header.getOtherHeaderLine("phasing").getValue()); - assertEquals(ImputationPipeline.PIPELINE_VERSION, header.getOtherHeaderLine("pipeline").getValue()); + assertEquals("hapmap2", header.getOtherHeaderLine("mis_panel").getValue()); + assertEquals(ImputationPipeline.EAGLE_VERSION, header.getOtherHeaderLine("mis_phasing").getValue()); + assertEquals(ImputationPipeline.PIPELINE_VERSION, header.getOtherHeaderLine("mis_pipeline").getValue()); FileUtil.deleteDirectory("test-data/tmp"); @@ -497,11 +497,11 @@ public void testValidatePanelPhasedInput() throws IOException, ZipException { VCFFileReader reader = new VCFFileReader(new File("test-data/tmp/chr20.dose.vcf.gz"), false); VCFHeader header = reader.getFileHeader(); - assertEquals("hapmap2", header.getOtherHeaderLine("panel").getValue()); - assertEquals("n/a", header.getOtherHeaderLine("phasing").getValue()); - assertEquals(ImputationPipeline.PIPELINE_VERSION, header.getOtherHeaderLine("pipeline").getValue()); + assertEquals("hapmap2", header.getOtherHeaderLine("mis_panel").getValue()); + assertEquals("n/a", header.getOtherHeaderLine("mis_phasing").getValue()); + assertEquals(ImputationPipeline.PIPELINE_VERSION, header.getOtherHeaderLine("mis_pipeline").getValue()); - // FileUtil.deleteDirectory("test-data/tmp"); + FileUtil.deleteDirectory("test-data/tmp"); } @@ -551,25 +551,22 @@ public void testPipelineWithEagleAndScores() throws IOException, ZipException { String inputFolder = "test-data/data/chr20-unphased"; // import scores into hdfs - String score1 = PGSCatalog.getFilenameById("PGS000018"); - String score2 = PGSCatalog.getFilenameById("PGS000027"); + String targetScores = HdfsUtil.path("scores-hdfs", "scores.txt.gz"); + HdfsUtil.put("test-data/data/pgs/test-scores.chr20.txt.gz", 
targetScores); - String targetScore1 = HdfsUtil.path("scores-hdfs", "PGS000018.txt.gz"); - HdfsUtil.put(score1, targetScore1); + String targetIndex = HdfsUtil.path("scores-hdfs", "scores.txt.gz.tbi"); + HdfsUtil.put("test-data/data/pgs/test-scores.chr20.txt.gz.tbi", targetIndex); - String targetScore2 = HdfsUtil.path("scores-hdfs", "PGS000027.txt.gz"); - HdfsUtil.put(score2, targetScore2); + String targetInfo = HdfsUtil.path("scores-hdfs", "scores.txt.gz.info"); + HdfsUtil.put("test-data/data/pgs/test-scores.chr20.txt.gz.info", targetInfo); // create workflow context and set scores WorkflowTestContext context = buildContext(inputFolder, "hapmap2"); context.setOutput("outputScores", "cloudgene2-hdfs"); Map pgsPanel = new HashMap(); - List scores = new Vector(); - scores.add("PGS000018.txt.gz"); - scores.add("PGS000027.txt.gz"); - pgsPanel.put("location", "scores-hdfs"); - pgsPanel.put("scores", scores); + pgsPanel.put("scores", targetScores); + pgsPanel.put("meta", "test-data/data/pgs/test-scores.chr20.json"); pgsPanel.put("build", "hg19"); context.setData("pgsPanel", pgsPanel); @@ -601,27 +598,9 @@ public void testPipelineWithEagleAndScores() throws IOException, ZipException { result = run(context, export); assertTrue(result); - ZipFile zipFile = new ZipFile("test-data/tmp/local/chr_20.zip", PASSWORD.toCharArray()); - zipFile.extractAll("test-data/tmp"); - - VcfFile file = VcfFileUtil.load("test-data/tmp/chr20.dose.vcf.gz", 100000000, false); - - assertEquals("20", file.getChromosome()); - assertEquals(51, file.getNoSamples()); - assertEquals(true, file.isPhased()); - assertEquals(TOTAL_REFPANEL_CHR20_B37, file.getNoSnps()); - - int snpInInfo = getLineCount("test-data/tmp/chr20.info.gz") - 1; - assertEquals(snpInInfo, file.getNoSnps()); - - String[] args = { "test-data/tmp/chr20.dose.vcf.gz", "--ref", "PGS000018,PGS000027", "--out", - "test-data/tmp/expected.txt" }; - int resultScore = new CommandLine(new ApplyScoreCommand()).execute(args); - assertEquals(0, resultScore); - - zipFile = new ZipFile("test-data/tmp/pgs_output/scores.zip", PASSWORD.toCharArray()); + ZipFile zipFile = new ZipFile("test-data/tmp/pgs_output/scores.zip"); zipFile.extractAll("test-data/tmp"); - CsvTableReader readerExpected = new CsvTableReader("test-data/tmp/expected.txt", ','); + CsvTableReader readerExpected = new CsvTableReader("test-data/data/pgs/expected.txt", ','); CsvTableReader readerActual = new CsvTableReader("test-data/tmp/scores.txt", ','); while (readerExpected.next() && readerActual.next()) { @@ -635,37 +614,36 @@ public void testPipelineWithEagleAndScores() throws IOException, ZipException { new File("test-data/tmp/local/scores.html").exists(); FileUtil.deleteDirectory("test-data/tmp"); + zipFile.close(); } @Test - public void testPipelineWithEagleAndScoresAndFormat() throws IOException, ZipException { + public void testPipelineWithEagleAndScoresAndCategory() throws IOException, ZipException { String configFolder = "test-data/configs/hapmap-chr20"; String inputFolder = "test-data/data/chr20-unphased"; // import scores into hdfs - String score1 = "test-data/data/prsweb/PRSWEB_PHECODE153_CRC-Huyghe_PT_UKB_20200608_WEIGHTS.txt"; - String format1 = "test-data/data/prsweb/PRSWEB_PHECODE153_CRC-Huyghe_PT_UKB_20200608_WEIGHTS.txt.format"; + String targetScores = HdfsUtil.path("scores-hdfs", "scores.txt.gz"); + HdfsUtil.put("test-data/data/pgs/test-scores.chr20.txt.gz", targetScores); - String targetScore1 = HdfsUtil.path("scores-hdfs", "PRSWEB_PHECODE153_CRC-Huyghe_PT_UKB_20200608_WEIGHTS.txt"); - 
HdfsUtil.put(score1, targetScore1); + String targetIndex = HdfsUtil.path("scores-hdfs", "scores.txt.gz.tbi"); + HdfsUtil.put("test-data/data/pgs/test-scores.chr20.txt.gz.tbi", targetIndex); - String targetFormat1 = HdfsUtil.path("scores-hdfs", - "PRSWEB_PHECODE153_CRC-Huyghe_PT_UKB_20200608_WEIGHTS.txt.format"); - HdfsUtil.put(format1, targetFormat1); + String targetInfo = HdfsUtil.path("scores-hdfs", "scores.txt.gz.info"); + HdfsUtil.put("test-data/data/pgs/test-scores.chr20.txt.gz.info", targetInfo); // create workflow context and set scores WorkflowTestContext context = buildContext(inputFolder, "hapmap2"); context.setOutput("outputScores", "cloudgene2-hdfs"); Map pgsPanel = new HashMap(); - List scores = new Vector(); - scores.add("PRSWEB_PHECODE153_CRC-Huyghe_PT_UKB_20200608_WEIGHTS.txt"); - pgsPanel.put("location", "scores-hdfs"); - pgsPanel.put("scores", scores); + pgsPanel.put("scores", targetScores); + pgsPanel.put("meta", "test-data/data/pgs/test-scores.chr20.json"); pgsPanel.put("build", "hg19"); context.setData("pgsPanel", pgsPanel); + context.setInput("pgsCategory","Body measurement"); //only PGS000027 // run qc to create chunkfile @@ -678,6 +656,7 @@ public void testPipelineWithEagleAndScoresAndFormat() throws IOException, ZipExc result = run(context, qcStats); assertTrue(result); + assertTrue(context.hasInMemory("Remaining sites in total: 7,735")); // add panel to hdfs importRefPanel(FileUtil.path(configFolder, "ref-panels")); @@ -694,31 +673,14 @@ public void testPipelineWithEagleAndScoresAndFormat() throws IOException, ZipExc result = run(context, export); assertTrue(result); - ZipFile zipFile = new ZipFile("test-data/tmp/local/chr_20.zip", PASSWORD.toCharArray()); + ZipFile zipFile = new ZipFile("test-data/tmp/pgs_output/scores.zip"); zipFile.extractAll("test-data/tmp"); - - VcfFile file = VcfFileUtil.load("test-data/tmp/chr20.dose.vcf.gz", 100000000, false); - - assertEquals("20", file.getChromosome()); - assertEquals(51, file.getNoSamples()); - assertEquals(true, file.isPhased()); - assertEquals(TOTAL_REFPANEL_CHR20_B37, file.getNoSnps()); - - int snpInInfo = getLineCount("test-data/tmp/chr20.info.gz") - 1; - assertEquals(snpInInfo, file.getNoSnps()); - - String[] args = { "test-data/tmp/chr20.dose.vcf.gz", "--ref", score1, "--out", "test-data/tmp/expected.txt" }; - int resultScore = new CommandLine(new ApplyScoreCommand()).execute(args); - assertEquals(0, resultScore); - - zipFile = new ZipFile("test-data/tmp/pgs_output/scores.zip", PASSWORD.toCharArray()); - zipFile.extractAll("test-data/tmp"); - CsvTableReader readerExpected = new CsvTableReader("test-data/tmp/expected.txt", ','); + CsvTableReader readerExpected = new CsvTableReader("test-data/data/pgs/expected.txt", ','); CsvTableReader readerActual = new CsvTableReader("test-data/tmp/scores.txt", ','); + assertEquals(2, readerActual.getColumns().length); //only sample and PGS000027 while (readerExpected.next() && readerActual.next()) { - assertEquals(readerExpected.getDouble("PRSWEB_PHECODE153_CRC-Huyghe_PT_UKB_20200608_WEIGHTS"), - readerActual.getDouble("PRSWEB_PHECODE153_CRC-Huyghe_PT_UKB_20200608_WEIGHTS"), 0.00001); + assertEquals(readerExpected.getDouble("PGS000027"), readerActual.getDouble("PGS000027"), 0.00001); } readerExpected.close(); readerActual.close(); @@ -727,6 +689,7 @@ public void testPipelineWithEagleAndScoresAndFormat() throws IOException, ZipExc new File("test-data/tmp/local/scores.html").exists(); FileUtil.deleteDirectory("test-data/tmp"); + zipFile.close(); } @@ -773,7 +736,7 @@ public void 
testPipelineWithEaglePhasingOnlyWithPhasedData() throws IOException, assertEquals(true, file.isPhased()); assertEquals(TOTAL_SNPS_INPUT - SNPS_MONOMORPHIC, file.getNoSnps()); - // FileUtil.deleteDirectory("test-data/tmp"); + FileUtil.deleteDirectory("test-data/tmp"); } @@ -962,7 +925,7 @@ public void testPipelineWithEagleAndR2Filter() throws IOException, ZipException // TODO: update SNPS_WITH_R2_BELOW_05 assertTrue(TOTAL_REFPANEL_CHR20_B37 > file.getNoSnps()); - int snpInInfo = getLineCount("test-data/tmp/chr20.info.gz") - 1; + int snpInInfo = getLineCount("test-data/tmp/chr20.info.gz"); assertEquals(snpInInfo, file.getNoSnps()); FileUtil.deleteDirectory("test-data/tmp"); @@ -973,7 +936,12 @@ private int getLineCount(String filename) throws IOException { LineReader reader = new LineReader(filename); int lines = 0; while (reader.next()) { - lines++; + + String line = reader.get(); + { + if (!line.startsWith("#")) + lines++; + } } return lines; } @@ -1001,12 +969,12 @@ private boolean checkSortPositionInfo(String filename) throws IOException { String line = reader.get(); - if (!line.startsWith("SNP")) { - String snp = line.split("\t")[0]; - if (Integer.valueOf(snp.split(":")[1]) <= pos) { + if (!line.startsWith("#")) { + String snp = line.split("\\s+")[1]; + if (Integer.valueOf(snp) <= pos) { return false; } - pos = Integer.valueOf(snp.split(":")[1]); + pos = Integer.valueOf(snp); } } @@ -1081,7 +1049,7 @@ public void testCompareInfoAndDosageSize() throws IOException, ZipException { // subtract header int infoCount = getLineCount("test-data/tmp/chr20.info.gz"); - assertEquals(infoCount - 1, file.getNoSnps()); + assertEquals(infoCount, file.getNoSnps()); FileUtil.deleteDirectory("test-data/tmp"); } diff --git a/test-data/configs/beagle/panels.txt b/test-data/configs/beagle/panels.txt index 9099b68c..e0c6f302 100644 --- a/test-data/configs/beagle/panels.txt +++ b/test-data/configs/beagle/panels.txt @@ -1,7 +1,7 @@ panels: - id: hapmap2 - hdfs: ref-panels/hapmap_r22.chr$chr.CEU.hg19.m3vcf.gz + hdfs: ref-panels/hapmap_r22.chr$chr.CEU.hg19.msav legend: ref-panels/hapmap_r22.chr$chr.CEU.hg19_impute.legend.gz refBeagle: ref-panels/hapmap_r22.chr$chr.CEU.hg19.recode.bref3 mapBeagle: ref-panels/plink.chr$chr.GRCh37.map diff --git a/test-data/configs/beagle/ref-panels/hapmap_r22.chr20.CEU.hg19.msav b/test-data/configs/beagle/ref-panels/hapmap_r22.chr20.CEU.hg19.msav new file mode 100644 index 00000000..88f63812 Binary files /dev/null and b/test-data/configs/beagle/ref-panels/hapmap_r22.chr20.CEU.hg19.msav differ diff --git a/test-data/configs/hapmap-3chr/panels.txt b/test-data/configs/hapmap-3chr/panels.txt index 6d7461e6..bf1e7ae0 100644 --- a/test-data/configs/hapmap-3chr/panels.txt +++ b/test-data/configs/hapmap-3chr/panels.txt @@ -1,7 +1,7 @@ panels: - id: hapmap2 - hdfs: ref-panels/hapmap_r22.chr$chr.CEU.hg19.m3vcf.gz + hdfs: ref-panels/hapmap_r22.chr$chr.CEU.hg19.msav legend: ref-panels/hapmap_r22.chr$chr.CEU.hg19_impute.legend.gz mapEagle: ref-panels/genetic_map_hg19_chr1.txt refEagle: ref-panels/hapmap_r22.eagle/hapmap_r22.chr$chr.CEU.hg19.recode.bcf @@ -13,7 +13,7 @@ panels: mixed: Mixed - id: hapmap2-qcfilter-strandflips - hdfs: ref-panels/hapmap_r22.chr$chr.CEU.hg19.m3vcf.gz + hdfs: ref-panels/hapmap_r22.chr$chr.CEU.hg19.msav legend: ref-panels/hapmap_r22.chr$chr.CEU.hg19_impute.legend.gz mapEagle: ref-panels/genetic_map_hg19_chr1.txt refEagle: ref-panels/hapmap_r22.eagle/hapmap_r22.chr$chr.CEU.hg19.recode.bcf @@ -27,7 +27,7 @@ panels: strandFlips: -1 - id: hapmap2-qcfilter-ref-overlap - 
hdfs: ref-panels/hapmap_r22.chr$chr.CEU.hg19.m3vcf.gz + hdfs: ref-panels/hapmap_r22.chr$chr.CEU.hg19.msav legend: ref-panels/hapmap_r22.chr$chr.CEU.hg19_impute.legend.gz mapEagle: ref-panels/genetic_map_hg19_chr1.txt refEagle: ref-panels/hapmap_r22.eagle/hapmap_r22.chr$chr.CEU.hg19.recode.bcf @@ -43,7 +43,7 @@ panels: minSnps: 1000 - id: hapmap2-qcfilter-min-snps - hdfs: ref-panels/hapmap_r22.chr$chr.CEU.hg19.m3vcf.gz + hdfs: ref-panels/hapmap_r22.chr$chr.CEU.hg19.msav legend: ref-panels/hapmap_r22.chr$chr.CEU.hg19_impute.legend.gz mapEagle: ref-panels/genetic_map_hg19_chr1.txt refEagle: ref-panels/hapmap_r22.eagle/hapmap_r22.chr$chr.CEU.hg19.recode.bcf @@ -58,7 +58,7 @@ panels: minSnps: 1000 - id: hapmap2-qcfilter-low-callrate - hdfs: ref-panels/hapmap_r22.chr$chr.CEU.hg19.m3vcf.gz + hdfs: ref-panels/hapmap_r22.chr$chr.CEU.hg19.msav legend: ref-panels/hapmap_r22.chr$chr.CEU.hg19_impute.legend.gz mapEagle: ref-panels/genetic_map_hg19_chr1.txt refEagle: ref-panels/hapmap_r22.eagle/hapmap_r22.chr$chr.CEU.hg19.recode.bcf @@ -71,4 +71,18 @@ panels: qcFilter: sampleCallrate: 1.01 strandFlips: 100 + + - id: hapmap2-qcfilter-alleleswitches + hdfs: ref-panels/hapmap_r22.chr$chr.CEU.hg19.msav + legend: ref-panels/hapmap_r22.chr$chr.CEU.hg19_impute.legend.gz + mapEagle: ref-panels/genetic_map_hg19_chr1.txt + refEagle: ref-panels/hapmap_r22.eagle/hapmap_r22.chr$chr.CEU.hg19.recode.bcf + samples: + eur: 60 + mixed: -1 + populations: + eur: EUR + mixed: Mixed + qcFilter: + alleleSwitches: 33 \ No newline at end of file diff --git a/test-data/configs/hapmap-3chr/ref-panels/hapmap_r22.chr1.CEU.hg19.msav b/test-data/configs/hapmap-3chr/ref-panels/hapmap_r22.chr1.CEU.hg19.msav new file mode 100644 index 00000000..fcf93795 Binary files /dev/null and b/test-data/configs/hapmap-3chr/ref-panels/hapmap_r22.chr1.CEU.hg19.msav differ diff --git a/test-data/configs/hapmap-3chr/ref-panels/hapmap_r22.chr2.CEU.hg19.msav b/test-data/configs/hapmap-3chr/ref-panels/hapmap_r22.chr2.CEU.hg19.msav new file mode 100644 index 00000000..01774555 Binary files /dev/null and b/test-data/configs/hapmap-3chr/ref-panels/hapmap_r22.chr2.CEU.hg19.msav differ diff --git a/test-data/configs/hapmap-3chr/ref-panels/hapmap_r22.chr3.CEU.hg19.msav b/test-data/configs/hapmap-3chr/ref-panels/hapmap_r22.chr3.CEU.hg19.msav new file mode 100644 index 00000000..a9d58b4b Binary files /dev/null and b/test-data/configs/hapmap-3chr/ref-panels/hapmap_r22.chr3.CEU.hg19.msav differ diff --git a/test-data/configs/hapmap-chr1/panels.txt b/test-data/configs/hapmap-chr1/panels.txt index c8c8f642..a6552542 100644 --- a/test-data/configs/hapmap-chr1/panels.txt +++ b/test-data/configs/hapmap-chr1/panels.txt @@ -1,7 +1,7 @@ panels: - id: hapmap2 - hdfs: ref-panels/hapmap_r22.chr$chr.CEU.hg19.m3vcf.gz + hdfs: ref-panels/hapmap_r22.chr$chr.CEU.hg19.msav legend: ref-panels/hapmap_r22.chr$chr.CEU.hg19_impute.legend.gz mapEagle: ref-panels/genetic_map_hg19_chr1.txt refEagle: ref-panels/hapmap_r22.chr$chr.CEU.hg19.recode.bcf @@ -13,7 +13,7 @@ panels: mixed: Mixed - id: hrc-fake - hdfs: ref-panels/hapmap_r22.chr$chr.CEU.hg19.m3vcf.gz + hdfs: ref-panels/hapmap_r22.chr$chr.CEU.hg19.msav legend: ref-panels/hapmap_r22.chr$chr.CEU.hg19_impute.legend.gz mapEagle: ref-panels/genetic_map_hg19_chr1.txt refEagle: ref-panels/hapmap_r22.chr$chr.CEU.hg19.recode.bcf @@ -25,7 +25,7 @@ panels: mixed: Mixed - id: phase3-fake - hdfs: ref-panels/hapmap_r22.chr$chr.CEU.hg19.m3vcf.gz + hdfs: ref-panels/hapmap_r22.chr$chr.CEU.hg19.msav legend: 
ref-panels/hapmap_r22.chr$chr.CEU.hg19_impute.legend.gz
     mapEagle: ref-panels/genetic_map_hg19_chr1.txt
     refEagle: ref-panels/hapmap_r22.chr$chr.CEU.hg19.recode.bcf
@@ -45,7 +45,7 @@ panels:
       mixed: Mixed
   - id: TOPMedfreeze6-fake
-    hdfs: ref-panels/hapmap_r22.chr$chr.CEU.hg19.m3vcf.gz
+    hdfs: ref-panels/hapmap_r22.chr$chr.CEU.hg19.msav
     legend: ref-panels/hapmap_r22.chr$chr.CEU.hg19_impute.legend.gz
     mapEagle: ref-panels/genetic_map_hg19_chr1.txt
     refEagle: ref-panels/hapmap_r22.chr$chr.CEU.hg19.recode.bcf
@@ -66,7 +66,7 @@ panels:
   - id: hapmap2-region-simple
-    hdfs: ref-panels/hapmap_r22.chr$chr.CEU.hg19.m3vcf.gz
+    hdfs: ref-panels/hapmap_r22.chr$chr.CEU.hg19.msav
     legend: ref-panels/hapmap_r22.chr$chr.CEU.hg19_impute.legend.gz
     mapEagle: ref-panels/genetic_map_hg19_chr1.txt
     refEagle: ref-panels/hapmap_r22.eagle/hapmap_r22.chr$chr.CEU.hg19.recode.bcf
@@ -79,7 +79,7 @@ panels:
     range: 1:565111-752566
   - id: hapmap2-region-complex
-    hdfs: ref-panels/hapmap_r22.chr$chr.CEU.hg19.m3vcf.gz
+    hdfs: ref-panels/hapmap_r22.chr$chr.CEU.hg19.msav
     legend: ref-panels/hapmap_r22.chr$chr.CEU.hg19_impute.legend.gz
     mapEagle: ref-panels/genetic_map_hg19_chr1.txt
     refEagle: ref-panels/hapmap_r22.eagle/hapmap_r22.chr$chr.CEU.hg19.recode.bcf
diff --git a/test-data/configs/hapmap-chr1/ref-panels/hapmap_r22.chr1.CEU.hg19.msav b/test-data/configs/hapmap-chr1/ref-panels/hapmap_r22.chr1.CEU.hg19.msav
new file mode 100644
index 00000000..62a915f4
Binary files /dev/null and b/test-data/configs/hapmap-chr1/ref-panels/hapmap_r22.chr1.CEU.hg19.msav differ
diff --git a/test-data/configs/hapmap-chr20-hg38/panels.txt b/test-data/configs/hapmap-chr20-hg38/panels.txt
index 1bf29b2d..7dc3cae8 100644
--- a/test-data/configs/hapmap-chr20-hg38/panels.txt
+++ b/test-data/configs/hapmap-chr20-hg38/panels.txt
@@ -1,7 +1,7 @@
 panels:
   - id: hapmap2
-    hdfs: ref-panels/hapmap_r22.chr$chr.CEU.hg38.m3vcf.gz
+    hdfs: ref-panels/hapmap_r22.chr$chr.CEU.hg38.msav
     legend: ref-panels/hapmap_r22.chr$chr.CEU.hg38_impute.legend.gz
     mapEagle: ref-panels/genetic_map_hg38_withX.txt.gz
     refEagle: ref-panels/hapmap_r22.chr$chr.CEU.hg38.bcf
diff --git a/test-data/configs/hapmap-chr20-hg38/ref-panels/hapmap_r22.chr20.CEU.hg38.msav b/test-data/configs/hapmap-chr20-hg38/ref-panels/hapmap_r22.chr20.CEU.hg38.msav
new file mode 100644
index 00000000..12ca8d65
Binary files /dev/null and b/test-data/configs/hapmap-chr20-hg38/ref-panels/hapmap_r22.chr20.CEU.hg38.msav differ
diff --git a/test-data/configs/hapmap-chr20/panels.txt b/test-data/configs/hapmap-chr20/panels.txt
index 5679b82b..710d07a5 100644
--- a/test-data/configs/hapmap-chr20/panels.txt
+++ b/test-data/configs/hapmap-chr20/panels.txt
@@ -1,7 +1,7 @@
 panels:
   - id: hapmap2
-    hdfs: ref-panels/hapmap_r22.chr$chr.CEU.hg19.m3vcf.gz
+    hdfs: ref-panels/hapmap_r22.chr$chr.CEU.hg19.msav
     legend: ref-panels/hapmap_r22.chr$chr.CEU.hg19_impute.legend.gz
     mapEagle: ref-panels/genetic_map_hg19_withX.txt.gz
     refEagle: ref-panels/hapmap_r22.chr$chr.CEU.hg19.recode.bcf
diff --git a/test-data/configs/hapmap-chr20/ref-panels/hapmap_r22.chr20.CEU.hg19.msav b/test-data/configs/hapmap-chr20/ref-panels/hapmap_r22.chr20.CEU.hg19.msav
new file mode 100644
index 00000000..8267f99d
Binary files /dev/null and b/test-data/configs/hapmap-chr20/ref-panels/hapmap_r22.chr20.CEU.hg19.msav differ
diff --git a/test-data/configs/hapmap-chrX-hg38/panels.txt b/test-data/configs/hapmap-chrX-hg38/panels.txt
index 216b690f..f16cfc05 100644
--- a/test-data/configs/hapmap-chrX-hg38/panels.txt
+++ b/test-data/configs/hapmap-chrX-hg38/panels.txt
@@ -1,7 +1,7 @@
 panels:
   - id: hapmap2
-    hdfs: ref-panels/$chr.1000g.Phase1.v3.With.Parameter.Estimates.hg38.m3vcf.gz
+    hdfs: ref-panels/$chr.1000g.Phase1.v3.With.Parameter.Estimates.hg38.msav
     legend: ref-panels/1000g_chrX_impute.hg38.legend.gz
     mapEagle: ref-panels/genetic_map_hg38_withX.txt.gz
     refEagle: ref-panels/ALL.$chr.phase1_v3.snps_indels_svs.genotypes.all.noSingleton.recode.hg38.bcf
diff --git a/test-data/configs/hapmap-chrX-hg38/ref-panels/X.PAR1.1000g.Phase1.v3.With.Parameter.Estimates.hg38.msav b/test-data/configs/hapmap-chrX-hg38/ref-panels/X.PAR1.1000g.Phase1.v3.With.Parameter.Estimates.hg38.msav
new file mode 100644
index 00000000..37c4a3f3
Binary files /dev/null and b/test-data/configs/hapmap-chrX-hg38/ref-panels/X.PAR1.1000g.Phase1.v3.With.Parameter.Estimates.hg38.msav differ
diff --git a/test-data/configs/hapmap-chrX-hg38/ref-panels/X.PAR2.1000g.Phase1.v3.With.Parameter.Estimates.hg38.msav b/test-data/configs/hapmap-chrX-hg38/ref-panels/X.PAR2.1000g.Phase1.v3.With.Parameter.Estimates.hg38.msav
new file mode 100644
index 00000000..46cc3120
Binary files /dev/null and b/test-data/configs/hapmap-chrX-hg38/ref-panels/X.PAR2.1000g.Phase1.v3.With.Parameter.Estimates.hg38.msav differ
diff --git a/test-data/configs/hapmap-chrX-hg38/ref-panels/X.nonPAR.1000g.Phase1.v3.With.Parameter.Estimates.hg38.msav b/test-data/configs/hapmap-chrX-hg38/ref-panels/X.nonPAR.1000g.Phase1.v3.With.Parameter.Estimates.hg38.msav
new file mode 100644
index 00000000..3989f9e0
Binary files /dev/null and b/test-data/configs/hapmap-chrX-hg38/ref-panels/X.nonPAR.1000g.Phase1.v3.With.Parameter.Estimates.hg38.msav differ
diff --git a/test-data/configs/hapmap-chrX/panels.txt b/test-data/configs/hapmap-chrX/panels.txt
index 8769c7b2..cc3ea964 100644
--- a/test-data/configs/hapmap-chrX/panels.txt
+++ b/test-data/configs/hapmap-chrX/panels.txt
@@ -1,7 +1,7 @@
 panels:
   - id: phase1
-    hdfs: ref-panels/$chr.1000g.Phase1.v3.With.Parameter.Estimates.m3vcf.gz
+    hdfs: ref-panels/$chr.1000g.Phase1.v3.With.Parameter.Estimates.msav
     legend: ref-panels/1000g_chr$chr_impute.legend.gz
     mapEagle: ref-panels/genetic_map_hg19_withX.txt.gz
     refEagle: ref-panels/ALL.chr$chr.phase1_v3.snps_indels_svs.genotypes.all.noSingleton.recode.bcf
diff --git a/test-data/configs/hapmap-chrX/ref-panels/X.PAR1.1000g.Phase1.v3.With.Parameter.Estimates.msav b/test-data/configs/hapmap-chrX/ref-panels/X.PAR1.1000g.Phase1.v3.With.Parameter.Estimates.msav
new file mode 100644
index 00000000..272cad05
Binary files /dev/null and b/test-data/configs/hapmap-chrX/ref-panels/X.PAR1.1000g.Phase1.v3.With.Parameter.Estimates.msav differ
diff --git a/test-data/configs/hapmap-chrX/ref-panels/X.PAR2.1000g.Phase1.v3.With.Parameter.Estimates.msav b/test-data/configs/hapmap-chrX/ref-panels/X.PAR2.1000g.Phase1.v3.With.Parameter.Estimates.msav
new file mode 100644
index 00000000..ecd532ec
Binary files /dev/null and b/test-data/configs/hapmap-chrX/ref-panels/X.PAR2.1000g.Phase1.v3.With.Parameter.Estimates.msav differ
diff --git a/test-data/configs/hapmap-chrX/ref-panels/X.nonPAR.1000g.Phase1.v3.With.Parameter.Estimates.msav b/test-data/configs/hapmap-chrX/ref-panels/X.nonPAR.1000g.Phase1.v3.With.Parameter.Estimates.msav
new file mode 100644
index 00000000..71f9a0c9
Binary files /dev/null and b/test-data/configs/hapmap-chrX/ref-panels/X.nonPAR.1000g.Phase1.v3.With.Parameter.Estimates.msav differ
diff --git a/test-data/configs/phylotree-chrMT/panels.txt b/test-data/configs/phylotree-chrMT/panels.txt
index 0677f49e..86aceed1 100644
--- a/test-data/configs/phylotree-chrMT/panels.txt
+++ b/test-data/configs/phylotree-chrMT/panels.txt
@@ -1,6 +1,6 @@
 panels:
   - id: phylotree
-    hdfs: ref-panels/chrMT.phylotree17.m3vcf.gz
+    hdfs: ref-panels/chrMT.phylotree17.msav
     legend: ref-panels/chrMT.phylotree17.legend.gz
     mapEagle: ref-panels/genetic_map_hg19_withX.txt.gz
     samples:
diff --git a/test-data/configs/phylotree-chrMT/ref-panels/chrMT.phylotree17.msav b/test-data/configs/phylotree-chrMT/ref-panels/chrMT.phylotree17.msav
new file mode 100644
index 00000000..3e9db6f4
Binary files /dev/null and b/test-data/configs/phylotree-chrMT/ref-panels/chrMT.phylotree17.msav differ
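The test-config changes above only swap each panel's `hdfs` reference from `.m3vcf.gz` to `.msav`; the remaining panel properties (`legend`, `mapEagle`, `refEagle`, ...) are untouched. Below is a minimal sketch of how such a `panels.txt` could be sanity-checked after the switch, assuming the file parses as YAML with the keys shown in these hunks. The script is illustrative only and is not part of the repository's test suite.

```python
# Minimal sketch: sanity-check a panels.txt-style config after the m3vcf -> msav switch.
# Assumption: the file parses as YAML with the keys visible in this diff (id, hdfs, ...).
import sys
import yaml  # PyYAML


def check_panels(path: str) -> int:
    with open(path) as f:
        config = yaml.safe_load(f)
    errors = 0
    for panel in config.get("panels", []):
        hdfs = panel.get("hdfs", "")
        if not hdfs.endswith(".msav"):
            print(f"{panel.get('id')}: expected an .msav reference, got {hdfs!r}")
            errors += 1
    return errors


if __name__ == "__main__":
    # Usage: python check_panels.py test-data/configs/hapmap-chr20/panels.txt
    sys.exit(check_panels(sys.argv[1]))
```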
+"FB00012","0.11515414724643888","-0.016262550736242526" +"FB00015","-0.02863926274282929","-0.0058560594715711765" +"FB00016","0.09708058770183478","-0.006615503674890271" +"FB00017","0.0333183545495552","-0.09133852506614625" +"FB00021","0.02125234476039485","-0.03385883398967005" +"FB00022","0.08063914063225523","-0.023863179573550077" +"FB00023","0.08486816825786414","-0.019237921064168134" +"FB00025","0.007633970504953569","-0.00821490769121821" +"FB00027","0.06589368184603889","6.881747037498858E-4" +"FB00031","0.08159354733483709","-0.05865219358524136" +"FB00032","0.059559124423313986","-0.011932065906053195" +"FB00034","0.07329317227154561","-0.0314670162155752" +"FB00035","0.0331774804672276","-0.03781256556917445" +"FB00037","0.0016465439310346247","-0.019176108365452083" +"FB00038","0.0017311541871692232","-0.015071171783794392" +"FB00039","0.09066393193180469","0.015333069126141343" +"FB00040","0.015603500129263245","-0.018604292801794806" +"FB00041","0.0309823224616549","-0.016796346366294696" +"FB00044","0.05134259495424047","0.004675911477700524" +"FB00049","0.06968220638930554","-0.02991454296882224" +"FB00051","0.052175511361486654","-0.05707273500672397" +"FB00052","0.029502600308498023","-0.019905815607306344" +"FB00053","0.06161116407293038","-0.031026560813588313" +"FB00055","0.054493671491316106","0.00431830324515882" +"FB00056","0.02320194194946318","0.00614156436749441" +"FB00058","0.07582055523278575","-0.006918139126677136" +"FB00059","0.08255746970293422","-0.009353011329235092" +"FB00062","0.057664896380924285","-0.014677392669547516" +"FB00064","0.09473688564666033","-0.008888761239558821" +"FB00066","0.037741025828820024","-0.045262319349685054" +"FB00068","0.02146662718024195","-0.02550422945740293" +"FB00069","0.05478741439964831","-0.0026899084361405347" +"FB00071","0.062334100154450026","-0.013246084749458435" +"FB00072","0.06290342067593024","-0.016947750575916215" +"FB00074","0.05314067711554828","-0.007028060519271768" +"FB00075","-0.017286646355253094","0.009813629909409457" +"FB00077","0.004136677400486163","0.0038938051485948692" +"FB00078","0.08205304671458324","-0.012855452407862854" +"FB00082","0.026206583635414803","-0.0016588620378834643" +"FB00086","-0.026879636463039175","-0.002094128807407887" +"FB00089","0.04857095566430543","-0.03379098615229034" +"FB00090","-0.026723652883937743","-0.02661307087901709" +"FB00091","0.06103986214408777","-0.028456093825954442" +"FB00093","-0.0017740509916735758","-0.010945215908210024" +"FB00094","0.032863183164308585","-0.026344531584280163" +"FB00095","0.05236004026455726","-0.014738808207713519" diff --git a/test-data/data/pgs/test-scores.chr20.json b/test-data/data/pgs/test-scores.chr20.json new file mode 100644 index 00000000..dd843f15 --- /dev/null +++ b/test-data/data/pgs/test-scores.chr20.json @@ -0,0 +1,129 @@ +{ + "PGS000018": { + "id": "PGS000018", + "trait": "Coronary artery disease", + "efo": [ + { + "id": "EFO_0001645", + "label": "coronary artery disease", + "description": "Narrowing of the coronary arteries due to fatty deposits inside the arterial walls. The diagnostic criteria may include documented history of any of the following: documented coronary artery stenosis greater than or equal to 50% (by cardiac catheterization or other modality of direct imaging of the coronary arteries); previous coronary artery bypass surgery (CABG); previous percutaneous coronary intervention (PCI); previous myocardial infarction. 
diff --git a/test-data/data/pgs/test-scores.chr20.json b/test-data/data/pgs/test-scores.chr20.json
new file mode 100644
index 00000000..dd843f15
--- /dev/null
+++ b/test-data/data/pgs/test-scores.chr20.json
@@ -0,0 +1,129 @@
+{
+  "PGS000018": {
+    "id": "PGS000018",
+    "trait": "Coronary artery disease",
+    "efo": [
+      {
+        "id": "EFO_0001645",
+        "label": "coronary artery disease",
+        "description": "Narrowing of the coronary arteries due to fatty deposits inside the arterial walls. The diagnostic criteria may include documented history of any of the following: documented coronary artery stenosis greater than or equal to 50% (by cardiac catheterization or other modality of direct imaging of the coronary arteries); previous coronary artery bypass surgery (CABG); previous percutaneous coronary intervention (PCI); previous myocardial infarction. (ACC) [NCIT: C26732]",
+        "url": "http://www.ebi.ac.uk/efo/EFO_0001645"
+      }
+    ],
+    "populations": {
+      "items": {
+        "MAE": {
+          "name": "MAE",
+          "count": -1,
+          "percentage": 0.509,
+          "color": "eeeeee",
+          "label": "Multi-Ancestry (including Europeans)"
+        },
+        "EUR": {
+          "name": "EUR",
+          "count": -1,
+          "percentage": 0.37,
+          "color": "#0099E6",
+          "label": "European"
+        },
+        "SAS": {
+          "name": "SAS",
+          "count": -1,
+          "percentage": 0.067,
+          "color": "#F90026",
+          "label": "South Asian"
+        },
+        "AMR": {
+          "name": "AMR",
+          "count": -1,
+          "percentage": 0.011,
+          "color": "#800080",
+          "label": "Hispanic or Latin American"
+        },
+        "EAS": {
+          "name": "EAS",
+          "count": -1,
+          "percentage": 0.03,
+          "color": "#FF99E6",
+          "label": "East Asian"
+        },
+        "AFR": {
+          "name": "AFR",
+          "count": -1,
+          "percentage": 0.008,
+          "color": "#FF6600",
+          "label": "African"
+        },
+        "GME": {
+          "name": "GME",
+          "count": -1,
+          "percentage": 0.006,
+          "color": "#DBEE06",
+          "label": "Greater Middle Eastern"
+        }
+      },
+      "total": 382026
+    },
+    "publication": {
+      "date": "2018-10-01",
+      "journal": "J Am Coll Cardiol",
+      "firstauthor": "Inouye M",
+      "doi": "10.1016/j.jacc.2018.07.079"
+    },
+    "categories": ["Cardiovascular disease"],
+    "variants": 1745179,
+    "repository": "PGS-Catalog",
+    "link": "https://www.pgscatalog.org/score/PGS000018",
+    "samples": 382026
+  },
+
+  "PGS000027": {
+    "id": "PGS000027",
+    "trait": "Body Mass Index",
+    "efo": [
+      {
+        "id": "EFO_0004340",
+        "label": "body mass index",
+        "description": "An indicator of body density as determined by the relationship of BODY WEIGHT to BODY HEIGHT. BMI\u003dweight (kg)/height squared (m2). BMI correlates with body fat (ADIPOSE TISSUE). Their relationship varies with age and gender. For adults, BMI falls into these categories: below 18.5 (underweight); 18.5-24.9 (normal); 25.0-29.9 (overweight); 30.0 and above (obese). (National Center for Health Statistics, Centers for Disease Control and Prevention)",
+        "url": "http://www.ebi.ac.uk/efo/EFO_0004340"
+      }
+    ],
+    "populations": {
+      "items": {
+        "EUR": {
+          "name": "EUR",
+          "count": -1,
+          "percentage": 0.991,
+          "color": "#0099E6",
+          "label": "European"
+        },
+        "AMR": {
+          "name": "AMR",
+          "count": -1,
+          "percentage": 0.005,
+          "color": "#800080",
+          "label": "Hispanic or Latin American"
+        },
+        "AFR": {
+          "name": "AFR",
+          "count": -1,
+          "percentage": 0.004,
+          "color": "#FF6600",
+          "label": "African"
+        }
+      },
+      "total": 238944
+    },
+    "publication": {
+      "date": "2019-04-01",
+      "journal": "Cell",
+      "firstauthor": "Khera AV",
+      "doi": "10.1016/j.cell.2019.03.028"
+    },
+    "categories": ["Body measurement"],
+    "variants": 2100302,
+    "repository": "PGS-Catalog",
+    "link": "https://www.pgscatalog.org/score/PGS000027",
+    "samples": 238944
+  }
+}
diff --git a/test-data/data/pgs/test-scores.chr20.txt.gz b/test-data/data/pgs/test-scores.chr20.txt.gz
new file mode 100644
index 00000000..b59f6546
Binary files /dev/null and b/test-data/data/pgs/test-scores.chr20.txt.gz differ
diff --git a/test-data/data/pgs/test-scores.chr20.txt.gz.info b/test-data/data/pgs/test-scores.chr20.txt.gz.info
new file mode 100644
index 00000000..fef2aaf4
--- /dev/null
+++ b/test-data/data/pgs/test-scores.chr20.txt.gz.info
@@ -0,0 +1,7 @@
+# PGS-Collection v1
+# Date=Fri Dec 08 08:38:43 CET 2023
+# Scores=2
+# Updated by pgs-calc 1.6.0
+score variants ignored
+PGS000018 1745179 0
+PGS000027 2100302 0
diff --git a/test-data/data/pgs/test-scores.chr20.txt.gz.tbi b/test-data/data/pgs/test-scores.chr20.txt.gz.tbi
new file mode 100644
index 00000000..a18baec6
Binary files /dev/null and b/test-data/data/pgs/test-scores.chr20.txt.gz.tbi differ
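test-scores.chr20.json carries the per-score metadata (trait, EFO terms, ancestry breakdown, publication, variant counts), and the `.info` file summarizes the score collection produced with pgs-calc 1.6.0. Below is a minimal sketch for inspecting the JSON locally; it uses only the fields visible in this diff and is an illustrative helper, not the report code used by the server.

```python
# Minimal sketch: summarise the per-score metadata in test-scores.chr20.json
# (trait, variant count, ancestry percentages), matching the fields shown above.
import json

with open("test-scores.chr20.json") as f:
    scores = json.load(f)

for score_id, meta in scores.items():
    populations = meta["populations"]["items"]
    ancestry = ", ".join(
        f"{p['label']} {p['percentage']:.1%}" for p in populations.values()
    )
    print(f"{score_id}: {meta['trait']} ({meta['variants']} variants; {ancestry})")
```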
diff --git a/test-data/data/prsweb/PRSWEB_PHECODE153_CRC-Huyghe_PT_UKB_20200608_WEIGHTS.txt b/test-data/data/prsweb/PRSWEB_PHECODE153_CRC-Huyghe_PT_UKB_20200608_WEIGHTS.txt
deleted file mode 100644
index 1a6ec0de..00000000
--- a/test-data/data/prsweb/PRSWEB_PHECODE153_CRC-Huyghe_PT_UKB_20200608_WEIGHTS.txt
+++ /dev/null
@@ -1,100 +0,0 @@
-## PRSweb reference PRSWEB_PHECODE153_CRC-Huyghe_PT_UKB_20200608
-## PRSweb LD reference MGI
-## PRSweb date 20200608
-## GWAS source 30510241
-## GWAS reference PUBMED
-## GWAS phenotype Colorectal cancer
-## GWAS id CRC_Huyghe
-## GWAS URL https://www.nature.com/articles/s41588-018-0286-6#Sec35
-## PRS method LD Clumping (MAF >= 1%, r^2 <= 0.1) & P-value thresholding (see tuning parameter)
-## PRS tuning parameter 7.8e-06
-## PRS evaluation in UKB
-## Genome build GRCh37/hg19
-CHROM POS REF ALT EA OA PVALUE WEIGHT
-1 38455891 G C G C 3.8e-09 0.0523
-1 55246035 T C C T 3.3e-11 0.0665
-1 183002639 A G A G 2.4e-16 0.073
-1 222112634 A G G A 6.1e-16 0.0877
-2 159964552 T C C T 4.4e-08 0.0511
-2 199612407 T C C T 5e-09 0.0535
-2 199781586 T C T C 3.7e-11 0.0627
-2 219191256 T C T C 1.5e-11 0.0613
-3 40915239 A G G A 1.2e-16 0.0994
-3 66365163 G A A G 7.1e-08 0.0597
-3 112999560 G A G A 1.4e-08 0.1761
-3 133701119 G A A G 3.8e-09 0.0597
-3 169517436 C T C T 7.8e-06 0.0453
-4 94938618 C A A C 1.2e-08 0.052
-4 106128760 G A A G 1.6e-08 0.0522
-4 145659064 T C C T 2.9e-08 0.0842
-5 1240204 C T T C 5.1e-09 0.1119
-5 1296486 A G G A 1.4e-22 0.0865
-5 40102443 G A A G 4.2e-09 0.0545
-5 40280076 G A A G 9.3e-25 0.1013
-5 134467220 C T C T 4.8e-15 0.0693
-6 31449620 C T C T 1.8e-10 0.1118
-6 32593080 A G G A 4.9e-14 0.0889
-6 35569562 A G A G 3.6e-08 0.0778
-6 36623379 G A A G 8.6e-08 0.054
-6 55712124 C T C T 1.1e-11 0.0724
-7 45136423 T C T C 4.7e-08 0.065
-8 117630683 A C C A 7.3e-28 0.2099
-8 128413305 G T G T 1.1e-15 0.1052
-8 128571855 G T G T 1.8e-09 0.0608
-9 22103183 G T G T 1.4e-08 0.0504
-9 101679752 T G T G 3.1e-08 0.0818
-9 113671403 T C C T 2.8e-09 0.0637
-10 8739580 T A T A 1.3e-25 0.1064
-10 52648454 C T C T 5e-10 0.073
-10 80819132 A G G A 1.8e-17 0.0765
-10 101351704 A G G A 1e-17 0.0889
-10 114288619 T C C T 1.3e-11 0.0975
-10 114722621 G A A G 7e-07 0.0527
-11 61549025 G A G A 1.2e-11 0.0636
-11 74280012 T G G T 8.9e-19 0.078
-11 74427921 C T C T 3.7e-16 0.1934
-11 101656397 T A T A 1.1e-09 0.0537
-11 111156836 T C T C 1.9e-31 0.1122
-12 4368607 T C C T 3.6e-14 0.089
-12 4388271 C T T C 1.6e-15 0.1181
-12 4400808 C T T C 2.4e-09 0.055
-12 6421174 A T T A 4.1e-09 0.0597
-12 43134191 A G G A 1.3e-09 0.053
-12 51171090 A G G A 1.9e-23 0.0896
-12 57533690 C A A C 9.4e-09 0.053
-12 111973358 A G G A 2.6e-16 0.0737
-12 115890922 T C C T 8.1e-14 0.066
-13 34092164 C T C T 3.4e-07 0.0468
-13 37462010 A G G A 6.3e-13 0.0758
-13 73791554 T C C T 2.6e-08 0.0982
-13 111075881 C T T C 1.8e-09 0.0549
-14 54419106 A C C A 2.1e-23 0.0912
-14 54445157 G A G A 3.1e-07 0.0465
-14 59189361 G A G A 9.9e-07 0.0691
-15 32992836 G A G A 1.1e-06 0.0464
-15 33010736 G A A G 2.3e-29 0.1248
-15 33156386 G A A G 1.5e-10 0.0705
-15 67402824 T C C T 2.4e-13 0.0689
-16 68743939 A C A C 3.1e-08 0.055
-16 80043258 C A C A 2.1e-08 0.0498
-16 86339315 T C T C 2.8e-08 0.0487
-16 86703949 C T T C 6.6e-06 0.0481
-17 809643 G A G A 6.8e-08 0.0514
-17 10707241 G A A G 6.6e-12 0.0748
-17 70413253 G A A G 5.6e-09 0.0595
-18 46453156 A T A T 3.8e-74 0.1606
-19 16417198 C T T C 4.2e-10 0.0868
-19 33519927 T G T G 3.7e-23 0.1939
-19 41871573 G A A G 9.5e-07 0.0441
-19 59079096 C T T C 4.2e-08 0.0632
-20 6376457 G C G C 1.1e-16 0.0795
-20 6603622 C T C T 6.9e-12 0.0627
-20 6699595 T G G T 2.3e-18 0.0819
-20 6762221 C T T C 3.3e-14 0.0714
-20 7740976 A G G A 3.4e-13 0.0874
-20 33213196 A C C A 3e-07 0.045
-20 42666475 C T T C 6.8e-09 0.0597
-20 47340117 A G A G 5.9e-15 0.0719
-20 49055318 C T C T 3.3e-09 0.0547
-20 60932414 T C C T 1.1e-26 0.1146
-20 62308612 T G T G 5.3e-08 0.0593
diff --git a/test-data/data/prsweb/PRSWEB_PHECODE153_CRC-Huyghe_PT_UKB_20200608_WEIGHTS.txt.format b/test-data/data/prsweb/PRSWEB_PHECODE153_CRC-Huyghe_PT_UKB_20200608_WEIGHTS.txt.format
deleted file mode 100644
index bef65f66..00000000
--- a/test-data/data/prsweb/PRSWEB_PHECODE153_CRC-Huyghe_PT_UKB_20200608_WEIGHTS.txt.format
+++ /dev/null
@@ -1,7 +0,0 @@
-{
-  "chromosome": "CHROM",
-  "position": "POS",
-  "effect_weight": "WEIGHT",
-  "otherAllele": "OA",
-  "effectAllele": "EA"
-}
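The deleted `.format` file illustrates the column-mapping convention used for custom score files: standard fields (`chromosome`, `position`, `effect_weight`, `effectAllele`, `otherAllele`) are mapped to the PRSweb column names (`CHROM`, `POS`, `WEIGHT`, `EA`, `OA`). Below is a minimal sketch of applying such a mapping to a whitespace-delimited weights table; the reader is illustrative and is not pgs-calc's own parser.

```python
# Minimal sketch: apply a ".format"-style column mapping (as in the deleted PRSweb files
# above) to a space-delimited weights table, yielding normalized variant records.
import csv
import json


def read_weights(weights_path, format_path):
    with open(format_path) as f:
        mapping = json.load(f)  # e.g. {"chromosome": "CHROM", "effectAllele": "EA", ...}
    with open(weights_path) as f:
        # Skip the "##" metadata header lines; the first remaining line holds column names.
        lines = (line.rstrip("\n") for line in f if not line.startswith("##"))
        for row in csv.DictReader(lines, delimiter=" "):
            yield {field: row[column] for field, column in mapping.items()}


if __name__ == "__main__":
    # Hypothetical file names for illustration.
    for record in read_weights("WEIGHTS.txt", "WEIGHTS.txt.format"):
        print(record["chromosome"], record["position"],
              record["effectAllele"], record["effect_weight"])
```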
diff --git a/test-data/data/simulated-chip-3chr-imputation-switches/1000genomes.chr1.HumanHap550.small.recode.switches.vcf.gz b/test-data/data/simulated-chip-3chr-imputation-switches/1000genomes.chr1.HumanHap550.small.recode.switches.vcf.gz
new file mode 100644
index 00000000..17370ef6
Binary files /dev/null and b/test-data/data/simulated-chip-3chr-imputation-switches/1000genomes.chr1.HumanHap550.small.recode.switches.vcf.gz differ
diff --git a/test-data/data/simulated-chip-3chr-imputation-switches/1000genomes.chr1.HumanHap550.small.recode.switches.vcf.gz.tbi b/test-data/data/simulated-chip-3chr-imputation-switches/1000genomes.chr1.HumanHap550.small.recode.switches.vcf.gz.tbi
new file mode 100644
index 00000000..7c5d9220
Binary files /dev/null and b/test-data/data/simulated-chip-3chr-imputation-switches/1000genomes.chr1.HumanHap550.small.recode.switches.vcf.gz.tbi differ
diff --git a/test-data/data/simulated-chip-3chr-imputation-switches/1000genomes.chr2.HumanHap550.small.recode.switches.vcf.gz b/test-data/data/simulated-chip-3chr-imputation-switches/1000genomes.chr2.HumanHap550.small.recode.switches.vcf.gz
new file mode 100644
index 00000000..17370ef6
Binary files /dev/null and b/test-data/data/simulated-chip-3chr-imputation-switches/1000genomes.chr2.HumanHap550.small.recode.switches.vcf.gz differ
diff --git a/test-data/data/simulated-chip-3chr-imputation-switches/1000genomes.chr2.HumanHap550.small.recode.switches.vcf.gz.tbi b/test-data/data/simulated-chip-3chr-imputation-switches/1000genomes.chr2.HumanHap550.small.recode.switches.vcf.gz.tbi
new file mode 100644
index 00000000..7c5d9220
Binary files /dev/null and b/test-data/data/simulated-chip-3chr-imputation-switches/1000genomes.chr2.HumanHap550.small.recode.switches.vcf.gz.tbi differ
diff --git a/test-data/data/simulated-chip-3chr-imputation-switches/1000genomes.chr3.HumanHap550.small.recode.switches.vcf.gz b/test-data/data/simulated-chip-3chr-imputation-switches/1000genomes.chr3.HumanHap550.small.recode.switches.vcf.gz
new file mode 100644
index 00000000..17370ef6
Binary files /dev/null and b/test-data/data/simulated-chip-3chr-imputation-switches/1000genomes.chr3.HumanHap550.small.recode.switches.vcf.gz differ
diff --git a/test-data/data/simulated-chip-3chr-imputation-switches/1000genomes.chr3.HumanHap550.small.recode.switches.vcf.gz.tbi b/test-data/data/simulated-chip-3chr-imputation-switches/1000genomes.chr3.HumanHap550.small.recode.switches.vcf.gz.tbi
new file mode 100644
index 00000000..7c5d9220
Binary files /dev/null and b/test-data/data/simulated-chip-3chr-imputation-switches/1000genomes.chr3.HumanHap550.small.recode.switches.vcf.gz.tbi differ