Merge tag 'v2.0.0' of https://github.com/genepi/imputationserver into…
… release
abought committed Mar 21, 2024
2 parents 9b1d9b5 + 3b54ddd commit d24f2e5
Showing 92 changed files with 1,346 additions and 590 deletions.
14 changes: 14 additions & 0 deletions .readthedocs.yml
@@ -0,0 +1,14 @@
# .readthedocs.yaml
# Read the Docs configuration file
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details

# Required
version: 2

build:
  os: ubuntu-22.04
  tools:
    python: "3.11"

mkdocs:
  configuration: mkdocs.yml
16 changes: 16 additions & 0 deletions docs/pgs/faq.md
@@ -0,0 +1,16 @@
# Frequently Asked Questions

## Can I use the Polygenic Score Calculation extension without an email address?
Yes, the extension can also be used with a username without an email. However, without an email, notifications are not sent, and access to genotyped data may be limited.

## Extending expiration date or reset download counter
Your data is available for 7 days. In case you need an extension, please let [us](/contact) know.

## How can I improve the download speed?
[aria2](https://aria2.github.io/) tries to use your maximum download bandwidth. Remember to raise the `-k` parameter significantly (`-k`, `--min-split-size=SIZE`); otherwise you will hit the Michigan Imputation Server download limit for each file (thanks to Anthony Marcketta for pointing this out).
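
As a rough sketch of such a call, the snippet below assembles an `aria2c` command line in Python. The URL and the `100M` split size are placeholder assumptions, not values recommended by the server; only the `-k` (`--min-split-size`) option comes from the answer above.

```python
import shlex


def build_aria2_command(url, min_split_size="100M"):
    """Return an aria2c argument list for a single result file.

    The default split size is an illustrative assumption.
    """
    return [
        "aria2c",
        "-k", min_split_size,  # --min-split-size, raised as suggested above
        url,
    ]


# Hypothetical private link copied from the results tab.
command = build_aria2_command("https://example.org/results/chr_1.zip")
print(shlex.join(command))
```

The list can be passed to `subprocess.run(command, check=True)` once `aria2c` is installed and the placeholder URL is replaced by a real download link.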

## Can I download all results at once?
We provide a wget command for all results. Please open the results tab. The last column in each row includes direct links to all files.

## Can I perform PGS calculation locally?
Imputationserver uses a standalone tool called pgs-calc. It reads the imputed dosages from VCF files and uses them to calculate scores. Out of the box, it supports imputed genotypes from Michigan Imputation Server or TOPMed Imputation Server and score files from PGS Catalog or PRSWeb instances. In addition, custom score files containing chromosomal positions, both alleles, and the effect size can be used. pgs-calc uses the chromosomal positions and alleles to find the corresponding dosages in genotype files, and it also provides tools to resolve rsIDs in score files using dbSNP. It can therefore be applied to genotype files whose variants were not annotated with rsIDs. Moreover, the standalone version provides options to improve coverage by using the provided proxy mapping file for Europeans or a custom population-specific mapping file. pgs-calc is available at https://github.com/lukfor/pgs-calc.
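
To illustrate the idea of matching by position and alleles, here is a minimal Python sketch of a weighted-sum score. It is not pgs-calc's code: the dictionary layout, the function name `polygenic_score`, and the toy variants are assumptions made for this example, and the rule of using `2 - dosage` when the effect allele is the reference allele is the conventional approach rather than something stated above.

```python
def polygenic_score(score_variants, dosages):
    """Sum weight * effect-allele dosage over variants found in the genotypes.

    score_variants: list of dicts with chrom, pos, effect_allele,
                    other_allele, weight (hypothetical layout).
    dosages: dict mapping (chrom, pos, ref, alt) to the ALT dosage in [0, 2].
    Returns (score, number of matched variants).
    """
    score = 0.0
    matched = 0
    for v in score_variants:
        as_alt = (v["chrom"], v["pos"], v["other_allele"], v["effect_allele"])
        as_ref = (v["chrom"], v["pos"], v["effect_allele"], v["other_allele"])
        if as_alt in dosages:
            # Effect allele is the ALT allele: use the dosage directly.
            score += v["weight"] * dosages[as_alt]
            matched += 1
        elif as_ref in dosages:
            # Effect allele is the REF allele: count REF copies instead.
            score += v["weight"] * (2.0 - dosages[as_ref])
            matched += 1
        # Variants absent from the genotype data are skipped.
    return score, matched


# Toy data, invented for illustration only.
score_variants = [
    {"chrom": "1", "pos": 1000, "effect_allele": "A", "other_allele": "G", "weight": 0.5},
    {"chrom": "1", "pos": 2000, "effect_allele": "C", "other_allele": "T", "weight": -0.25},
    {"chrom": "2", "pos": 3000, "effect_allele": "G", "other_allele": "A", "weight": 1.0},
]
dosages = {
    ("1", 1000, "G", "A"): 1.0,  # effect allele A is ALT
    ("1", 2000, "C", "T"): 1.5,  # effect allele C is REF
}
score, matched = polygenic_score(score_variants, dosages)
print(score, matched)
```

The ratio of `matched` to the number of variants in the score file corresponds to the coverage that the proxy mapping options aim to improve.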
109 changes: 109 additions & 0 deletions docs/pgs/getting-started.md
@@ -0,0 +1,109 @@
# Polygenic Score Calculation

We provide a user-friendly web interface for efficiently applying thousands of published polygenic risk scores to imputed genotypes.
By extending the popular Michigan Imputation Server, the module integrates seamlessly into the existing imputation workflow and enables users without prior knowledge of the field to take advantage of this method.
The graphical report gathers all metadata about the scores in a single place and helps users understand and screen thousands of scores intuitively.

![pipeline.png](images%2Fpipeline.png)

An extensive quality control pipeline is executed automatically to detect and fix possible strand flips and to filter out missing SNPs, preventing systematic errors (e.g. lower scores for individuals with missing or incorrectly aligned genetic data).
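
The strand-flip check can be pictured with a small Python sketch. It is a simplified illustration, not the server's QC code: the function name `classify_alleles` and its category labels are assumptions, and it handles only SNPs with single-base A/C/G/T alleles.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def classify_alleles(sample_ref, sample_alt, panel_ref, panel_alt):
    """Compare a sample SNP's alleles with the reference panel's alleles."""
    sample = (sample_ref, sample_alt)
    panel = (panel_ref, panel_alt)
    if COMPLEMENT[sample_ref] == sample_alt:
        # A/T and C/G SNPs look identical on both strands.
        return "ambiguous"
    if sample == panel:
        return "match"
    if sample == panel[::-1]:
        return "allele switch"
    flipped = (COMPLEMENT[sample_ref], COMPLEMENT[sample_alt])
    if flipped == panel:
        return "strand flip"
    if flipped == panel[::-1]:
        return "strand flip and allele switch"
    return "mismatch"


print(classify_alleles("T", "C", "A", "G"))
```

A variant classified as a strand flip can be corrected by replacing both alleles with their complements, whereas ambiguous or mismatching variants are candidates for filtering.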

## Getting started

To utilize the Polygenic Score Calculation extension on ImputationServer, you must first [register](https://imputationserver.sph.umich.edu/index.html#!pages/register) for an account.
An activation email will be sent to the provided address. Once your email address is verified, you can access the service at no cost.

**Please note that the extension can also be used with a username without an email. However, without an email, notifications are not sent, and access to genotyped data may be limited.**

No dataset at hand? No problem, download our example dataset to test the PGS extension: [50-samples.zip](https://imputationserver.sph.umich.edu/resources/50-samples.zip).


When incorporating the Polygenic Score Calculation extension in your research, please cite the following papers:

> Das S, Forer L, Schönherr S, Sidore C, Locke AE, Kwong A, Vrieze S, Chew EY, Levy S, McGue M, Schlessinger D, Stambolian D, Loh PR, Iacono WG, Swaroop A, Scott LJ, Cucca F, Kronenberg F, Boehnke M, Abecasis GR, Fuchsberger C. [Next-generation genotype imputation service and methods](https://www.ncbi.nlm.nih.gov/pubmed/27571263). Nature Genetics 48, 1284–1287 (2016).
>
> Samuel A. Lambert, Laurent Gil, Simon Jupp, Scott C. Ritchie, Yu Xu, Annalisa Buniello, Aoife McMahon, Gad Abraham, Michael Chapman, Helen Parkinson, John Danesh, Jacqueline A. L. MacArthur and Michael Inouye. The Polygenic Score Catalog as an open database for reproducibility and systematic evaluation. Nature Genetics. doi: 10.1038/s41588-021-00783-5 (2021).

## Setting up your first Polygenic Score Calculation job

1. [Log in](https://imputationserver.sph.umich.edu/index.html#!pages/login) with your credentials and navigate to the **Run** tab to initiate a new Polygenic Score Calculation job.
2. Click on **"Polygenic Score calculation"** to open the submission dialog.
3. The submission dialog allows you to specify job properties.

![](images/submit-job01.png)

The following options are available:


### Reference Panel

Our PGS extension offers genotype imputation from different reference panels. The most accurate and largest panel is **HRC (Version r1.1 2016)**. Please select one that fulfills your needs and supports the population of your input data:

- HRC (Version r1.1 2016)
- 1000 Genomes Phase 3 (Version 5)
- 1000 Genomes Phase 1 (Version 3)
- HapMap 2

More details about all available reference panels can be found [here](/pgs/reference-panels/).

### Upload VCF files from your computer

When using the file upload, data is uploaded from your local file system to Michigan Imputation Server. Clicking **Select Files** opens a file dialog where you can select your VCF files:

![](images/upload-data01.png)

Multiple files can be selected using the `ctrl`, `cmd` or `shift` keys, depending on your operating system.
After you have confirmed your choice, all selected files are listed in the submission dialog:

![](images/upload-data02.png)

Please make sure that all files fulfill the [requirements](/prepare-your-data).


!!! important
    Since version 1.7.2 URL-based uploads (sftp and http) are no longer supported. Please use direct file uploads instead.

### Build
Please select the build of your data. Currently the options **hg19** and **hg38** are supported. Michigan Imputation Server automatically updates the genome positions (liftOver) of your data. All reference panels are based on hg19 coordinates.

### Scores and Trait Category

Choose the precomputed Polygenic Score repository relevant to your study from the available options. Based on the selected repository, different trait categories appear and can be selected (e.g. Cancer scores):

![](images/pgs-repository.png)

More details about all available PGS repositories can be found [here](/pgs/scores/).

### Ancestry Estimation

You can enable ancestry estimation by selecting a reference population used to classify your uploaded samples. Currently, we support a worldwide panel based on HGDP.

## Start Polygenic Score Calculation

After agreeing to the *Terms of Service*, initiate the calculation by clicking on **Submit job**. The system will perform Input Validation and Quality Control immediately. If your data passes these steps, the job is added to the queue for processing.

![](images/queue01.png)

## Monitoring and Retrieving Results

- **Input Validation**: Verify the validity of your uploaded files and review basic statistics.

![](images/input-validation01.png)

- **Quality Control**: Examine the QC report and download statistics after the system filters variants based on various criteria.

![](images/quality-control02.png)

- **Polygenic Score Calculation**: Monitor the progress of the imputation and polygenic scores calculation in real time for each chromosome.

![](images/imputation01.png)

## Downloading Results

Upon completion, you will be notified by email if you provided an address at registration. A zip archive containing the results can be downloaded directly from the server.

![](images/job-results.png)

Click on the filename to download results directly via a web-browser. For command line downloads, use the **share** symbol to obtain private links.

**Important**: All data is automatically deleted after 7 days. Download needed data within this timeframe. A reminder is sent 48 hours before data deletion.
Binary file added docs/pgs/images/imputation01.png
Binary file added docs/pgs/images/input-validation01.png
Binary file added docs/pgs/images/pgs-repository.png
Binary file added docs/pgs/images/pipeline.png
Binary file added docs/pgs/images/quality-control02.png
Binary file added docs/pgs/images/report-01.png
Binary file added docs/pgs/images/report-02.png
Binary file added docs/pgs/images/submit-job01.png
Binary file added docs/pgs/images/upload-data01.png
Binary file added docs/pgs/images/upload-data02.png
38 changes: 38 additions & 0 deletions docs/pgs/output-files.md
@@ -0,0 +1,38 @@
# Output Files

The Polygenic Score Calculation Results CSV file provides Polygenic Score (PGS) values for different samples and associated identifiers.
Users can leverage this CSV file to analyze and compare Polygenic Score values across different samples. The data facilitates the investigation of genetic associations and their impact on specific traits or conditions.

## CSV Format

The CSV file consists of a header row and data rows:

### Header Row

- **sample**: Represents the identifier for each sample.
- **PGS000001, PGS000002, PGS000003, ...**: Columns representing different Polygenic Score values associated with the respective identifiers.

### Data Rows

- Each row corresponds to a sample and provides the following information:
    - **sample**: Identifier for the sample.
    - **PGS000001, PGS000002, PGS000003, ...**: Polygenic Score values associated with the respective identifiers for the given sample.

### Example

Here's an example row:

```csv
sample, PGS000001, PGS000002, PGS000003, ...
sample1, -4.485780284301654, 4.119604924228042, 0.0, -4.485780284301654
```

- **sample1**: Sample identifier.
- **-4.485780284301654**: Polygenic Score value for `PGS000001`.
- **4.119604924228042**: Polygenic Score value for `PGS000002`.
- **0.0**: Polygenic Score value for `PGS000003`.

**Note:**

- Polygenic Score values are provided as floating-point numbers.
- A value of `0.0` indicates that no Polygenic Score information is available for that identifier in the given sample.
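
As a minimal sketch of how such a file might be read, the following Python snippet parses the format described above with the standard `csv` module. It assumes the layout shown in the example, including a space after each comma (handled by `skipinitialspace=True`); the function name `read_scores` and the inline sample text are illustrative only.

```python
import csv
import io

# Inline stand-in for the downloaded scores file (values taken from the example above).
example = """sample, PGS000001, PGS000002, PGS000003
sample1, -4.485780284301654, 4.119604924228042, 0.0
"""


def read_scores(handle):
    """Return {sample: {score_id: value}} from a scores CSV."""
    reader = csv.DictReader(handle, skipinitialspace=True)
    scores = {}
    for row in reader:
        sample = row.pop("sample")
        scores[sample] = {pgs_id: float(value) for pgs_id, value in row.items()}
    return scores


scores = read_scores(io.StringIO(example))
print(scores["sample1"]["PGS000002"])
```

For a real file, pass an open file handle, e.g. `open(path, newline="")`, in place of the `io.StringIO` object.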
11 changes: 11 additions & 0 deletions docs/pgs/pipeline.md
@@ -0,0 +1,11 @@
# Pipeline

![pipeline.png](images%2Fpipeline.png)






## Ancestry estimation
We use LASER to perform principal components analysis (PCA) based on the genotypes of each sample and to place them into a reference PCA space, which was constructed using a set of reference individuals [14]. We built reference coordinates based on 938 samples from the Human Genome Diversity Project (HGDP) [15] and labeled them by the ancestry categories proposed by the GWAS Catalog [16], which are also used in PGS Catalog.
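
Once a sample has coordinates in the reference PCA space, it still needs an ancestry label. The sketch below shows one simple option, a nearest-centroid rule, in plain Python. This rule is an assumption made for illustration and is not necessarily how the server classifies samples; the coordinates, labels, and function names are invented toy values.

```python
import math
from collections import defaultdict


def centroids(reference):
    """Average the PC coordinates of the reference individuals per label."""
    grouped = defaultdict(list)
    for label, coords in reference:
        grouped[label].append(coords)
    return {
        label: tuple(sum(axis) / len(points) for axis in zip(*points))
        for label, points in grouped.items()
    }


def nearest_label(sample_coords, label_centroids):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(
        label_centroids,
        key=lambda label: math.dist(sample_coords, label_centroids[label]),
    )


# Toy reference coordinates on two principal components (not real HGDP data).
reference = [
    ("European", (0.9, 0.1)),
    ("European", (1.1, -0.1)),
    ("East Asian", (-1.0, 1.0)),
    ("East Asian", (-1.2, 0.8)),
]
label = nearest_label((0.8, 0.0), centroids(reference))
print(label)
```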
45 changes: 45 additions & 0 deletions docs/pgs/reference-panels.md
@@ -0,0 +1,45 @@
# Reference Panels for PGS Calculation

Our server offers PGS calculation from the following reference panels:


## HRC (Version r1.1 2016)

The HRC panel consists of 64,940 haplotypes of predominantly European ancestry.

| | |
|---|---|
| Number of Samples | 32,470 |
| Sites (chr1-22) | 39,635,008 |
| Chromosomes | 1-22, X|
| Website | [http://www.haplotype-reference-consortium.org](http://www.haplotype-reference-consortium.org); [HRC r1.1 Release Note](https://imputationserver.sph.umich.edu/start.html#!pages/hrc-r1.1) |

## 1000 Genomes Phase 3 (Version 5)

Phase 3 of the 1000 Genomes Project consists of 5,008 haplotypes from 26 populations across the world.

| | |
|---|---|
| Number of Samples | 2,504 |
| Sites (chr1-22) | 49,143,605 |
| Chromosomes | 1-22, X|
| Website | [http://www.internationalgenome.org](http://www.internationalgenome.org) |


## 1000 Genomes Phase 1 (Version 3)

| | |
|---|---|
| Number of Samples | 1,092 |
| Sites (chr1-22) | 28,975,367 |
| Chromosomes | 1-22, X|
| Website | [http://www.internationalgenome.org](http://www.internationalgenome.org) |

## HapMap 2

| | |
|---|---|
| Number of Samples | 60 |
| Sites (chr1-22) | 2,542,916 |
| Chromosomes | 1-22 |
| Website | [http://www.hapmap.org](http://www.hapmap.org) |
14 changes: 14 additions & 0 deletions docs/pgs/report.md
@@ -0,0 +1,14 @@
# Interactive Report

The created report contains a list of all scores, where each score has a different color based on its coverage. The color green indicates that the coverage is very high and nearly all SNPs from the score were also found in the imputed dataset. The color red indicates that very few SNPs were found and the coverage is therefore low.
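
The relationship between coverage and color can be sketched in a few lines of Python. The cutoffs `high` and `low`, the `"intermediate"` label, and both function names are hypothetical; the report's actual thresholds and palette are not documented here.

```python
def coverage(variants_in_score, variants_found):
    """Fraction of a score's variants that were found in the imputed data."""
    if variants_in_score <= 0:
        raise ValueError("a score must contain at least one variant")
    return variants_found / variants_in_score


def coverage_label(value, high=0.9, low=0.3):
    """Map a coverage fraction to a coarse label (hypothetical cutoffs)."""
    if value >= high:
        return "green"
    if value <= low:
        return "red"
    return "intermediate"


print(coverage_label(coverage(1000, 980)))
print(coverage_label(coverage(1000, 120)))
```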

![report.png](images/report-01.png)

In addition, the report includes detailed metadata for each score such as the number of variants, the number of well-imputed genotypes and the population used to construct the score. A direct link to PGS Catalog, Cancer PRSWeb or ExPRSWeb is also available for further investigation (e.g. for getting information about the method that was used to construct the score). Further, the report displays the distribution of the scores of all uploaded samples and can be interactively explored. This allows users to detect samples with either a high or low risk immediately.

Moreover, the report gives an overview of all estimated ancestries from the uploaded genotypes and compares them with the populations of the GWAS that was used to create the score.

![report.png](images/report-02.png)


If an uploaded sample with an unsupported population is detected, a warning message is provided and the sample is excluded from the summary statistics.
21 changes: 21 additions & 0 deletions docs/pgs/scores.md
@@ -0,0 +1,21 @@
# Scores

We currently support the following PGS repositories out of the box:

## PGS-Catalog

We use PGS Catalog as the source of scores for PGS Server (version 19 Jan 2023). The PGS Catalog is an online database that collects and annotates published scores and currently provides access to over 3,900 scores encompassing more than 580 traits.

> Samuel A. Lambert, Laurent Gil, Simon Jupp, Scott C. Ritchie, Yu Xu, Annalisa Buniello, Aoife McMahon, Gad Abraham, Michael Chapman, Helen Parkinson, John Danesh, Jacqueline A. L. MacArthur and Michael Inouye. The Polygenic Score Catalog as an open database for reproducibility and systematic evaluation. Nature Genetics. doi: 10.1038/s41588-021-00783-5 (2021).

## Cancer-PRSweb

Collection of scores for major cancer traits.

> Fritsche LG, Patil S, Beesley LJ, VandeHaar P, Salvatore M, Ma Y, Peng RB, Taliun D, Zhou X, Mukherjee B: Cancer PRSweb: An Online Repository with Polygenic Risk Scores for Major Cancer Traits and Their Evaluation in Two Independent Biobanks. Am J Hum Genet 2020, 107(5):815-836.

## ExPRSweb

Collection of scores for common health-related exposures like body mass index or alcohol consumption.

> Ma Y, Patil S, Zhou X, Mukherjee B, Fritsche LG: ExPRSweb: An online repository with polygenic risk scores for common health-related exposures. Am J Hum Genet 2022, 109(10):1742-1760.
44 changes: 44 additions & 0 deletions docs/pgs/tutorial.md
@@ -0,0 +1,44 @@
# Testing Imputationserver PGS: Step by Step


To test Imputationserver PGS, please execute the following steps:

**0. Sign up and create a login:**

Imputationserver PGS requires a login to access the Polygenic Risk Score (PGS) calculation service.
This login is crucial to maintain the security and privacy of any uploaded human genotype data.

Users have the flexibility to create this login either with or **without providing an email address**. Please visit the [signup](https://imputationserver.sph.umich.edu/index.html#!pages/register) page and proceed to create a login.

**1. Download the Example Dataset:**
Start by downloading the example dataset provided for testing the PGS extension. You can obtain the dataset by clicking on the following link: [50-samples.zip](https://imputationserver.sph.umich.edu/resources/50-samples.zip).

**2. Unpack the Data:**
After downloading the zip file, unzip or extract its contents to a location of your choice on your computer.

**3. Access Polygenic Risk Score Application:**
Navigate to the "Run" menu and select "Polygenic Risk Score" from the options.

**4. Choose 1000 Genomes Phase 3 Panel:**
In the Polygenic Risk Score application, select the "1000 Genomes Phase 3" panel as the reference dataset for imputation.

**5. Specify PGS Catalog and Trait:**
Identify and specify the Polygenic Score (PGS) Catalog you want to use for scoring.
Choose a relevant trait for the analysis, such as "Cancer".

**6. Optional Ancestry Estimation:**
Optionally, you can include ancestry estimation in your analysis. This step may enhance the precision and interpretation of the results.

**7. Agree to Terms of Service:**
Before proceeding, make sure to read and agree to the Terms of Service provided by Imputationserver. It is essential to comply with the platform's terms and conditions.

**8. Submit the Job:**
After configuring all the necessary parameters, click on the "Submit" button to initiate the PGS calculation.

**9. Monitor Progress:**
Depending on the server load, the calculation may take some time (about 30 minutes). Allow the process to complete.

**10. Download Results:**
Once the calculation is finished, you can view the results provided by Imputationserver PGS and download a report and all calculated scores.

Congratulations! You have successfully tested Imputationserver PGS using the provided example dataset and configuration settings. Now you are ready to use the service with your own dataset!
27 changes: 27 additions & 0 deletions docs/workshops/ASHG2023.md
@@ -0,0 +1,27 @@
**Workshop ASHG2023**

# Welcome to the Michigan Imputation Server Workshop!

## Workshop Title
The Michigan Imputation Server: Data Preparation, Genotype Imputation, and Data Analysis

## Topic
Statistical Genetics and Genetic Epidemiology

## Target Audience
Attendees interested in learning how to perform genotype imputation and use imputed genotypes in their research, especially trainees. There are no prerequisites for this workshop. Attendees are expected to follow materials on their personal laptops.

## Workshop Slides
You can download the slides of all workshop sessions [here](https://github.com/genepi/imputationserver-ashg/raw/main/slides/MIS_Workshop_2023.pdf). Please also have a look at the individual sessions below for additional training material.

## Links
- [Interactive Poll](http://pollev.com/ashg)
- [Workshop Website](https://www.ashg.org/meetings/2023meeting/2023-ashg-invited-workshop-schedule/)

## Workshop Facilitator(s)
- Christian Fuchsberger, christian.fuchsberger@eurac.edu (Eurac Research)
- Sebastian Schönherr, sebastian.schoenherr@i-med.ac.at (Medical University of Innsbruck)
- Lukas Forer, lukas.forer@i-med.ac.at (Medical University of Innsbruck)
- Xueling Sim, ephsx@nus.edu.sg (National University of Singapore)
- Saori Sakaue, ssakaue@broadinstitute.org (Broad Institute)
- Albert Smith, albertvs@umich.edu (University of Michigan)
18 changes: 18 additions & 0 deletions docs/workshops/ASHG2023/Session1.md
@@ -0,0 +1,18 @@
**Workshop ASHG2023 > Session 1: Imputation and the Server**

# Server Links

[Michigan Imputation Server](https://imputationserver.sph.umich.edu)

[TOPMed Imputation Server](https://imputation.biodatacatalyst.nhlbi.nih.gov)


# Selected Literature

[Das S, Forer L, Schönherr S, Sidore C, Locke AE, Kwong A, Vrieze SI, Chew EY, Levy S, McGue M, Schlessinger D, Stambolian D, Loh PR, Iacono WG, Swaroop A, Scott LJ, Cucca F, Kronenberg F, Boehnke M, Abecasis GR, Fuchsberger C. Next-generation genotype imputation service and methods. Nat Genet. 2016 Oct;48(10):1284-1287. doi: 10.1038/ng.3656.](https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/27571263/)

[Das S, Abecasis GR, Browning BL. Genotype Imputation from Large Reference Panels. Annu Rev Genomics Hum Genet. 2018 Aug 31;19:73-96. doi: 10.1146/annurev-genom-083117-021602.](https://arjournals.annualreviews.org/doi/10.1146/annurev-genom-083117-021602?url_ver=Z39.88-2003&rfr_id=ori:rid:crossref.org&rfr_dat=cr_pub%20%200pubmed)

[Fuchsberger C, Abecasis GR, Hinds DA. minimac2: faster genotype imputation. Bioinformatics. 2015 Mar 1;31(5):782-4. doi: 10.1093/bioinformatics/btu704. Epub 2014 Oct 22. PMID: 25338720; PMCID: PMC4341061.](https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/25338720/)

[Howie B, Fuchsberger C, Stephens M, Marchini J, Abecasis GR. Fast and accurate genotype imputation in genome-wide association studies through pre-phasing. Nat Genet. 2012 Jul 22;44(8):955-9. doi: 10.1038/ng.2354. PMID: 22820512; PMCID: PMC3696580.](https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/22820512/)