lindgrengroup/p50_Infertility

p50 Infertility Project

This repository contains code used for the p50 infertility project. There are two separate tasks:

  • Running CELLECT on ovary datasets with infertility and hormone GWAS sumstats to prioritise etiologic cell types.
  • Finding marker genes for clusters from the ovary datasets.

Using CELLEX and CELLECT on single-cell RNA-seq ovary datasets with infertility GWAS summary statistics

To use CELLEX and CELLECT, follow the instructions in their GitHub repositories. After cloning the CELLECT repository, create a subdirectory named p50 inside it for this project.

Directory structure

The basic directory structure for the p50 infertility project work:

p50
|-- CELLECT_OUT_p50
|   |-- CELLECT-GENES
|   |-- CELLECT-LDSC
|   `-- CELLECT-MAGMA
|-- cluster_markers
|   |-- GSE118127
|   |-- GSE202601
|   `-- GSE213216
|-- data
|   |-- counts
|   |-- esmu
|   `-- sumstats
|-- dbSNP
|-- logs
`-- plots

  • CELLECT_OUT_p50 - Created when CELLECT is run. Contains CELLECT output files.
  • cluster_markers - Store cluster_marker_genes output files here.
  • data - Store input data for CELLEX and CELLECT here.
  • dbSNP - Store the MarkerName to RSID map file here.
  • logs - For logs.
  • plots - Store plots generated from CELLECT results here.
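
The layout above can be created in one step. The following is a minimal sketch, not part of the repository: the function name `create_p50_tree` and the constant `P50_SUBDIRS` are hypothetical, and the path to the cloned CELLECT directory is whatever you used when cloning. `CELLECT_OUT_p50` is left out on purpose because CELLECT creates it when it runs.

```python
from pathlib import Path

# Sub-directories copied from the tree above. CELLECT_OUT_p50 is omitted
# because CELLECT creates it when it is run.
P50_SUBDIRS = [
    "cluster_markers/GSE118127",
    "cluster_markers/GSE202601",
    "cluster_markers/GSE213216",
    "data/counts",
    "data/esmu",
    "data/sumstats",
    "dbSNP",
    "logs",
    "plots",
]


def create_p50_tree(cellect_dir):
    """Create the p50 layout inside a cloned CELLECT directory.

    `cellect_dir` is an assumed local path; existing folders are left untouched.
    """
    root = Path(cellect_dir) / "p50"
    for sub in P50_SUBDIRS:
        (root / sub).mkdir(parents=True, exist_ok=True)
    return root
```

For example, `create_p50_tree("CELLECT")` would create `CELLECT/p50` and its sub-directories, assuming the clone lives in `./CELLECT`.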

Pipeline

Once the directory structure is set up, follow the pipeline below.

  1. Download data
    The pipeline needs scRNA-seq count data with the corresponding cell type annotation metadata, plus GWAS summary statistics (in-house). See datasets for more information.
  2. Set up environments
    Install the required packages, or create the recommended conda environments. More information is given in set_up.
  3. Prepare ESMU files (run CELLEX)
    Using the counts and the cell type annotation metadata as input, we use CELLEX to produce expression specificity (ESMU) files. See prepare_esmu for the R and Python code used to prepare the data and run CELLEX.
  4. Prepare sumstats file
    Use the pipeline provided in prepare_sumstats to prepare the GWAS summary statistics for input to CELLECT.
  5. Run CELLECT
    Using the munged summary statistics and the ESMU files as input, we use CELLECT to prioritise etiological cell types. Use the config_p50.yml file provided. See run_cellect.
  6. Visualisation
    Use R to visualise the results. See visualisation.
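
Because each step reads files produced by an earlier one, it can help to confirm that the input folders are populated before starting a long CELLECT run. The sketch below is an illustration only and is not part of the repository: `missing_inputs` and `INPUT_DIRS` are hypothetical names, and the check only looks for non-empty folders under `data`; it does not validate file formats.

```python
from pathlib import Path

# Input folders from the directory structure described above.
INPUT_DIRS = ["data/counts", "data/esmu", "data/sumstats"]


def missing_inputs(p50_dir, subdirs=INPUT_DIRS):
    """Return the sub-directories of `p50_dir` that are absent or empty."""
    root = Path(p50_dir)
    missing = []
    for sub in subdirs:
        folder = root / sub
        # A folder counts as missing if it does not exist or holds no entries.
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(sub)
    return missing
```

An empty list from `missing_inputs("p50")` means every input folder holds at least one file; any names returned point to the pipeline step that still needs to be completed.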

Finding marker genes for clusters

Find marker genes for the clusters in three single-cell RNA-seq ovary datasets (GSE118127, GSE202601 and GSE213216). See cluster_marker_genes.
