Significance Analysis for Clustering Single-Cell RNA-Sequencing Data

Contact information: igrabski[at]nygenome[dot]org

We introduce a model-based hypothesis testing approach for evaluating single-cell RNA-sequencing (scRNA-seq) clusters. This approach is implemented in two ways: (1) a stand-alone clustering pipeline with built-in hypothesis testing to produce clusters corresponding to distinct cell populations and (2) a post-hoc method that can evaluate the statistical significance of any provided set of clusters.

Our package can be installed as follows:

```r
# Install devtools first if it is not already available:
# install.packages("devtools")
devtools::install_github("igrabski/sc-SHC")
```

Usage

To use the stand-alone clustering pipeline, sc-SHC (single-cell significance of hierarchical clustering), the following command can be used:

```r
library(scSHC)
clusters <- scSHC(data)
```

Here, data should be a (possibly sparse) matrix where the rows are genes and the columns are cells. Optionally, the following parameters can be adjusted:

- batch: a character vector of known batch labels, one per cell (default NULL, i.e., no batch information).
- alpha: the family-wise error rate to control (default 0.05). If the goal is discovery, consider a more lenient value such as 0.25.
- num_features: the number of genes used (default 2500).
- num_PCs: the number of principal components (default 30).
- parallel: whether to parallelize computation (default TRUE); set to FALSE to disable parallelization.
- cores: the number of cores used when parallel = TRUE (default 2).
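As a minimal sketch of passing the optional parameters by name, the R snippet below simulates a toy genes-by-cells count matrix and a hypothetical two-batch design. The objects counts and batch_labels are illustrative stand-ins, not part of the package, and the scSHC() call is guarded so the snippet still runs when scSHC is not installed.

```r
library(Matrix)

set.seed(1)
# Toy data for illustration only: 3000 genes x 100 cells of Poisson counts,
# stored as a sparse matrix with genes as rows and cells as columns.
n_genes <- 3000
n_cells <- 100
counts <- Matrix(rpois(n_genes * n_cells, lambda = 1),
                 nrow = n_genes, ncol = n_cells, sparse = TRUE)
rownames(counts) <- paste0("gene", seq_len(n_genes))
colnames(counts) <- paste0("cell", seq_len(n_cells))

# Hypothetical batch labels: one character label per cell (column).
batch_labels <- rep(c("batchA", "batchB"), each = n_cells / 2)
stopifnot(length(batch_labels) == ncol(counts))

# Run sc-SHC with the documented optional parameters, only if installed.
if (requireNamespace("scSHC", quietly = TRUE)) {
  clusters <- scSHC::scSHC(counts,
                           batch = batch_labels,
                           alpha = 0.25,         # lenient, for discovery
                           num_features = 2500,  # default, shown explicitly
                           num_PCs = 30,         # default, shown explicitly
                           parallel = TRUE,
                           cores = 2)
}
```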

To evaluate the significance of any provided set of clusters, the following command can be used:

```r
library(scSHC)
new_clusters <- testClusters(data, as.character(clusters))
```

Here, data is as described above, and clusters should be a character vector of cluster labels, one per cell, in the same order as the columns of the data matrix. The same optional parameters listed above can be adjusted. Additionally, a specific set of genes can be supplied through the var.genes parameter instead of letting the method select informative genes on its own.
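The sketch below evaluates labels produced by an external clustering, with k-means from base R standing in for any method. It assumes that var.genes accepts gene names matching the row names of the data matrix; the gene set my_genes is purely hypothetical. The testClusters() call is guarded so the snippet runs even without scSHC installed.

```r
library(Matrix)

set.seed(2)
# Toy genes-by-cells count matrix (illustration only).
counts <- Matrix(rpois(300 * 120, lambda = 1), nrow = 300, ncol = 120,
                 sparse = TRUE)
rownames(counts) <- paste0("gene", seq_len(300))
colnames(counts) <- paste0("cell", seq_len(120))

# Stand-in for any external clustering: k-means on log-transformed counts.
# kmeans() expects observations (cells) as rows, so transpose first.
log_expr <- log1p(as.matrix(t(counts)))
km <- kmeans(log_expr, centers = 3, nstart = 5)

# One character label per cell, in the same order as the columns of counts.
cluster_labels <- as.character(km$cluster)
stopifnot(length(cluster_labels) == ncol(counts))

# Hypothetical user-chosen gene set (assumed to be given as gene names).
my_genes <- rownames(counts)[1:100]

if (requireNamespace("scSHC", quietly = TRUE)) {
  new_clusters <- scSHC::testClusters(counts, cluster_labels,
                                      var.genes = my_genes)
}
```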
