Commit

added doc on DNAme and smoothing
plger committed May 19, 2024
1 parent 7eee155 commit b0a852b
Showing 6 changed files with 88 additions and 20 deletions.
4 changes: 2 additions & 2 deletions DESCRIPTION
@@ -1,8 +1,8 @@
Package: epiwraps
Type: Package
Title: epiwraps: Wrappers for plotting and dealing with epigenomics data
-Version: 0.99.92
-Date: 2024-05-16
+Version: 0.99.93
+Date: 2024-05-20
Authors@R: c(
person("Pierre-Luc", "Germain", email="pierre-luc.germain@hest.ethz.ch",
role=c("cre","aut"), comment=c(ORCID="0000-0003-3418-4218")),
12 changes: 0 additions & 12 deletions R/ESE.R
@@ -1,15 +1,3 @@
#' @rdname exampleESE
#' @name exampleESE
#' @aliases exampleESE
#'
#' @title Example EnrichmentSE object
#'
#' @description
#' Small sample signal from ENCODE ChIP-seq for H3K27ac, H3K4me3 and p300,
#' around some p300 binding sites and TSS in mESC.
#'
#' @return a named character vector of length 1.
NULL

#' @import methods
#' @importClassesFrom SummarizedExperiment SummarizedExperiment RangedSummarizedExperiment
28 changes: 28 additions & 0 deletions R/data.R
@@ -0,0 +1,28 @@
#' @rdname exampleESE
#' @name exampleESE
#' @aliases exampleESE
#'
#' @title Example EnrichmentSE object
#'
#' @description
#' Small sample signal from ENCODE ChIP-seq for H3K27ac, H3K4me3 and p300,
#' around some p300 binding sites and TSS in mESC.
#'
#' @return a named character vector of length 1.
NULL


#' @rdname exampleDNAme
#' @name exampleDNAme
#' @aliases geneBodies
#'
#' @title Example DNAme data on some active gene bodies
#'
#' @description
#' A GRanges object (called `geneBodies`) containing the coordinates of some
#' gene bodies on chr9 of hg38, as well as a GPos object (called `exampleDNAme`)
#' containing methylation percentages at CpGs in those regions. Taken from WG
#' bisulfite sequencing data from A549, from ENCODE accession ID ENCFF948WVD.
#'
#' @return a named character vector of length 2.
NULL
8 changes: 4 additions & 4 deletions R/signal2Matrix.R
@@ -31,10 +31,10 @@
#' @param verbose Logical; whether to print processing information
#' @param ret The type of output to return, either an "EnrichmentSE" object
#' (default), or a simple list of signal matrices ("list").
-#' @param ... Passed to \code{\link[EnrichedHeatmap]{as.normalizedMatrix}} when
-#' reading bigwig files, or to \code{\link{bam2bw}} when reading bam files.
-#' For example, this can be used to pass arguments to
-#' `normalizeToMatrix` such as `smooth=TRUE`.
+#' @param ... Passed to \code{\link[EnrichedHeatmap]{normalizeToMatrix}} or
+#' \code{\link[EnrichedHeatmap]{as.normalizedMatrix}}, or to
+#' \code{\link{bam2bw}} when reading bam files. For example, this can be used
+#' to pass arguments to `normalizeToMatrix` such as `smooth=TRUE`.
#'
#' @return A list of `normalizeToMatrix` objects
#' @export
Binary file added data/exampleDNAme.RData
Binary file not shown.
56 changes: 54 additions & 2 deletions vignettes/multiRegionPlot.Rmd
@@ -283,8 +283,6 @@ the `score` method:
head(score(exampleESE))
```



# Plotting aggregated signals

It is also possible to plot only the average signals across regions. To do this,
@@ -321,6 +319,60 @@ ggplot(d, aes(position, mean, colour=sample)) +
```



# Visualizing DNAme and sparse signals

Nucleotide-resolution DNA methylation signal (as obtained from bisulfite
sequencing) differs from the signals used throughout this vignette in that it is
not continuous across the genome: it is defined only at C or CpG nucleotides,
whose density varies throughout the genome. As a consequence, it is likely
that some of the plotting bins do not contain a CpG, in which case they get
assigned a value of 0, even though they could be in a completely methylated
region. For this reason, it is advisable to smooth DNA methylation signals for
the purpose of visualization.
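
The effect of empty bins can be illustrated with a small toy example. This is only a sketch in base R, using made-up CpG positions and linear interpolation as a stand-in for smoothing; it is not how `epiwraps` or `EnrichedHeatmap` compute or smooth the signal.

```r
# Toy illustration (base R only; positions and scores are made up).
# Five CpGs in a fully methylated 100-bp region:
cpg_pos   <- c(3, 8, 41, 47, 92)
cpg_score <- c(100, 100, 100, 100, 100)

# Split the region into ten 10-bp bins and average the CpG scores per bin;
# bins without any CpG get NA.
bin_id   <- factor(ceiling(cpg_pos / 10), levels = 1:10)
bin_mean <- as.numeric(tapply(cpg_score, bin_id, mean))

# Naive filling: bins without CpGs are set to 0, as if they were unmethylated.
zero_filled <- ifelse(is.na(bin_mean), 0, bin_mean)

# Simple gap filling: interpolate empty bins from the covered neighbouring bins
# (rule = 2 extends the closest covered value beyond the outermost covered bins).
covered      <- which(!is.na(bin_mean))
interpolated <- approx(covered, bin_mean[covered], xout = 1:10, rule = 2)$y

mean(zero_filled)   # average methylation when empty bins count as 0
mean(interpolated)  # average methylation after filling the gaps
```

In this toy region only three of the ten bins contain a CpG, so zero-filling makes a fully methylated region look largely unmethylated, whereas filling the gaps from covered neighbours keeps the signal at the level supported by the data.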

As an example, let's look at the gene bodies of some active genes from chr8 of
the A549 cell line:

```{r}
data("exampleDNAme")
head(exampleDNAme)
```

As is typical of DNAme data, the object is a GRanges object (or more
specifically a GPos object, since all ranges have a width of 1 nucleotide) with,
in the score column, the percentage of DNA methylation. Let's see what happens
if we plot a heatmap of this signal, with and without smoothing (we use
`type="scaled"` to scale the gene bodies to the same size, since these can have
very different sizes):

```{r}
o1 <- signal2Matrix(list(noSmooth=exampleDNAme), geneBodies, type="scaled")
o2 <- signal2Matrix(list(smoothed=exampleDNAme), geneBodies, type="scaled",
smooth=TRUE)
o <- cbind(o1,o2)
plotEnrichedHeatmaps(o, scale_title="%\nmethylation", axis_name=c("TSS","TES"))
```

Both heatmaps show a very clear absence of DNA methylation at the promoter of
these genes (upstream of the TSS) and predominantly methylated gene bodies.
However, they disagree substantially on the methylation levels upstream of the
promoter and downstream of the transcription end sites (TES). This is due to
the density of (covered) CpG nucleotides in these regions. Since most genes are
rather long, most of the bins in the heatmap contain a CpG, leading to an actual
methylation signal. In the flanking regions, however, this is not necessarily
the case, and the non-smoothed heatmap does not distinguish bins that are
unmethylated from bins for which there is no information. In contrast, the
smoothed heatmap on the right uses neighboring bins to estimate the methylation
status of each bin, effectively filling in the gaps. In doing so it provides a
more faithful representation, i.e. that the regions downstream of the genes and
upstream of the promoters are, most of the time, as methylated as the gene
bodies.

Smoothing is performed by `r BiocStyle::Biocpkg("EnrichedHeatmap")`; see
`?EnrichedHeatmap::normalizeToMatrix` for more information/customization.


<br/><br/>

