A Julia package for exponential family principal component analysis (E-PCA).

sisl/ExpFamilyPCA.jl

ExpFamilyPCA.jl

ExpFamilyPCA.jl is a Julia package for performing exponential family principal component analysis (EPCA). ExpFamilyPCA.jl supports custom objectives and includes fast implementations for several common distributions.

Documentation

For detailed documentation on each function and additional examples, please refer to the documentation.

Installation

To install the package, use the Julia package manager. In the Julia REPL, type:

```julia
using Pkg; Pkg.add("ExpFamilyPCA")
```

Quickstart

```julia
using ExpFamilyPCA

indim = 5
X = rand(1:100, (10, indim))  # data matrix to compress (10 rows × indim columns)
outdim = 3  # target compression dimension

poisson_epca = PoissonEPCA(indim, outdim)

# Fit the model on X and return its compressed representation
X_compressed = fit!(poisson_epca, X; maxiter=200, verbose=true)

Y = rand(1:100, (3, indim))  # test data
# Compress new data with the already-fitted model
Y_compressed = compress(poisson_epca, Y; maxiter=200, verbose=true)

# Map the compressed representations back to the original space
X_reconstructed = decompress(poisson_epca, X_compressed)
Y_reconstructed = decompress(poisson_epca, Y_compressed)
```

Supported Models

| Distribution | ExpFamilyPCA.jl | Objective | Link Function $g(\theta)$ |
|---|---|---|---|
| Bernoulli | `BernoulliEPCA` | $\log(1 + e^{\theta-2x\theta})$ | $\frac{e^\theta}{1+e^\theta}$ |
| Binomial | `BinomialEPCA` | $n \log(1 + e^\theta) - x\theta$ | $\frac{ne^\theta}{1+e^\theta}$ |
| Continuous Bernoulli | `ContinuousBernoulliEPCA` | $\log\Bigg(\frac{e^\theta -1}{\theta}\Bigg) - x\theta$ | $\frac{\theta - 1}{\theta} + \frac{1}{e^\theta - 1}$ |
| Gamma¹ | `GammaEPCA` or `ItakuraSaitoEPCA` | $-\log(-\theta) - x\theta$ | $-1/\theta$ |
| Gaussian² | `GaussianEPCA` or `NormalEPCA` | $\frac{1}{2}(x - \theta)^2$ | $\theta$ |
| Negative Binomial | `NegativeBinomialEPCA` | $-r \log(1 - e^\theta) - x\theta$ | $\frac{-re^\theta}{e^\theta - 1}$ |
| Pareto | `ParetoEPCA` | $-\log(-1-\theta) + \theta \log m - x \theta$ | $\log m - \frac{1}{\theta+1}$ |
| Poisson³ | `PoissonEPCA` | $e^\theta - x \theta$ | $e^\theta$ |
| Weibull | `WeibullEPCA` | $-\log(-\theta) - x \theta$ | $-1/\theta$ |

1: The gamma EPCA objective is equivalent to minimizing the Itakura-Saito distance.

2: The Gaussian EPCA objective is equivalent to standard PCA.

3: The Poisson EPCA objective is equivalent to minimizing the generalized KL divergence.
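As a quick sanity check on the table, the Base-Julia sketch below compares each link function with a central-difference derivative of the log-partition $G(\theta)$, the $\theta$-dependent part of the objective (the $-x\theta$ term only contributes the constant $-x$). For Bernoulli it uses $G(\theta) = \log(1 + e^\theta)$, which matches the table's objective for $x \in \{0, 1\}$ once $-x\theta$ is added back; for Gaussian it uses $G(\theta) = \theta^2/2$ from expanding $\frac{1}{2}(x - \theta)^2$. Weibull shares the gamma formulas and is omitted. The values of `n`, `r`, `m`, the step size, and the test points are arbitrary illustrative choices, and the code does not call ExpFamilyPCA.jl.

```julia
# Central-difference approximation to G'(θ); the step h is an arbitrary choice.
central_diff(G, θ; h = 1e-6) = (G(θ + h) - G(θ - h)) / (2h)

# Illustrative parameter values (not package defaults).
n, r, m = 10, 3, 2.0

# (name, log-partition G, link g, test point θ inside the domain of G)
rows = [
    ("Bernoulli",            θ -> log1p(exp(θ)),             θ -> exp(θ) / (1 + exp(θ)),          0.4),
    ("Binomial",             θ -> n * log1p(exp(θ)),         θ -> n * exp(θ) / (1 + exp(θ)),      0.3),
    ("Continuous Bernoulli", θ -> log((exp(θ) - 1) / θ),     θ -> (θ - 1) / θ + 1 / (exp(θ) - 1), 0.7),
    ("Gamma",                θ -> -log(-θ),                  θ -> -1 / θ,                         -0.5),
    ("Gaussian",             θ -> θ^2 / 2,                   θ -> θ,                              1.5),
    ("Negative Binomial",    θ -> -r * log(1 - exp(θ)),      θ -> -r * exp(θ) / (exp(θ) - 1),     -0.4),
    ("Pareto",               θ -> -log(-1 - θ) + θ * log(m), θ -> log(m) - 1 / (θ + 1),           -2.5),
    ("Poisson",              θ -> exp(θ),                    θ -> exp(θ),                         0.2),
]

# Print the numerical derivative of G next to the closed-form link g.
for (name, G, g, θ) in rows
    println(rpad(name, 22), "numeric G'(θ) = ", round(central_diff(G, θ); digits = 6),
            "   g(θ) = ", round(g(θ); digits = 6))
end
```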

Custom Distributions

When working with custom distributions, it is often the case that certain specifications are more convenient than others. For example, writing the log-partition of the gamma distribution $G(\theta) = -\log(-\theta)$ and its derivative $g(\theta) = -1 / \theta$ is much simpler than coding the Itakura-Saito distance

$$ \frac{1}{2\pi} \int_{-\pi}^{\pi} \Bigg[ \frac{P(\omega)}{\hat{P}(\omega)} - \log \frac{P(\omega)}{\hat{P}(\omega)} - 1\Bigg] d\omega $$

efficiently in Julia, even though the two are equivalent.
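The equivalence can be checked numerically. The sketch below assumes the element-wise (discrete) Itakura-Saito divergence $D_{IS}(x, \mu) = x/\mu - \log(x/\mu) - 1$, applied entry by entry to a data matrix, in place of the spectral integral. With $\mu = g(\theta) = -1/\theta$, the gamma objective $-\log(-\theta) - x\theta$ equals $\log \mu + x/\mu$, so it differs from $D_{IS}(x, \mu)$ only by $-\log x - 1$. That gap does not depend on $\theta$, so both losses have the same minimizer.

```julia
# Gamma EPCA objective from the table: G(θ) - xθ with G(θ) = -log(-θ), θ < 0.
gamma_objective(x, θ) = -log(-θ) - x * θ

# Gamma link function g(θ) = -1/θ maps θ < 0 to a positive mean μ.
gamma_link(θ) = -1 / θ

# Element-wise Itakura-Saito divergence for x > 0, μ > 0 (assumed discrete
# analogue of the spectral integral; applied to matrix entries, not spectra).
itakura_saito(x, μ) = x / μ - log(x / μ) - 1

x = 3.0
for θ in (-0.1, -0.5, -2.0)
    gap = itakura_saito(x, gamma_link(θ)) - gamma_objective(x, θ)
    # The gap should equal -log(x) - 1 for every θ, up to rounding.
    println("θ = $θ  gap = $gap  -log(x) - 1 = $(-log(x) - 1)")
end
```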

ExpFamilyPCA.jl includes 10 constructors for custom distributions. All constructors are theoretically equivalent, though some may be faster in practice. To showcase them, we walk through how to construct a Poisson EPCA instance with each one. First, a quick recap on notation.

  1. $G$ is the log-partition function. $G$ is strictly convex and continuously differentiable.
  2. $g$ is the link function. It is the derivative of the log-partition, $\nabla_\theta G(\theta) = g(\theta)$, and the inverse of the derivative of the convex conjugate of the log-partition, $g = f^{-1}$.
  3. $F$ is the convex conjugate (under the Legendre transform) of the log-partition, $F = G^*$.
  4. $f$ is the derivative of the convex conjugate, $\nabla_x F(x) = f(x)$, and the inverse of the link function, $f = g^{-1}$.
  5. $B_F(p | q)$ is the Bregman divergence induced by $F$.
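Item 5 can be made concrete with the standard definition $B_F(p \mid q) = F(p) - F(q) - f(q)(p - q)$. The Base-Julia sketch below builds the divergence from $F$ and $f$ and checks two familiar cases: $F(x) = x^2/2$ gives half the squared error, and the Poisson pair $F(x) = x\log x - x$, $f(x) = \log x$ gives the generalized KL divergence $p\log(p/q) + q - p$. The helper name `bregman` is hypothetical and is not part of ExpFamilyPCA.jl.

```julia
# Bregman divergence induced by a convex F with derivative f:
#   B_F(p | q) = F(p) - F(q) - f(q) * (p - q)
# `bregman` is a hypothetical helper for illustration, not an ExpFamilyPCA.jl function.
bregman(F, f) = (p, q) -> F(p) - F(q) - f(q) * (p - q)

# F(x) = x^2 / 2 with f(x) = x recovers half the squared error (p - q)^2 / 2.
B_gauss = bregman(x -> x^2 / 2, x -> x)

# Poisson: F(x) = x log x - x with f(x) = log x gives p log(p/q) + q - p.
B_pois = bregman(x -> x * log(x) - x, log)

println(B_gauss(3.0, 1.0))   # 2.0
println(B_pois(2.0, 5.0), "  ", 2.0 * log(2.0 / 5.0) + 5.0 - 2.0)
```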

For the Poisson distribution, these terms take the following values.

| Term | Math | Julia |
|---|---|---|
| $G(\theta)$ | $e^\theta$ | `G = exp` |
| $g(\theta)$ | $e^\theta$ | `g = exp` |
| $F(x)$ | $x \log x - x$ | `F(x) = x * log(x) - x` |
| $f(x)$ | $\log x$ | `f(x) = log(x)` |
| $B_F(p \mid q)$ | $p \log(p/q) + q - p$ | `B(p, q) = p * log(p / q) + q - p` |
| $B_F(x \mid g(\theta))$ | $e^\theta - x\theta + x \log x - x$ | `Bg(x, θ) = exp(θ) - x * θ + x * log(x) - x` |
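A minimal check of these Poisson terms in plain Julia (writing $e^\theta$ as `exp(θ)`) is sketched below. It verifies three relationships from the notation list: $f(g(\theta)) = \theta$, $B_F(x \mid g(\theta))$ agrees with `Bg`, and `Bg` exceeds the Poisson objective $e^\theta - x\theta$ from the Supported Models table only by $x\log x - x$, which does not depend on $\theta$. The test values of `x` and `θ` are arbitrary.

```julia
# Poisson terms from the table, in plain Julia (e^θ written as exp(θ)).
G = exp                                      # log-partition G(θ)
g = exp                                      # link function g(θ) = G'(θ)
F(x) = x * log(x) - x                        # convex conjugate F = G*
f(x) = log(x)                                # f = F' = g⁻¹
B(p, q) = p * log(p / q) + q - p             # Bregman divergence B_F(p | q)
Bg(x, θ) = exp(θ) - x * θ + x * log(x) - x   # B_F(x | g(θ))

x, θ = 7.0, 1.3   # arbitrary test point (x > 0)

println(f(g(θ)))                      # should recover θ, since f = g⁻¹
println(B(x, g(θ)), "  ", Bg(x, θ))   # the two should agree up to rounding
# Bg minus the EPCA objective e^θ - xθ leaves x log x - x, independent of θ.
println(Bg(x, θ) - (exp(θ) - x * θ), "  ", x * log(x) - x)
```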

The Bregman divergence can also be specified using Distances.jl; for the Poisson case it is the generalized KL divergence:

```julia
using Distances

B = Distances.gkl_divergence
```

Constructors

```julia
EPCA(indim, outdim, F, g, Val((:F, :g)))
EPCA(indim, outdim, F, f, Val((:F, :f)))
EPCA(indim, outdim, F, Val((:F)))
EPCA(indim, outdim, F, G, Val((:F, :G)))
EPCA(indim, outdim, G, g, Val((:G, :g)))
EPCA(indim, outdim, G, Val((:G)))
EPCA(indim, outdim, B, g, Val((:B, :g)))
EPCA(indim, outdim, B, G, Val((:B, :G)))
EPCA(indim, outdim, Bg, g, Val((:Bg, :g)))
EPCA(indim, outdim, Bg, G, Val((:Bg, :G)))
```
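One way to see why these specifications are interchangeable: the Legendre identity $F(g(\theta)) = \theta g(\theta) - G(\theta)$ turns $B_F(x \mid g(\theta)) = F(x) - F(g(\theta)) - \theta(x - g(\theta))$ into $F(x) + G(\theta) - x\theta$. A per-entry loss built from $(B, g)$, from $(F, g)$, or from $G$ alone therefore has the same $\theta$-dependence. The helpers `loss_from_B`, `loss_from_F`, and `loss_from_G` below are hypothetical and only illustrate this algebra for the Poisson case; they are not how ExpFamilyPCA.jl implements its constructors.

```julia
# Hypothetical helpers (not ExpFamilyPCA.jl internals): each builds a per-entry
# loss ℓ(x, θ) from a different set of ingredients.

# From (B, g): evaluate the Bregman divergence at the mean g(θ).
loss_from_B(B, g) = (x, θ) -> B(x, g(θ))

# From (F, g): B_F(x | g(θ)) = F(x) - F(g(θ)) - θ (x - g(θ)), using f(g(θ)) = θ.
loss_from_F(F, g) = (x, θ) -> F(x) - F(g(θ)) - θ * (x - g(θ))

# From G alone: F(g(θ)) = θ g(θ) - G(θ) reduces the divergence to
# F(x) + G(θ) - xθ; the F(x) term depends only on x and is dropped.
loss_from_G(G) = (x, θ) -> G(θ) - x * θ

# Poisson ingredients.
poisson_G(θ) = exp(θ)
poisson_g(θ) = exp(θ)
poisson_F(x) = x * log(x) - x
poisson_B(p, q) = p * log(p / q) + q - p

ℓB = loss_from_B(poisson_B, poisson_g)
ℓF = loss_from_F(poisson_F, poisson_g)
ℓG = loss_from_G(poisson_G)

x = 4.0
for θ in (-1.0, 0.0, 0.5, 2.0)
    # ℓB and ℓF should match; ℓG differs from them by the θ-independent F(x).
    println((θ, ℓB(x, θ), ℓF(x, θ), ℓG(x, θ) + poisson_F(x)))
end
```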

Tips and Tricks

Metaprogramming

Dropping Constants

Selecting Constructors

Sobol Initialization

Contributing

Contributions are welcome! If you want to contribute, please fork the repository, create a new branch, and submit a pull request. Before contributing, please make sure to update tests as appropriate.