This repository presents a modular, research-level pipeline for simulating, compressing, and performing rigorous likelihood-based inference on high-dimensional Gaussian Random Fields (GRFs). The approach leverages deep learning techniques, namely autoencoders and normalizing flows (RealNVP), to enable scientific parameter estimation in cosmology and beyond.
Modern cosmological analysis often involves high-dimensional data (such as cosmic fields or sky maps) where direct likelihood evaluation is intractable. This project demonstrates how to:
- Generate synthetic cosmological fields with controlled parameters,
- Compress them using deep convolutional autoencoders into informative latent representations,
- Learn flexible probabilistic models (normalizing flows) for these latent spaces,
- Perform explicit, likelihood-based inference on the original physical parameters.
The resulting pipeline not only enables parameter recovery from simulated data but also provides a framework for deploying these techniques on real astronomical observations or other scientific fields.
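The first step above, generating GRFs with controlled parameters, can be sketched in a few lines of numpy. This is a minimal illustration, not the notebook's exact code: the grid size and the power-law parametrization P(k) = A·k^slope are assumptions chosen for clarity.

```python
import numpy as np

def generate_grf(n=64, amplitude=1.0, slope=-2.0, seed=0):
    """Sample a 2-D Gaussian random field with power spectrum P(k) = A * k**slope.

    White noise is drawn in Fourier space and filtered by sqrt(P(k)), so the
    returned field has the requested second-order statistics by construction.
    """
    rng = np.random.default_rng(seed)
    kx = np.fft.fftfreq(n)
    ky = np.fft.fftfreq(n)
    k = np.sqrt(kx[:, None] ** 2 + ky[None, :] ** 2)
    k[0, 0] = np.inf                      # suppress the zero (mean) mode
    power = amplitude * k ** slope        # parametric power spectrum
    noise = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    field = np.fft.ifft2(noise * np.sqrt(power)).real
    return field

field = generate_grf(n=64, amplitude=1.0, slope=-2.0)
```

Varying `amplitude` and `slope` over a grid yields labeled training data for the autoencoder and, later, the conditioning parameters for the flow.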
`new_autoencoder.ipynb`
- End-to-end notebook for synthetic GRF data generation, autoencoder model training, validation, and visualization of reconstructions.
- Modular code blocks for adjusting the model architecture and dataset parameters.

`NVP_Flow:_conditional.ipynb`
- Implements conditional RealNVP flows in the autoencoder latent space.
- Enables likelihood evaluation and parameter inference as a function of cosmological parameters.
- Includes visualization routines for likelihood surfaces and parameter posteriors.

`NVP_flow.ipynb`
- Standalone demo of RealNVP on toy problems (e.g., two moons) to build intuition before applying the method to scientific data.

`grf_autoencoder.pth`
- Pretrained model weights for the autoencoder, ready for immediate use or fine-tuning.

`figs/`
- Collection of sample figures: reconstructions, loss curves, flow samples, and likelihood maps.
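To give a sense of what the autoencoder in `new_autoencoder.ipynb` might look like, here is a minimal convolutional sketch in PyTorch. The depth, channel counts, and `latent_dim` are hypothetical placeholders, not the notebook's actual architecture:

```python
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    """Minimal convolutional autoencoder for 64x64 single-channel fields.

    Illustrative stand-in only: depth, channel widths, and the latent
    dimension are assumptions, not the repository's exact settings.
    """
    def __init__(self, latent_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, latent_dim),                   # bottleneck
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 32 * 16 * 16), nn.ReLU(),
            nn.Unflatten(1, (32, 16, 16)),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),  # 16 -> 32
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1),              # 32 -> 64
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

model = ConvAutoencoder(latent_dim=16)
x = torch.randn(4, 1, 64, 64)   # a batch of 4 synthetic fields
recon, z = model(x)
```

Training then minimizes a reconstruction loss (e.g. `nn.MSELoss()` between `recon` and `x`), and the latent codes `z` become the inputs to the conditional flow.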
- **Synthetic Data with Physical Motivation:** The code generates GRFs from parametric power spectra, mimicking real cosmological field statistics. This allows controlled benchmarking of inference pipelines.
- **Modular Deep Learning Components:** Easily swap or extend model architectures. Autoencoders are built for flexibility (depth, bottleneck size, activation), supporting rapid experimentation.
- **Explicit Likelihood Evaluation:** Conditional normalizing flows provide tractable likelihoods in compressed spaces, enabling rigorous, Bayesian-like parameter estimation.
- **Visualization & Diagnostics:** Notebooks include clear visual outputs: sample fields, reconstructions, latent-space structure, flow-generated samples, and likelihood surfaces over parameter space.
- **Fully Reproducible & Extensible:** Synthetic data generation is integrated, so no external datasets are required. All hyperparameters and random seeds can be controlled for reproducibility.
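The tractable likelihoods come from RealNVP's invertible affine coupling layers, whose Jacobian determinant is cheap to compute. The sketch below shows one conditional coupling layer in plain numpy, with linear scale/shift networks chosen purely for brevity (the repository's flow uses learned neural networks):

```python
import numpy as np

def coupling_forward(x, cond, Ws, Wt):
    """One conditional affine coupling layer (RealNVP-style).

    x    : (n, 2) latent points; the first coordinate passes through unchanged,
           the second is scaled and shifted by functions of (x1, cond).
    cond : (n, 1) conditioning parameter (e.g. a cosmological parameter).
    Ws, Wt : weights of the (here linear, illustrative) scale/shift maps.
    Returns the transformed points and log|det J| per point.
    """
    x1, x2 = x[:, :1], x[:, 1:]
    h = np.concatenate([x1, cond], axis=1)   # the condition enters the nets
    s = np.tanh(h @ Ws)                      # bounded log-scale for stability
    t = h @ Wt
    y2 = x2 * np.exp(s) + t
    log_det = s.sum(axis=1)                  # triangular Jacobian: sum of log-scales
    return np.concatenate([x1, y2], axis=1), log_det

def coupling_inverse(y, cond, Ws, Wt):
    """Exact inverse of the coupling above (what makes the density tractable)."""
    y1, y2 = y[:, :1], y[:, 1:]
    h = np.concatenate([y1, cond], axis=1)
    s = np.tanh(h @ Ws)
    t = h @ Wt
    x2 = (y2 - t) * np.exp(-s)
    return np.concatenate([y1, x2], axis=1)

rng = np.random.default_rng(0)
Ws, Wt = rng.normal(size=(2, 1)), rng.normal(size=(2, 1))
x = rng.normal(size=(5, 2))
cond = rng.normal(size=(5, 1))
y, log_det = coupling_forward(x, cond, Ws, Wt)
x_back = coupling_inverse(y, cond, Ws, Wt)   # recovers x exactly
```

Stacking such layers (with the roles of the two coordinates alternating) and summing their `log_det` terms gives the change-of-variables log-likelihood used for inference.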
- Clone the repository and install dependencies:
  `pip install numpy scipy matplotlib torch scikit-learn tqdm`
- Run `new_autoencoder.ipynb` to:
  - Generate synthetic GRF data,
  - Train the convolutional autoencoder,
  - Visualize reconstructions and training curves.
- Run `NVP_Flow:_conditional.ipynb` to:
  - Train a conditional normalizing flow in the latent space,
  - Evaluate and visualize likelihood surfaces over cosmological parameter space.
- (Optional) Run `NVP_flow.ipynb` to experiment with RealNVP on toy datasets and build intuition.
- Browse the `figs/` directory to see generated figures and outcomes.

The pipeline is designed to run out of the box; if data files are missing, they are generated automatically.
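The likelihood-surface step in the second notebook amounts to evaluating the trained flow's conditional log-density of the observed latent codes on a parameter grid and locating its maximum. The sketch below substitutes a toy Gaussian for the trained flow (and a 1-D parameter for the full cosmological parameter space); names like `toy_flow_log_prob` are hypothetical:

```python
import numpy as np

def toy_flow_log_prob(z, theta):
    """Stand-in for a trained conditional flow's log-density: a unit-variance
    Gaussian whose mean depends on the parameter theta (purely illustrative)."""
    return -0.5 * np.sum((z - theta) ** 2 + np.log(2 * np.pi), axis=-1)

# Latent codes for one "observation" (in the real pipeline these would come
# from the autoencoder's encoder applied to the data).
rng = np.random.default_rng(1)
true_theta = 0.7
z_obs = rng.normal(loc=true_theta, scale=1.0, size=(32, 4))

# Scan the parameter grid, summing log-likelihoods over the latent samples.
theta_grid = np.linspace(-2.0, 2.0, 201)
log_like = np.array([toy_flow_log_prob(z_obs, th).sum() for th in theta_grid])
theta_hat = theta_grid[np.argmax(log_like)]   # maximum-likelihood estimate
```

Plotting `log_like` against `theta_grid` (or a 2-D grid for two parameters) reproduces the kind of likelihood surface shown in `figs/`.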
├── figs/
│   └── [Generated figures: reconstruction plots, likelihood maps, etc.]
├── grf_autoencoder.pth
├── new_autoencoder.ipynb
├── NVP_Flow:_conditional.ipynb
├── NVP_flow.ipynb
└── requirements.txt
- **More expressive generative models:** Add deeper or alternative architectures (e.g., ResNet-based autoencoders, Glow/MAF/Neural Spline Flows).
- **Uncertainty quantification:** Overlay credible contours on likelihood maps; integrate full Bayesian posteriors.
- **Application to real datasets:** Adapt the data pipeline for telescope or simulation outputs; add data augmentation or physical systematics.
- **Experiment tracking & reproducibility:** Integrate MLflow or Weights & Biases for experiment management and hyperparameter sweeps.
- **Comprehensive testing:** Add unit tests for all model components and utility functions.
Mohammad Farhan Hassan
hassan.farhan7777@gmail.com
This project demonstrates the intersection of deep generative modeling and scientific inference, showcasing modern ML techniques applied to synthetic cosmological data, with extensibility to many domains in science and engineering.
Feel free to fork this repository or open issues for questions, suggestions, or collaborations!