
Deep Generative Modeling for Cosmological Fields

This repository presents a modular, research-level pipeline for simulating, compressing, and performing rigorous likelihood-based inference on high-dimensional Gaussian Random Fields (GRFs). The approach leverages state-of-the-art deep learning techniques—autoencoders and normalizing flows (RealNVP)—to enable scientific parameter estimation in cosmology and beyond.


Project Overview

Modern cosmological analysis often involves high-dimensional data (such as cosmic fields or sky maps) where direct likelihood evaluation is intractable. This project demonstrates how to:

  • Generate synthetic cosmological fields with controlled parameters,
  • Compress them using deep convolutional autoencoders into informative latent representations,
  • Learn flexible probabilistic models (normalizing flows) for these latent spaces,
  • Perform explicit, likelihood-based inference on the original physical parameters.

The resulting pipeline not only enables parameter recovery from simulated data but also provides a framework for deploying these techniques on real astronomical observations or other scientific fields.
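To make the first step concrete, here is a minimal sketch of drawing a 2D GRF from a power-law power spectrum P(k) = A · k^(−n) by colouring white Gaussian noise in Fourier space. The function name, default sizes, and the specific power-law form are illustrative assumptions, not the exact implementation in `new_autoencoder.ipynb`:

```python
import numpy as np

def generate_grf(n_pix=64, amplitude=1.0, slope=2.0, seed=0):
    """Draw a 2D Gaussian random field with power spectrum P(k) = A * k**(-slope)."""
    rng = np.random.default_rng(seed)
    # Radial wavenumber grid (guard against division by zero at k = 0)
    kx = np.fft.fftfreq(n_pix)
    ky = np.fft.fftfreq(n_pix)
    k = np.sqrt(kx[None, :] ** 2 + ky[:, None] ** 2)
    k[0, 0] = 1.0
    power = amplitude * k ** (-slope)
    power[0, 0] = 0.0  # zero the mean (DC) mode
    # Colour white noise in Fourier space, then transform back to real space
    noise = rng.standard_normal((n_pix, n_pix))
    field_k = np.fft.fft2(noise) * np.sqrt(power)
    return np.fft.ifft2(field_k).real
```

Varying `amplitude` and `slope` plays the role of varying cosmological parameters, giving labelled training data with known ground truth.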


Contents

  • new_autoencoder.ipynb
    • End-to-end notebook for synthetic GRF data generation, autoencoder model training, validation, and visualization of reconstructions.
    • Modular code blocks for adjusting model architecture and dataset parameters.
  • NVP_Flow:_conditional.ipynb
    • Implements conditional RealNVP flows in the autoencoder latent space.
    • Enables likelihood evaluation and parameter inference as a function of cosmological parameters.
    • Includes visualization routines for likelihood surfaces and parameter posteriors.
  • NVP_flow.ipynb
    • Standalone demo of RealNVP on toy problems (e.g., two moons) to build intuition before applying to scientific data.
  • grf_autoencoder.pth
    • Pretrained model weights for the autoencoder, ready for immediate use or fine-tuning.
  • figs/
    • Collection of sample figures: reconstructions, loss curves, flow samples, and likelihood maps.
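For readers new to RealNVP, the core building block used in the flow notebooks is the affine coupling layer: half of the input conditions a scale-and-shift applied to the other half, which keeps both the inverse and the Jacobian log-determinant cheap to compute. The NumPy sketch below uses a single linear map as the conditioner purely for illustration; the notebooks would use small neural networks in its place:

```python
import numpy as np

def coupling_forward(x, w, b):
    """RealNVP affine coupling: x1 conditions a log-scale s and shift t for x2.
    w has shape (d, 2d), b has shape (2d,), where d = dim(x) // 2."""
    d = x.shape[-1] // 2
    x1, x2 = x[..., :d], x[..., d:]
    h = x1 @ w + b                       # conditioner output: [s | t]
    s, t = np.tanh(h[..., :d]), h[..., d:]
    y2 = x2 * np.exp(s) + t
    log_det = s.sum(axis=-1)             # log |det Jacobian| of the transform
    return np.concatenate([x1, y2], axis=-1), log_det

def coupling_inverse(y, w, b):
    """Exact inverse: y1 passed through unchanged reproduces s and t."""
    d = y.shape[-1] // 2
    y1, y2 = y[..., :d], y[..., d:]
    h = y1 @ w + b
    s, t = np.tanh(h[..., :d]), h[..., d:]
    x2 = (y2 - t) * np.exp(-s)
    return np.concatenate([y1, x2], axis=-1)
```

Stacking several such layers with the halves swapped between layers yields an expressive, exactly invertible density model.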

Key Strengths

  • Synthetic Data with Physical Motivation:
    The code generates GRFs from parametric power spectra, mimicking real cosmological field statistics. This allows controlled benchmarking of inference pipelines.

  • Modular Deep Learning Components:
    Easily swap or extend model architectures. Autoencoders are built for flexibility (depth, bottleneck size, activation), supporting rapid experimentation.

  • Explicit Likelihood Evaluation:
    Conditional normalizing flows provide tractable likelihoods in compressed spaces—enabling rigorous, Bayesian-like parameter estimation.

  • Visualization & Diagnostics:
    Notebooks include clear visual outputs: sample fields, reconstructions, latent space structure, flow-generated samples, and likelihood surfaces over parameter space.

  • Fully Reproducible & Extensible:
    Synthetic data generation is integrated—no external datasets required. All hyperparameters and random seeds can be controlled for reproducibility.
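The "explicit likelihood" claim above rests on the change-of-variables formula: if an invertible flow f maps a latent vector z to a standard-normal base variable u, then log p(z) = log N(f(z); 0, I) + log |det ∂f/∂z|. As a minimal, self-contained check of that formula (assuming a simple diagonal affine flow rather than the notebooks' learned RealVP layers):

```python
import numpy as np

def flow_log_likelihood(z, mu, sigma):
    """Change-of-variables log-likelihood for the affine flow u = (z - mu) / sigma
    with a standard-normal base: log p(z) = log N(u; 0, I) + log |det du/dz|."""
    u = (z - mu) / sigma
    log_base = -0.5 * (u ** 2 + np.log(2 * np.pi)).sum(axis=-1)
    log_det = -np.log(sigma).sum()       # du/dz is diagonal with entries 1/sigma
    return log_base + log_det
```

With mu = 0 and sigma = 1 this reduces to the standard-normal log-density, as expected; conditioning the flow's parameters on cosmological parameters is what turns the same formula into a likelihood surface.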


How to Use

  1. Clone the repository and install dependencies:
    pip install -r requirements.txt
    (or install them directly: pip install numpy scipy matplotlib torch scikit-learn tqdm)
    
  2. Run new_autoencoder.ipynb to:
    • Generate synthetic GRF data,
    • Train the convolutional autoencoder,
    • Visualize reconstructions and training curves.
  3. Run NVP_Flow:_conditional.ipynb to:
    • Train a conditional normalizing flow in the latent space,
    • Evaluate and visualize likelihood surfaces over cosmological parameter space.
  4. (Optional) Run NVP_flow.ipynb to experiment with RealNVP on toy datasets for intuition-building.
  5. Browse the figs/ directory to see generated figures and outcomes.

The pipeline is designed to run out of the box; if data files are missing, they are generated automatically.
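Step 3's likelihood surface can be reduced to a one-line recipe: evaluate the conditional log-likelihood of the encoded data on a grid of parameter values and read off the maximum. The toy sketch below uses a hypothetical 1D parameter and a Gaussian stand-in for the conditional flow's log p(z | θ); the names and the Gaussian model are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(42)
theta_true = 0.8                                   # hypothetical ground-truth parameter
latents = rng.normal(theta_true, 1.0, size=500)    # stand-in for encoded fields

def log_likelihood(theta, z):
    """Gaussian stand-in for the conditional flow's log p(z | theta)."""
    return -0.5 * ((z - theta) ** 2).sum()

# Scan a parameter grid and locate the maximum-likelihood estimate
grid = np.linspace(0.0, 2.0, 201)
surface = np.array([log_likelihood(t, latents) for t in grid])
theta_hat = grid[surface.argmax()]
```

Plotting `surface` against `grid` gives exactly the kind of 1D likelihood curve (or, with two parameters, a 2D likelihood map) shown in the figs/ directory.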



Repository Structure

├── figs/
│   └── [Generated figures: reconstruction plots, likelihood maps, etc.]
├── grf_autoencoder.pth
├── new_autoencoder.ipynb
├── NVP_Flow:_conditional.ipynb
├── NVP_flow.ipynb
└── requirements.txt


Future Improvements

  • More expressive generative models:
    Add deeper or alternative architectures (e.g., ResNet-based autoencoders, GLOW/MAF/Neural Spline Flows).

  • Uncertainty quantification:
    Overlay credible contours on likelihood maps; integrate full Bayesian posteriors.

  • Application to real datasets:
    Adapt data pipeline for telescope or simulation outputs; add data augmentation or physical systematics.

  • Experiment tracking & reproducibility:
    Integrate MLflow or Weights & Biases for experiment management and hyperparameter sweeps.

  • Comprehensive testing:
    Add unit tests for all model components and utility functions.


Contact

Mohammad Farhan Hassan
hassan.farhan7777@gmail.com


This project demonstrates the intersection of deep generative modeling and scientific inference—showcasing modern ML techniques applied to synthetic cosmological data, with extensibility to many domains in science and engineering.


Feel free to fork this repository or open issues for questions, suggestions, or collaborations!
