ScaDS · haesleinhuepf · Aug 12, 2024
diff --git a/.gitignore b/.gitignore
@@ -21,4 +21,5 @@ docs/29_algorithm_validation/ideas.ipynb
 docs/29_algorithm_validation/solution for exercise - metrics to investigate segmentation results.ipynb
 docs/22_feature_extraction/blobs_analysis.csv
 data/S-BIAD634
-docs/71_fine_tuning_hf/haesleinhuepf
+docs/71_fine_tuning_hf/haesleinhuepf
+docs/90_variational_auto_encoders/data
diff --git a/docs/90_variational_auto_encoders/01_intro_to_vae.ipynb b/docs/90_variational_auto_encoders/01_intro_to_vae.ipynb
diff --git a/docs/90_variational_auto_encoders/02_vae_architecture.ipynb b/docs/90_variational_auto_encoders/02_vae_architecture.ipynb
@@ -0,0 +1,215 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# VAE Architecture\n",
+    "\n",
+    "This notebook walks through the architecture of Variational Auto-Encoders (VAEs): the encoder, the decoder, the sampling step that connects them, and the loss function. Each component is implemented in PyTorch, and the full model is then trained on MNIST."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Encoder\n",
+    "\n",
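+    "The encoder is a neural network that maps each input to the parameters of a probability distribution over the latent space, rather than to a single point. Here that distribution is a diagonal Gaussian, so the encoder outputs a mean and a log-variance for every latent dimension. Predicting the log-variance leaves the network output unconstrained while guaranteeing a positive variance."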
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import torch\n",
+    "import torch.nn as nn\n",
+    "\n",
+    "class Encoder(nn.Module):\n",
+    "    def __init__(self, input_dim, hidden_dim, latent_dim):\n",
+    "        super().__init__()\n",
+    "        self.fc1 = nn.Linear(input_dim, hidden_dim)  # shared hidden layer\n",
+    "        self.fc2_mean = nn.Linear(hidden_dim, latent_dim)  # head for the latent mean\n",
+    "        self.fc2_log_var = nn.Linear(hidden_dim, latent_dim)  # head for the latent log-variance\n",
+    "        self.relu = nn.ReLU()\n",
+    "\n",
+    "    def forward(self, x):\n",
+    "        h = self.relu(self.fc1(x))\n",
+    "        mean = self.fc2_mean(h)\n",
+    "        log_var = self.fc2_log_var(h)\n",
+    "        return mean, log_var"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Decoder\n",
+    "\n",
+    "The decoder is a neural network that maps a sample from the latent distribution back to the original data space and outputs the reconstructed data. The final sigmoid keeps every output value in [0, 1], matching pixel intensities scaled to that range."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "class Decoder(nn.Module):\n",
+    "    def __init__(self, latent_dim, hidden_dim, output_dim):\n",
+    "        super().__init__()\n",
+    "        self.fc1 = nn.Linear(latent_dim, hidden_dim)\n",
+    "        self.fc2 = nn.Linear(hidden_dim, output_dim)\n",
+    "        self.relu = nn.ReLU()\n",
+    "        self.sigmoid = nn.Sigmoid()  # squashes outputs into [0, 1]\n",
+    "\n",
+    "    def forward(self, z):\n",
+    "        h = self.relu(self.fc1(z))\n",
+    "        x_reconstructed = self.sigmoid(self.fc2(h))\n",
+    "        return x_reconstructed"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## VAE Model\n",
+    "\n",
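+    "The VAE model chains the encoder and decoder with a sampling step in between. The sample is drawn with the reparameterization trick, `z = mean + std * epsilon`, where `epsilon` comes from a standard normal distribution. Because the randomness is isolated in `epsilon`, gradients can flow through `mean` and `std` back into the encoder."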
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "class VAE(nn.Module):\n",
+    "    def __init__(self, encoder, decoder):\n",
+    "        super().__init__()\n",
+    "        self.encoder = encoder\n",
+    "        self.decoder = decoder\n",
+    "\n",
+    "    def forward(self, x):\n",
+    "        mean, log_var = self.encoder(x)\n",
+    "        std = torch.exp(0.5 * log_var)  # log-variance -> standard deviation\n",
+    "        epsilon = torch.randn_like(std)  # noise drawn from N(0, I)\n",
+    "        z = mean + std * epsilon  # reparameterization trick\n",
+    "        x_reconstructed = self.decoder(z)\n",
+    "        return x_reconstructed, mean, log_var"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Loss Function\n",
+    "\n",
+    "The loss function for a VAE has two terms: the reconstruction loss and the KL divergence. The reconstruction loss (here, binary cross-entropy summed over all pixels) measures how well the decoder reconstructs the input from a latent sample. The KL divergence measures how far the encoder's distribution for each input is from the prior, a standard normal distribution; for diagonal Gaussians it has a closed form."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def vae_loss(x, x_reconstructed, mean, log_var):\n",
+    "    reconstruction_loss = nn.functional.binary_cross_entropy(x_reconstructed, x, reduction='sum')  # summed over pixels and batch\n",
+    "    kl_divergence = -0.5 * torch.sum(1 + log_var - mean.pow(2) - log_var.exp())  # closed-form KL(N(mean, var) || N(0, I))\n",
+    "    return reconstruction_loss + kl_divergence"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Training the VAE\n",
+    "\n",
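+    "Let's train the VAE on the MNIST dataset, with each 28 x 28 image flattened into a 784-dimensional vector. The loss printed after each epoch is the summed loss divided by the number of training images, i.e. the average loss per image."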
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Epoch 1, Loss: 182.64177789713543\n",
+      "Epoch 2, Loss: 164.48488276367186\n",
+      "Epoch 3, Loss: 161.2271118815104\n",
+      "Epoch 4, Loss: 159.09910853678386\n",
+      "Epoch 5, Loss: 157.53306800944011\n",
+      "Epoch 6, Loss: 156.28730290527344\n",
+      "Epoch 7, Loss: 155.26126284179688\n",
+      "Epoch 8, Loss: 154.44241954752604\n",
+      "Epoch 9, Loss: 153.73177485351562\n",
+      "Epoch 10, Loss: 153.18850033365885\n"
+     ]
+    }
+   ],
+   "source": [
+    "from torchvision import datasets, transforms\n",
+    "import torch.optim as optim\n",
+    "\n",
+    "# Load the MNIST dataset\n",
+    "transform = transforms.Compose([transforms.ToTensor(), transforms.Lambda(lambda x: x.view(-1))])\n",
+    "train_dataset = datasets.MNIST(root='./data', train=True, transform=transform, download=True)\n",
+    "test_dataset = datasets.MNIST(root='./data', train=False, transform=transform, download=True)\n",
+    "train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=64, shuffle=True)\n",
+    "test_loader = torch.utils.data.DataLoader(dataset=test_dataset, batch_size=64, shuffle=False)\n",
+    "\n",
+    "# Define the VAE model\n",
+    "input_dim = 28 * 28\n",
+    "hidden_dim = 256\n",
+    "latent_dim = 2\n",
+    "encoder = Encoder(input_dim, hidden_dim, latent_dim)\n",
+    "decoder = Decoder(latent_dim, hidden_dim, input_dim)\n",
+    "vae = VAE(encoder, decoder)\n",
+    "\n",
+    "# Define the optimizer\n",
+    "optimizer = optim.Adam(vae.parameters(), lr=1e-3)\n",
+    "\n",
+    "# Train the model\n",
+    "num_epochs = 10\n",
+    "for epoch in range(num_epochs):\n",
+    "    vae.train()\n",
+    "    train_loss = 0\n",
+    "    for x, _ in train_loader:\n",
+    "        optimizer.zero_grad()\n",
+    "        x_reconstructed, mean, log_var = vae(x)\n",
+    "        loss = vae_loss(x, x_reconstructed, mean, log_var)\n",
+    "        loss.backward()\n",
+    "        train_loss += loss.item()\n",
+    "        optimizer.step()\n",
+    "    print(f'Epoch {epoch + 1}, Loss: {train_loss / len(train_loader.dataset)}')"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.11.9"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
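
A possible sanity check for the KL term in `vae_loss` above: compare it with the Gaussian KL divergence that ships with `torch.distributions`. This is only a sketch and not part of the notebook. `closed_form_kl` is a hypothetical helper that copies the expression from `vae_loss`, and the random `mean` and `log_var` tensors are made-up stand-ins for encoder outputs. It assumes the prior is a standard normal, which the notebook's formula implies.

```python
import torch
from torch.distributions import Normal, kl_divergence


def closed_form_kl(mean, log_var):
    # Hypothetical helper: same expression as the KL term in vae_loss
    return -0.5 * torch.sum(1 + log_var - mean.pow(2) - log_var.exp())


torch.manual_seed(0)
mean = torch.randn(4, 2)      # stand-in for encoder means (batch of 4, latent_dim 2)
log_var = torch.randn(4, 2)   # stand-in for encoder log-variances

# Reference value: KL between the diagonal Gaussian and a standard normal prior,
# computed per dimension by PyTorch and summed over batch and latent dimensions
posterior = Normal(mean, torch.exp(0.5 * log_var))
prior = Normal(torch.zeros_like(mean), torch.ones_like(mean))
reference_kl = kl_divergence(posterior, prior).sum()

kl = closed_form_kl(mean, log_var)
print(torch.allclose(kl, reference_kl, atol=1e-5))
```

If the two values agree, the hand-written formula matches PyTorch's definition of the divergence. When `mean` and `log_var` are all zeros the encoder distribution equals the prior, so the term should vanish.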