From 02e1b5a85bde9d4da9387ee89b117f9e33babcd7 Mon Sep 17 00:00:00 2001 From: Robert Haase Date: Mon, 12 Aug 2024 10:36:05 +0200 Subject: [PATCH 1/4] VAE notebooks (copilot-generated) --- .../01_intro_to_vae.ipynb | 252 ++++++++++++++++ .../02_vae_architecture.ipynb | 167 +++++++++++ .../03_vae_training.ipynb | 228 +++++++++++++++ .../04_vae_applications.ipynb | 275 ++++++++++++++++++ docs/_toc.yml | 8 + docs/intro.md | 1 + 6 files changed, 931 insertions(+) create mode 100644 docs/90_variational_auto_encoders/01_intro_to_vae.ipynb create mode 100644 docs/90_variational_auto_encoders/02_vae_architecture.ipynb create mode 100644 docs/90_variational_auto_encoders/03_vae_training.ipynb create mode 100644 docs/90_variational_auto_encoders/04_vae_applications.ipynb diff --git a/docs/90_variational_auto_encoders/01_intro_to_vae.ipynb b/docs/90_variational_auto_encoders/01_intro_to_vae.ipynb new file mode 100644 index 00000000..8820a35d --- /dev/null +++ b/docs/90_variational_auto_encoders/01_intro_to_vae.ipynb @@ -0,0 +1,252 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Introduction to Variational Auto-Encoders (VAEs)\n", + "\n", + "Variational Auto-Encoders (VAEs) are a type of generative model that are widely used in machine learning for tasks such as image generation, anomaly detection, and data compression. In this notebook, we will introduce the basic concepts and theory behind VAEs, and provide simple code examples to help you get started with implementing them." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Basic Concepts\n", + "\n", + "VAEs are a type of auto-encoder, which is a neural network used to learn efficient representations of data, typically for the purpose of dimensionality reduction. 
However, unlike traditional auto-encoders, VAEs are probabilistic models that learn to generate new data points similar to the training data.\n", + "\n", + "The key idea behind VAEs is to learn a latent space representation of the data, which is a lower-dimensional space that captures the essential features of the data. This latent space is then used to generate new data points by sampling from a probability distribution." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Theory\n", + "\n", + "The VAE consists of two main components: the encoder and the decoder. The encoder maps the input data to a latent space, while the decoder maps the latent space back to the original data space.\n", + "\n", + "The encoder is typically a neural network that takes the input data and outputs the parameters of a probability distribution in the latent space. The decoder is another neural network that takes samples from this distribution and generates new data points.\n", + "\n", + "Training a VAE maximizes a lower bound on the log-likelihood of the data under the model (the evidence lower bound, ELBO). In practice this is done by minimizing a loss function that consists of two terms: the reconstruction loss and the KL divergence. The reconstruction loss measures how well the decoder can reconstruct the input data from its latent sample, while the KL divergence measures how far the distribution produced by the encoder is from a prior distribution, here a standard normal." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Code Example\n", + "\n", + "Let's start with a simple implementation of a VAE using PyTorch." 
+ ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "import torch\n", + "import torch.nn as nn\n", + "import torch.optim as optim\n", + "from torchvision import datasets, transforms\n", + "import matplotlib.pyplot as plt" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Define the Encoder\n", + "\n", + "The encoder network takes the input data and outputs the parameters of the latent distribution (mean and log variance)." + ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "class Encoder(nn.Module):\n", + " def __init__(self, input_dim, hidden_dim, latent_dim):\n", + " super(Encoder, self).__init__()\n", + " self.fc1 = nn.Linear(input_dim, hidden_dim)\n", + " self.fc2_mean = nn.Linear(hidden_dim, latent_dim)\n", + " self.fc2_log_var = nn.Linear(hidden_dim, latent_dim)\n", + " self.relu = nn.ReLU()\n", + " \n", + " def forward(self, x):\n", + " h = self.relu(self.fc1(x))\n", + " mean = self.fc2_mean(h)\n", + " log_var = self.fc2_log_var(h)\n", + " return mean, log_var" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Define the Decoder\n", + "\n", + "The decoder network takes samples from the latent distribution and generates new data points." 
+ ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "class Decoder(nn.Module):\n", + " def __init__(self, latent_dim, hidden_dim, output_dim):\n", + " super(Decoder, self).__init__()\n", + " self.fc1 = nn.Linear(latent_dim, hidden_dim)\n", + " self.fc2 = nn.Linear(hidden_dim, output_dim)\n", + " self.relu = nn.ReLU()\n", + " self.sigmoid = nn.Sigmoid()\n", + " \n", + " def forward(self, z):\n", + " h = self.relu(self.fc1(z))\n", + " x_reconstructed = self.sigmoid(self.fc2(h))\n", + " return x_reconstructed" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Define the VAE\n", + "\n", + "The VAE model combines the encoder and decoder, and includes a sampling layer to sample from the latent distribution." + ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "class VAE(nn.Module):\n", + " def __init__(self, encoder, decoder):\n", + " super(VAE, self).__init__()\n", + " self.encoder = encoder\n", + " self.decoder = decoder\n", + " \n", + " def forward(self, x):\n", + " mean, log_var = self.encoder(x)\n", + " std = torch.exp(0.5 * log_var)\n", + " epsilon = torch.randn_like(std)\n", + " z = mean + std * epsilon\n", + " x_reconstructed = self.decoder(z)\n", + " return x_reconstructed, mean, log_var" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Define the Loss Function\n", + "\n", + "The loss function consists of the reconstruction loss and the KL divergence." 
+ ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "def vae_loss(x, x_reconstructed, mean, log_var):\n", + " reconstruction_loss = nn.functional.binary_cross_entropy(x_reconstructed, x, reduction='sum')\n", + " kl_divergence = -0.5 * torch.sum(1 + log_var - mean.pow(2) - log_var.exp())\n", + " return reconstruction_loss + kl_divergence" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Train the VAE\n", + "\n", + "Let's train the VAE on a simple dataset, such as the MNIST dataset." + ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "# Load the MNIST dataset\n", + "transform = transforms.Compose([transforms.ToTensor(), transforms.Lambda(lambda x: x.view(-1))])\n", + "train_dataset = datasets.MNIST(root='./data', train=True, transform=transform, download=True)\n", + "test_dataset = datasets.MNIST(root='./data', train=False, transform=transform, download=True)\n", + "train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=64, shuffle=True)\n", + "test_loader = torch.utils.data.DataLoader(dataset=test_dataset, batch_size=64, shuffle=False)\n", + "\n", + "# Define the VAE model\n", + "input_dim = 28 * 28\n", + "hidden_dim = 256\n", + "latent_dim = 2\n", + "encoder = Encoder(input_dim, hidden_dim, latent_dim)\n", + "decoder = Decoder(latent_dim, hidden_dim, input_dim)\n", + "vae = VAE(encoder, decoder)\n", + "\n", + "# Define the optimizer\n", + "optimizer = optim.Adam(vae.parameters(), lr=1e-3)\n", + "\n", + "# Train the model\n", + "num_epochs = 10\n", + "for epoch in range(num_epochs):\n", + " vae.train()\n", + " train_loss = 0\n", + " for x, _ in train_loader:\n", + " optimizer.zero_grad()\n", + " x_reconstructed, mean, log_var = vae(x)\n", + " loss = vae_loss(x, x_reconstructed, mean, log_var)\n", + " loss.backward()\n", + " train_loss += loss.item()\n", + " optimizer.step()\n", + " print(f'Epoch {epoch + 1}, Loss: {train_loss / len(train_loader.dataset)}')" + ] + }, + { + 
"cell_type": "markdown", + "metadata": {}, + "source": [ + "### Visualize the Results\n", + "\n", + "Let's visualize the latent space learned by the VAE and generate some new data points." + ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "# Encode the test data to the latent space\n", + "vae.eval()\n", + "with torch.no_grad():\n", + " z_mean, z_log_var = [], []\n", + " for x, _ in test_loader:\n", + " mean, log_var = vae.encoder(x)\n", + " z_mean.append(mean)\n", + " z_log_var.append(log_var)\n", + " z_mean = torch.cat(z_mean)\n", + " z_log_var = torch.cat(z_log_var)\n", + " z = z_mean + torch.exp(0.5 * z_log_var) * torch.randn_like(z_mean)\n", + "\n", + "# Plot the latent space\n", + "plt.figure(figsize=(8, 6))\n", + "plt.scatter(z[:, 0].numpy(), z[:, 1].numpy(), c='blue', alpha=0.5)\n", + "plt.xlabel('z1')\n", + "plt.ylabel('z2')\n", + "plt.title('Latent Space')\n", + "plt.show()\n", + "\n", + "# Generate new data points\n", + "z_new = torch.randn(10, latent_dim)\n", + "with torch.no_grad():\n", + " generated = vae.decoder(z_new)\n", + "\n", + "# Plot the generated data points\n", + "plt.figure(figsize=(10, 2))\n", + "for i in range(10):\n", + " plt.subplot(1, 10, i + 1)\n", + " plt.imshow(generated[i].view(28, 28).numpy(), cmap='gray')\n", + " plt.axis('off')\n", + "plt.show()" + ] + } + ] +} diff --git a/docs/90_variational_auto_encoders/02_vae_architecture.ipynb b/docs/90_variational_auto_encoders/02_vae_architecture.ipynb new file mode 100644 index 00000000..bf6740da --- /dev/null +++ b/docs/90_variational_auto_encoders/02_vae_architecture.ipynb @@ -0,0 +1,167 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# VAE Architecture\n", + "\n", + "In this notebook, we will delve into the architecture of Variational Auto-Encoders (VAEs). We will explain the components of a VAE, including the encoder and decoder, and provide code examples for building a VAE architecture using PyTorch." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Encoder\n", + "\n", + "The encoder is a neural network that takes the input data and maps it to a latent space. The output of the encoder is the parameters of a probability distribution in the latent space, typically the mean and log variance." + ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "import torch\n", + "import torch.nn as nn\n", + "\n", + "class Encoder(nn.Module):\n", + " def __init__(self, input_dim, hidden_dim, latent_dim):\n", + " super(Encoder, self).__init__()\n", + " self.fc1 = nn.Linear(input_dim, hidden_dim)\n", + " self.fc2_mean = nn.Linear(hidden_dim, latent_dim)\n", + " self.fc2_log_var = nn.Linear(hidden_dim, latent_dim)\n", + " self.relu = nn.ReLU()\n", + " \n", + " def forward(self, x):\n", + " h = self.relu(self.fc1(x))\n", + " mean = self.fc2_mean(h)\n", + " log_var = self.fc2_log_var(h)\n", + " return mean, log_var" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Decoder\n", + "\n", + "The decoder is a neural network that takes samples from the latent distribution and maps them back to the original data space. The output of the decoder is the reconstructed data." + ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "class Decoder(nn.Module):\n", + " def __init__(self, latent_dim, hidden_dim, output_dim):\n", + " super(Decoder, self).__init__()\n", + " self.fc1 = nn.Linear(latent_dim, hidden_dim)\n", + " self.fc2 = nn.Linear(hidden_dim, output_dim)\n", + " self.relu = nn.ReLU()\n", + " self.sigmoid = nn.Sigmoid()\n", + " \n", + " def forward(self, z):\n", + " h = self.relu(self.fc1(z))\n", + " x_reconstructed = self.sigmoid(self.fc2(h))\n", + " return x_reconstructed" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## VAE Model\n", + "\n", + "The VAE model combines the encoder and decoder, and includes a sampling layer to sample from the latent distribution." 
+ ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "class VAE(nn.Module):\n", + " def __init__(self, encoder, decoder):\n", + " super(VAE, self).__init__()\n", + " self.encoder = encoder\n", + " self.decoder = decoder\n", + " \n", + " def forward(self, x):\n", + " mean, log_var = self.encoder(x)\n", + " std = torch.exp(0.5 * log_var)\n", + " epsilon = torch.randn_like(std)\n", + " z = mean + std * epsilon\n", + " x_reconstructed = self.decoder(z)\n", + " return x_reconstructed, mean, log_var" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Loss Function\n", + "\n", + "The loss function for a VAE consists of two terms: the reconstruction loss and the KL divergence. The reconstruction loss measures how well the decoder can reconstruct the input data from the latent space, while the KL divergence measures how close the learned distribution is to a prior distribution." + ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "def vae_loss(x, x_reconstructed, mean, log_var):\n", + " reconstruction_loss = nn.functional.binary_cross_entropy(x_reconstructed, x, reduction='sum')\n", + " kl_divergence = -0.5 * torch.sum(1 + log_var - mean.pow(2) - log_var.exp())\n", + " return reconstruction_loss + kl_divergence" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Training the VAE\n", + "\n", + "Let's train the VAE on a simple dataset, such as the MNIST dataset." 
+ ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "from torchvision import datasets, transforms\n", + "import torch.optim as optim\n", + "\n", + "# Load the MNIST dataset\n", + "transform = transforms.Compose([transforms.ToTensor(), transforms.Lambda(lambda x: x.view(-1))])\n", + "train_dataset = datasets.MNIST(root='./data', train=True, transform=transform, download=True)\n", + "test_dataset = datasets.MNIST(root='./data', train=False, transform=transform, download=True)\n", + "train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=64, shuffle=True)\n", + "test_loader = torch.utils.data.DataLoader(dataset=test_dataset, batch_size=64, shuffle=False)\n", + "\n", + "# Define the VAE model\n", + "input_dim = 28 * 28\n", + "hidden_dim = 256\n", + "latent_dim = 2\n", + "encoder = Encoder(input_dim, hidden_dim, latent_dim)\n", + "decoder = Decoder(latent_dim, hidden_dim, input_dim)\n", + "vae = VAE(encoder, decoder)\n", + "\n", + "# Define the optimizer\n", + "optimizer = optim.Adam(vae.parameters(), lr=1e-3)\n", + "\n", + "# Train the model\n", + "num_epochs = 10\n", + "for epoch in range(num_epochs):\n", + " vae.train()\n", + " train_loss = 0\n", + " for x, _ in train_loader:\n", + " optimizer.zero_grad()\n", + " x_reconstructed, mean, log_var = vae(x)\n", + " loss = vae_loss(x, x_reconstructed, mean, log_var)\n", + " loss.backward()\n", + " train_loss += loss.item()\n", + " optimizer.step()\n", + " print(f'Epoch {epoch + 1}, Loss: {train_loss / len(train_loader.dataset)}')" + ] + } + ] +} diff --git a/docs/90_variational_auto_encoders/03_vae_training.ipynb b/docs/90_variational_auto_encoders/03_vae_training.ipynb new file mode 100644 index 00000000..bc7d12c3 --- /dev/null +++ b/docs/90_variational_auto_encoders/03_vae_training.ipynb @@ -0,0 +1,228 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Training Variational Auto-Encoders (VAEs)\n", + "\n", + "In this notebook, we will focus on 
training Variational Auto-Encoders (VAEs). We will provide code examples for training VAEs and explain the loss functions and optimization techniques used in the training process." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Code Example\n", + "\n", + "Let's start with a simple implementation of a VAE using PyTorch." + ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "import torch\n", + "import torch.nn as nn\n", + "import torch.optim as optim\n", + "from torchvision import datasets, transforms\n", + "import matplotlib.pyplot as plt" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Define the Encoder\n", + "\n", + "The encoder network takes the input data and outputs the parameters of the latent distribution (mean and log variance)." + ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "class Encoder(nn.Module):\n", + " def __init__(self, input_dim, hidden_dim, latent_dim):\n", + " super(Encoder, self).__init__()\n", + " self.fc1 = nn.Linear(input_dim, hidden_dim)\n", + " self.fc2_mean = nn.Linear(hidden_dim, latent_dim)\n", + " self.fc2_log_var = nn.Linear(hidden_dim, latent_dim)\n", + " self.relu = nn.ReLU()\n", + " \n", + " def forward(self, x):\n", + " h = self.relu(self.fc1(x))\n", + " mean = self.fc2_mean(h)\n", + " log_var = self.fc2_log_var(h)\n", + " return mean, log_var" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Define the Decoder\n", + "\n", + "The decoder network takes samples from the latent distribution and generates new data points." 
+ ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "class Decoder(nn.Module):\n", + " def __init__(self, latent_dim, hidden_dim, output_dim):\n", + " super(Decoder, self).__init__()\n", + " self.fc1 = nn.Linear(latent_dim, hidden_dim)\n", + " self.fc2 = nn.Linear(hidden_dim, output_dim)\n", + " self.relu = nn.ReLU()\n", + " self.sigmoid = nn.Sigmoid()\n", + " \n", + " def forward(self, z):\n", + " h = self.relu(self.fc1(z))\n", + " x_reconstructed = self.sigmoid(self.fc2(h))\n", + " return x_reconstructed" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Define the VAE\n", + "\n", + "The VAE model combines the encoder and decoder, and includes a sampling layer to sample from the latent distribution." + ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "class VAE(nn.Module):\n", + " def __init__(self, encoder, decoder):\n", + " super(VAE, self).__init__()\n", + " self.encoder = encoder\n", + " self.decoder = decoder\n", + " \n", + " def forward(self, x):\n", + " mean, log_var = self.encoder(x)\n", + " std = torch.exp(0.5 * log_var)\n", + " epsilon = torch.randn_like(std)\n", + " z = mean + std * epsilon\n", + " x_reconstructed = self.decoder(z)\n", + " return x_reconstructed, mean, log_var" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Define the Loss Function\n", + "\n", + "The loss function consists of the reconstruction loss and the KL divergence." 
+ ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "def vae_loss(x, x_reconstructed, mean, log_var):\n", + " reconstruction_loss = nn.functional.binary_cross_entropy(x_reconstructed, x, reduction='sum')\n", + " kl_divergence = -0.5 * torch.sum(1 + log_var - mean.pow(2) - log_var.exp())\n", + " return reconstruction_loss + kl_divergence" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Train the VAE\n", + "\n", + "Let's train the VAE on a simple dataset, such as the MNIST dataset." + ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "# Load the MNIST dataset\n", + "transform = transforms.Compose([transforms.ToTensor(), transforms.Lambda(lambda x: x.view(-1))])\n", + "train_dataset = datasets.MNIST(root='./data', train=True, transform=transform, download=True)\n", + "test_dataset = datasets.MNIST(root='./data', train=False, transform=transform, download=True)\n", + "train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=64, shuffle=True)\n", + "test_loader = torch.utils.data.DataLoader(dataset=test_dataset, batch_size=64, shuffle=False)\n", + "\n", + "# Define the VAE model\n", + "input_dim = 28 * 28\n", + "hidden_dim = 256\n", + "latent_dim = 2\n", + "encoder = Encoder(input_dim, hidden_dim, latent_dim)\n", + "decoder = Decoder(latent_dim, hidden_dim, input_dim)\n", + "vae = VAE(encoder, decoder)\n", + "\n", + "# Define the optimizer\n", + "optimizer = optim.Adam(vae.parameters(), lr=1e-3)\n", + "\n", + "# Train the model\n", + "num_epochs = 10\n", + "for epoch in range(num_epochs):\n", + " vae.train()\n", + " train_loss = 0\n", + " for x, _ in train_loader:\n", + " optimizer.zero_grad()\n", + " x_reconstructed, mean, log_var = vae(x)\n", + " loss = vae_loss(x, x_reconstructed, mean, log_var)\n", + " loss.backward()\n", + " train_loss += loss.item()\n", + " optimizer.step()\n", + " print(f'Epoch {epoch + 1}, Loss: {train_loss / len(train_loader.dataset)}')" + ] + }, + { + 
"cell_type": "markdown", + "metadata": {}, + "source": [ + "### Visualize the Results\n", + "\n", + "Let's visualize the latent space learned by the VAE and generate some new data points." + ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "# Encode the test data to the latent space\n", + "vae.eval()\n", + "with torch.no_grad():\n", + " z_mean, z_log_var = [], []\n", + " for x, _ in test_loader:\n", + " mean, log_var = vae.encoder(x)\n", + " z_mean.append(mean)\n", + " z_log_var.append(log_var)\n", + " z_mean = torch.cat(z_mean)\n", + " z_log_var = torch.cat(z_log_var)\n", + " z = z_mean + torch.exp(0.5 * z_log_var) * torch.randn_like(z_mean)\n", + "\n", + "# Plot the latent space\n", + "plt.figure(figsize=(8, 6))\n", + "plt.scatter(z[:, 0].numpy(), z[:, 1].numpy(), c='blue', alpha=0.5)\n", + "plt.xlabel('z1')\n", + "plt.ylabel('z2')\n", + "plt.title('Latent Space')\n", + "plt.show()\n", + "\n", + "# Generate new data points\n", + "z_new = torch.randn(10, latent_dim)\n", + "with torch.no_grad():\n", + " generated = vae.decoder(z_new)\n", + "\n", + "# Plot the generated data points\n", + "plt.figure(figsize=(10, 2))\n", + "for i in range(10):\n", + " plt.subplot(1, 10, i + 1)\n", + " plt.imshow(generated[i].view(28, 28).numpy(), cmap='gray')\n", + " plt.axis('off')\n", + "plt.show()" + ] + } + ] +} diff --git a/docs/90_variational_auto_encoders/04_vae_applications.ipynb b/docs/90_variational_auto_encoders/04_vae_applications.ipynb new file mode 100644 index 00000000..22f0d70c --- /dev/null +++ b/docs/90_variational_auto_encoders/04_vae_applications.ipynb @@ -0,0 +1,275 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Applications of Variational Auto-Encoders (VAEs)\n", + "\n", + "In this notebook, we will explore some practical applications of Variational Auto-Encoders (VAEs). We will provide examples of using VAEs for image generation and other practical applications." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Image Generation\n", + "\n", + "One of the most popular applications of VAEs is image generation. VAEs can be used to generate new images that are similar to the training data. Let's see how we can use a trained VAE to generate new images." + ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "import torch\n", + "from torch import nn, optim\n", + "from torchvision import datasets, transforms\n", + "import matplotlib.pyplot as plt\n", + "import numpy as np" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Define the Encoder\n", + "\n", + "The encoder network takes the input data and outputs the parameters of the latent distribution (mean and log variance)." + ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "class Encoder(nn.Module):\n", + " def __init__(self, input_dim, hidden_dim, latent_dim):\n", + " super(Encoder, self).__init__()\n", + " self.fc1 = nn.Linear(input_dim, hidden_dim)\n", + " self.fc2_mean = nn.Linear(hidden_dim, latent_dim)\n", + " self.fc2_log_var = nn.Linear(hidden_dim, latent_dim)\n", + " self.relu = nn.ReLU()\n", + " \n", + " def forward(self, x):\n", + " h = self.relu(self.fc1(x))\n", + " mean = self.fc2_mean(h)\n", + " log_var = self.fc2_log_var(h)\n", + " return mean, log_var" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Define the Decoder\n", + "\n", + "The decoder network takes samples from the latent distribution and generates new data points." 
+ ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "class Decoder(nn.Module):\n", + " def __init__(self, latent_dim, hidden_dim, output_dim):\n", + " super(Decoder, self).__init__()\n", + " self.fc1 = nn.Linear(latent_dim, hidden_dim)\n", + " self.fc2 = nn.Linear(hidden_dim, output_dim)\n", + " self.relu = nn.ReLU()\n", + " self.sigmoid = nn.Sigmoid()\n", + " \n", + " def forward(self, z):\n", + " h = self.relu(self.fc1(z))\n", + " x_reconstructed = self.sigmoid(self.fc2(h))\n", + " return x_reconstructed" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Define the VAE\n", + "\n", + "The VAE model combines the encoder and decoder, and includes a sampling layer to sample from the latent distribution." + ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "class VAE(nn.Module):\n", + " def __init__(self, encoder, decoder):\n", + " super(VAE, self).__init__()\n", + " self.encoder = encoder\n", + " self.decoder = decoder\n", + " \n", + " def forward(self, x):\n", + " mean, log_var = self.encoder(x)\n", + " std = torch.exp(0.5 * log_var)\n", + " epsilon = torch.randn_like(std)\n", + " z = mean + std * epsilon\n", + " x_reconstructed = self.decoder(z)\n", + " return x_reconstructed, mean, log_var" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Define the Loss Function\n", + "\n", + "The loss function consists of the reconstruction loss and the KL divergence." 
+ ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "def vae_loss(x, x_reconstructed, mean, log_var):\n", + " reconstruction_loss = nn.functional.binary_cross_entropy(x_reconstructed, x, reduction='sum')\n", + " kl_divergence = -0.5 * torch.sum(1 + log_var - mean.pow(2) - log_var.exp())\n", + " return reconstruction_loss + kl_divergence" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Train the VAE\n", + "\n", + "Let's train the VAE on a simple dataset, such as the MNIST dataset." + ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "# Load the MNIST dataset\n", + "transform = transforms.Compose([transforms.ToTensor(), transforms.Lambda(lambda x: x.view(-1))])\n", + "train_dataset = datasets.MNIST(root='./data', train=True, transform=transform, download=True)\n", + "test_dataset = datasets.MNIST(root='./data', train=False, transform=transform, download=True)\n", + "train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=64, shuffle=True)\n", + "test_loader = torch.utils.data.DataLoader(dataset=test_dataset, batch_size=64, shuffle=False)\n", + "\n", + "# Define the VAE model\n", + "input_dim = 28 * 28\n", + "hidden_dim = 256\n", + "latent_dim = 2\n", + "encoder = Encoder(input_dim, hidden_dim, latent_dim)\n", + "decoder = Decoder(latent_dim, hidden_dim, input_dim)\n", + "vae = VAE(encoder, decoder)\n", + "\n", + "# Define the optimizer\n", + "optimizer = optim.Adam(vae.parameters(), lr=1e-3)\n", + "\n", + "# Train the model\n", + "num_epochs = 10\n", + "for epoch in range(num_epochs):\n", + " vae.train()\n", + " train_loss = 0\n", + " for x, _ in train_loader:\n", + " optimizer.zero_grad()\n", + " x_reconstructed, mean, log_var = vae(x)\n", + " loss = vae_loss(x, x_reconstructed, mean, log_var)\n", + " loss.backward()\n", + " train_loss += loss.item()\n", + " optimizer.step()\n", + " print(f'Epoch {epoch + 1}, Loss: {train_loss / len(train_loader.dataset)}')" + ] + }, + { + 
"cell_type": "markdown", + "metadata": {}, + "source": [ + "### Generate New Images\n", + "\n", + "Now that we have trained the VAE, we can use it to generate new images." + ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "# Generate new data points\n", + "z_new = torch.randn(10, latent_dim)\n", + "with torch.no_grad():\n", + " generated = vae.decoder(z_new)\n", + "\n", + "# Plot the generated data points\n", + "plt.figure(figsize=(10, 2))\n", + "for i in range(10):\n", + " plt.subplot(1, 10, i + 1)\n", + " plt.imshow(generated[i].view(28, 28).numpy(), cmap='gray')\n", + " plt.axis('off')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Anomaly Detection\n", + "\n", + "VAEs can also be used for anomaly detection. By training a VAE on normal data, we can use the reconstruction error to detect anomalies. Data points with high reconstruction error are likely to be anomalies." + ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "# Calculate reconstruction error for test data\n", + "vae.eval()\n", + "with torch.no_grad():\n", + " reconstructed, _, _ = vae(torch.tensor(x_test, dtype=torch.float32))\n", + " reconstruction_error = torch.mean((torch.tensor(x_test, dtype=torch.float32) - reconstructed) ** 2, axis=1)\n", + "\n", + "# Plot reconstruction error\n", + "plt.figure(figsize=(8, 6))\n", + "plt.hist(reconstruction_error.numpy(), bins=50)\n", + "plt.xlabel('Reconstruction Error')\n", + "plt.ylabel('Frequency')\n", + "plt.title('Reconstruction Error Histogram')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Data Compression\n", + "\n", + "VAEs can be used for data compression by encoding the data into a lower-dimensional latent space. The latent representation can then be used to reconstruct the original data." 
+ ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "# Encode the test data to the latent space\n", + "vae.eval()\n", + "with torch.no_grad():\n", + " mean, log_var = vae.encoder(torch.tensor(x_test, dtype=torch.float32))\n", + " std = torch.exp(0.5 * log_var)\n", + " z = mean + std * torch.randn_like(std)\n", + "\n", + "# Decode the latent representation to reconstruct the data\n", + "with torch.no_grad():\n", + " reconstructed = vae.decoder(z)\n", + "\n", + "# Plot original and reconstructed data\n", + "plt.figure(figsize=(10, 4))\n", + "for i in range(10):\n", + " # Original data\n", + " plt.subplot(2, 10, i + 1)\n", + " plt.imshow(x_test[i].reshape(28, 28), cmap='gray')\n", + " plt.axis('off')\n", + " \n", + " # Reconstructed data\n", + " plt.subplot(2, 10, i + 11)\n", + " plt.imshow(reconstructed[i].view(28, 28).numpy(), cmap='gray')\n", + " plt.axis('off')\n", + "plt.show()" + ] + } + ] +} diff --git a/docs/_toc.yml b/docs/_toc.yml index 24fb3f8f..04061f49 100644 --- a/docs/_toc.yml +++ b/docs/_toc.yml @@ -118,6 +118,14 @@ parts: - file: 80_benchmarking_llms/30_measuring_executability.ipynb - file: 80_benchmarking_llms/40_summarize_error_messages.ipynb + + - caption: Variational Auto-Encoders + chapters: + - file: 90_variational_auto_encoders/01_intro_to_vae.ipynb + - file: 90_variational_auto_encoders/02_vae_architecture.ipynb + - file: 90_variational_auto_encoders/03_vae_training.ipynb + - file: 90_variational_auto_encoders/04_vae_applications.ipynb + - caption: Links chapters: diff --git a/docs/intro.md b/docs/intro.md index 7f416652..83561a3b 100644 --- a/docs/intro.md +++ b/docs/intro.md @@ -14,6 +14,7 @@ The notebook collection aims covering these topics: * Prompt Engineering * Retrieval-augmented-generation * Model fine-tuning +* Variational Auto-Encoders (VAEs) ## Covered Python libraries and software From 748443b527528e641e772854b35a333365b0b0b8 Mon Sep 17 00:00:00 2001 From: Robert Haase Date: Mon, 12 Aug 2024 10:45:59 +0200 
Subject: [PATCH 2/4] fixing JSON format using claude and gpt4o --- .../01_intro_to_vae.ipynb | 44 +- .../02_vae_architecture.ipynb | 353 ++++++----- .../03_vae_training.ipynb | 495 ++++++++------- .../04_vae_applications.ipynb | 594 ++++++++++-------- 4 files changed, 820 insertions(+), 666 deletions(-) diff --git a/docs/90_variational_auto_encoders/01_intro_to_vae.ipynb b/docs/90_variational_auto_encoders/01_intro_to_vae.ipynb index 8820a35d..e6fcfa4c 100644 --- a/docs/90_variational_auto_encoders/01_intro_to_vae.ipynb +++ b/docs/90_variational_auto_encoders/01_intro_to_vae.ipynb @@ -1,5 +1,12 @@ { "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
This notebook may contain text, code and images generated by artificial intelligence. Used model: claude-3-5-sonnet-20240620, vision model: claude-3-5-sonnet-20240620, endpoint: None, bia-bob version: 0.21.1. It is good scientific practice to check the code and results it produces carefully. Read more about code generation using bia-bob
" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -44,7 +51,9 @@ }, { "cell_type": "code", + "execution_count": null, "metadata": {}, + "outputs": [], "source": [ "import torch\n", "import torch.nn as nn\n", @@ -64,7 +73,9 @@ }, { "cell_type": "code", + "execution_count": null, "metadata": {}, + "outputs": [], "source": [ "class Encoder(nn.Module):\n", " def __init__(self, input_dim, hidden_dim, latent_dim):\n", @@ -92,7 +103,9 @@ }, { "cell_type": "code", + "execution_count": null, "metadata": {}, + "outputs": [], "source": [ "class Decoder(nn.Module):\n", " def __init__(self, latent_dim, hidden_dim, output_dim):\n", @@ -119,7 +132,9 @@ }, { "cell_type": "code", + "execution_count": null, "metadata": {}, + "outputs": [], "source": [ "class VAE(nn.Module):\n", " def __init__(self, encoder, decoder):\n", @@ -147,7 +162,9 @@ }, { "cell_type": "code", + "execution_count": null, "metadata": {}, + "outputs": [], "source": [ "def vae_loss(x, x_reconstructed, mean, log_var):\n", " reconstruction_loss = nn.functional.binary_cross_entropy(x_reconstructed, x, reduction='sum')\n", @@ -166,7 +183,9 @@ }, { "cell_type": "code", + "execution_count": null, "metadata": {}, + "outputs": [], "source": [ "# Load the MNIST dataset\n", "transform = transforms.Compose([transforms.ToTensor(), transforms.Lambda(lambda x: x.view(-1))])\n", @@ -212,7 +231,9 @@ }, { "cell_type": "code", + "execution_count": null, "metadata": {}, + "outputs": [], "source": [ "# Encode the test data to the latent space\n", "vae.eval()\n", @@ -248,5 +269,26 @@ "plt.show()" ] } - ] + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.9" + } + }, + "nbformat": 4, + "nbformat_minor": 4 } 
diff --git a/docs/90_variational_auto_encoders/02_vae_architecture.ipynb b/docs/90_variational_auto_encoders/02_vae_architecture.ipynb index bf6740da..d611d683 100644 --- a/docs/90_variational_auto_encoders/02_vae_architecture.ipynb +++ b/docs/90_variational_auto_encoders/02_vae_architecture.ipynb @@ -1,167 +1,188 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# VAE Architecture\n", - "\n", - "In this notebook, we will delve into the architecture of Variational Auto-Encoders (VAEs). We will explain the components of a VAE, including the encoder and decoder, and provide code examples for building a VAE architecture using PyTorch." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Encoder\n", - "\n", - "The encoder is a neural network that takes the input data and maps it to a latent space. The output of the encoder is the parameters of a probability distribution in the latent space, typically the mean and log variance." - ] - }, - { - "cell_type": "code", - "metadata": {}, - "source": [ - "import torch\n", - "import torch.nn as nn\n", - "\n", - "class Encoder(nn.Module):\n", - " def __init__(self, input_dim, hidden_dim, latent_dim):\n", - " super(Encoder, self).__init__()\n", - " self.fc1 = nn.Linear(input_dim, hidden_dim)\n", - " self.fc2_mean = nn.Linear(hidden_dim, latent_dim)\n", - " self.fc2_log_var = nn.Linear(hidden_dim, latent_dim)\n", - " self.relu = nn.ReLU()\n", - " \n", - " def forward(self, x):\n", - " h = self.relu(self.fc1(x))\n", - " mean = self.fc2_mean(h)\n", - " log_var = self.fc2_log_var(h)\n", - " return mean, log_var" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Decoder\n", - "\n", - "The decoder is a neural network that takes samples from the latent distribution and maps them back to the original data space. The output of the decoder is the reconstructed data." 
- ] - }, - { - "cell_type": "code", - "metadata": {}, - "source": [ - "class Decoder(nn.Module):\n", - " def __init__(self, latent_dim, hidden_dim, output_dim):\n", - " super(Decoder, self).__init__()\n", - " self.fc1 = nn.Linear(latent_dim, hidden_dim)\n", - " self.fc2 = nn.Linear(hidden_dim, output_dim)\n", - " self.relu = nn.ReLU()\n", - " self.sigmoid = nn.Sigmoid()\n", - " \n", - " def forward(self, z):\n", - " h = self.relu(self.fc1(z))\n", - " x_reconstructed = self.sigmoid(self.fc2(h))\n", - " return x_reconstructed" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## VAE Model\n", - "\n", - "The VAE model combines the encoder and decoder, and includes a sampling layer to sample from the latent distribution." - ] - }, - { - "cell_type": "code", - "metadata": {}, - "source": [ - "class VAE(nn.Module):\n", - " def __init__(self, encoder, decoder):\n", - " super(VAE, self).__init__()\n", - " self.encoder = encoder\n", - " self.decoder = decoder\n", - " \n", - " def forward(self, x):\n", - " mean, log_var = self.encoder(x)\n", - " std = torch.exp(0.5 * log_var)\n", - " epsilon = torch.randn_like(std)\n", - " z = mean + std * epsilon\n", - " x_reconstructed = self.decoder(z)\n", - " return x_reconstructed, mean, log_var" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Loss Function\n", - "\n", - "The loss function for a VAE consists of two terms: the reconstruction loss and the KL divergence. The reconstruction loss measures how well the decoder can reconstruct the input data from the latent space, while the KL divergence measures how close the learned distribution is to a prior distribution." 
- ] - }, - { - "cell_type": "code", - "metadata": {}, - "source": [ - "def vae_loss(x, x_reconstructed, mean, log_var):\n", - " reconstruction_loss = nn.functional.binary_cross_entropy(x_reconstructed, x, reduction='sum')\n", - " kl_divergence = -0.5 * torch.sum(1 + log_var - mean.pow(2) - log_var.exp())\n", - " return reconstruction_loss + kl_divergence" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Training the VAE\n", - "\n", - "Let's train the VAE on a simple dataset, such as the MNIST dataset." - ] - }, - { - "cell_type": "code", - "metadata": {}, - "source": [ - "from torchvision import datasets, transforms\n", - "import torch.optim as optim\n", - "\n", - "# Load the MNIST dataset\n", - "transform = transforms.Compose([transforms.ToTensor(), transforms.Lambda(lambda x: x.view(-1))])\n", - "train_dataset = datasets.MNIST(root='./data', train=True, transform=transform, download=True)\n", - "test_dataset = datasets.MNIST(root='./data', train=False, transform=transform, download=True)\n", - "train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=64, shuffle=True)\n", - "test_loader = torch.utils.data.DataLoader(dataset=test_dataset, batch_size=64, shuffle=False)\n", - "\n", - "# Define the VAE model\n", - "input_dim = 28 * 28\n", - "hidden_dim = 256\n", - "latent_dim = 2\n", - "encoder = Encoder(input_dim, hidden_dim, latent_dim)\n", - "decoder = Decoder(latent_dim, hidden_dim, input_dim)\n", - "vae = VAE(encoder, decoder)\n", - "\n", - "# Define the optimizer\n", - "optimizer = optim.Adam(vae.parameters(), lr=1e-3)\n", - "\n", - "# Train the model\n", - "num_epochs = 10\n", - "for epoch in range(num_epochs):\n", - " vae.train()\n", - " train_loss = 0\n", - " for x, _ in train_loader:\n", - " optimizer.zero_grad()\n", - " x_reconstructed, mean, log_var = vae(x)\n", - " loss = vae_loss(x, x_reconstructed, mean, log_var)\n", - " loss.backward()\n", - " train_loss += loss.item()\n", - " optimizer.step()\n", - " 
print(f'Epoch {epoch + 1}, Loss: {train_loss / len(train_loader.dataset)}')" - ] - } - ] -} + "cells": [ + { + "cell_type": "markdown", + "id": "directed-compensation", + "metadata": {}, + "source": [ + "
This notebook may contain text, code and images generated by artificial intelligence. Used model: gpt-4o-2024-08-06, vision model: gpt-4o-2024-08-06, endpoint: None, bia-bob version: 0.21.1. It is good scientific practice to check the code and results it produces carefully. Read more about code generation using bia-bob
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# VAE Architecture\n", + "\n", + "In this notebook, we will delve into the architecture of Variational Auto-Encoders (VAEs). We will explain the components of a VAE, including the encoder and decoder, and provide code examples for building a VAE architecture using PyTorch." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Encoder\n", + "\n", + "The encoder is a neural network that takes the input data and maps it to a latent space. The output of the encoder is the parameters of a probability distribution in the latent space, typically the mean and log variance." + ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "import torch\n", + "import torch.nn as nn\n", + "\n", + "class Encoder(nn.Module):\n", + " def __init__(self, input_dim, hidden_dim, latent_dim):\n", + " super(Encoder, self).__init__()\n", + " self.fc1 = nn.Linear(input_dim, hidden_dim)\n", + " self.fc2_mean = nn.Linear(hidden_dim, latent_dim)\n", + " self.fc2_log_var = nn.Linear(hidden_dim, latent_dim)\n", + " self.relu = nn.ReLU()\n", + " \n", + " def forward(self, x):\n", + " h = self.relu(self.fc1(x))\n", + " mean = self.fc2_mean(h)\n", + " log_var = self.fc2_log_var(h)\n", + " return mean, log_var" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Decoder\n", + "\n", + "The decoder is a neural network that takes samples from the latent distribution and maps them back to the original data space. The output of the decoder is the reconstructed data." 
+ ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "class Decoder(nn.Module):\n", + " def __init__(self, latent_dim, hidden_dim, output_dim):\n", + " super(Decoder, self).__init__()\n", + " self.fc1 = nn.Linear(latent_dim, hidden_dim)\n", + " self.fc2 = nn.Linear(hidden_dim, output_dim)\n", + " self.relu = nn.ReLU()\n", + " self.sigmoid = nn.Sigmoid()\n", + " \n", + " def forward(self, z):\n", + " h = self.relu(self.fc1(z))\n", + " x_reconstructed = self.sigmoid(self.fc2(h))\n", + " return x_reconstructed" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## VAE Model\n", + "\n", + "The VAE model combines the encoder and decoder, and includes a sampling layer to sample from the latent distribution." + ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "class VAE(nn.Module):\n", + " def __init__(self, encoder, decoder):\n", + " super(VAE, self).__init__()\n", + " self.encoder = encoder\n", + " self.decoder = decoder\n", + " \n", + " def forward(self, x):\n", + " mean, log_var = self.encoder(x)\n", + " std = torch.exp(0.5 * log_var)\n", + " epsilon = torch.randn_like(std)\n", + " z = mean + std * epsilon\n", + " x_reconstructed = self.decoder(z)\n", + " return x_reconstructed, mean, log_var" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Loss Function\n", + "\n", + "The loss function for a VAE consists of two terms: the reconstruction loss and the KL divergence. The reconstruction loss measures how well the decoder can reconstruct the input data from the latent space, while the KL divergence measures how close the learned distribution is to a prior distribution." 
+ ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "def vae_loss(x, x_reconstructed, mean, log_var):\n", + " reconstruction_loss = nn.functional.binary_cross_entropy(x_reconstructed, x, reduction='sum')\n", + " kl_divergence = -0.5 * torch.sum(1 + log_var - mean.pow(2) - log_var.exp())\n", + " return reconstruction_loss + kl_divergence" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Training the VAE\n", + "\n", + "Let's train the VAE on a simple dataset, such as the MNIST dataset." + ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "from torchvision import datasets, transforms\n", + "import torch.optim as optim\n", + "\n", + "# Load the MNIST dataset\n", + "transform = transforms.Compose([transforms.ToTensor(), transforms.Lambda(lambda x: x.view(-1))])\n", + "train_dataset = datasets.MNIST(root='./data', train=True, transform=transform, download=True)\n", + "test_dataset = datasets.MNIST(root='./data', train=False, transform=transform, download=True)\n", + "train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=64, shuffle=True)\n", + "test_loader = torch.utils.data.DataLoader(dataset=test_dataset, batch_size=64, shuffle=False)\n", + "\n", + "# Define the VAE model\n", + "input_dim = 28 * 28\n", + "hidden_dim = 256\n", + "latent_dim = 2\n", + "encoder = Encoder(input_dim, hidden_dim, latent_dim)\n", + "decoder = Decoder(latent_dim, hidden_dim, input_dim)\n", + "vae = VAE(encoder, decoder)\n", + "\n", + "# Define the optimizer\n", + "optimizer = optim.Adam(vae.parameters(), lr=1e-3)\n", + "\n", + "# Train the model\n", + "num_epochs = 10\n", + "for epoch in range(num_epochs):\n", + " vae.train()\n", + " train_loss = 0\n", + " for x, _ in train_loader:\n", + " optimizer.zero_grad()\n", + " x_reconstructed, mean, log_var = vae(x)\n", + " loss = vae_loss(x, x_reconstructed, mean, log_var)\n", + " loss.backward()\n", + " train_loss += 
loss.item()\n", + " optimizer.step()\n", + " print(f'Epoch {epoch + 1}, Loss: {train_loss / len(train_loader.dataset)}')" + ], + "outputs": [], + "execution_count": null + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 2 +} \ No newline at end of file diff --git a/docs/90_variational_auto_encoders/03_vae_training.ipynb b/docs/90_variational_auto_encoders/03_vae_training.ipynb index bc7d12c3..76c7ad1e 100644 --- a/docs/90_variational_auto_encoders/03_vae_training.ipynb +++ b/docs/90_variational_auto_encoders/03_vae_training.ipynb @@ -1,228 +1,271 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Training Variational Auto-Encoders (VAEs)\n", - "\n", - "In this notebook, we will focus on training Variational Auto-Encoders (VAEs). We will provide code examples for training VAEs and explain the loss functions and optimization techniques used in the training process." - ] + "cells": [ + { + "cell_type": "markdown", + "id": "directed-compensation", + "metadata": {}, + "source": [ + "
This notebook may contain text, code and images generated by artificial intelligence. Used model: gpt-4o-2024-08-06, vision model: gpt-4o-2024-08-06, endpoint: None, bia-bob version: 0.21.1. It is good scientific practice to check the code and results it produces carefully. Read more about code generation using bia-bob
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Training Variational Auto-Encoders (VAEs)\n", + "\n", + "In this notebook, we will focus on training Variational Auto-Encoders (VAEs). We will provide code examples for training VAEs and explain the loss functions and optimization techniques used in the training process." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Code Example\n", + "\n", + "Let's start with a simple implementation of a VAE using PyTorch." + ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "import torch\n", + "import torch.nn as nn\n", + "import torch.optim as optim\n", + "from torchvision import datasets, transforms\n", + "import matplotlib.pyplot as plt" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Define the Encoder\n", + "\n", + "The encoder network takes the input data and outputs the parameters of the latent distribution (mean and log variance)." + ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "class Encoder(nn.Module):\n", + " def __init__(self, input_dim, hidden_dim, latent_dim):\n", + " super(Encoder, self).__init__()\n", + " self.fc1 = nn.Linear(input_dim, hidden_dim)\n", + " self.fc2_mean = nn.Linear(hidden_dim, latent_dim)\n", + " self.fc2_log_var = nn.Linear(hidden_dim, latent_dim)\n", + " self.relu = nn.ReLU()\n", + " \n", + " def forward(self, x):\n", + " h = self.relu(self.fc1(x))\n", + " mean = self.fc2_mean(h)\n", + " log_var = self.fc2_log_var(h)\n", + " return mean, log_var" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Define the Decoder\n", + "\n", + "The decoder network takes samples from the latent distribution and generates new data points." 
+ ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "class Decoder(nn.Module):\n", + " def __init__(self, latent_dim, hidden_dim, output_dim):\n", + " super(Decoder, self).__init__()\n", + " self.fc1 = nn.Linear(latent_dim, hidden_dim)\n", + " self.fc2 = nn.Linear(hidden_dim, output_dim)\n", + " self.relu = nn.ReLU()\n", + " self.sigmoid = nn.Sigmoid()\n", + " \n", + " def forward(self, z):\n", + " h = self.relu(self.fc1(z))\n", + " x_reconstructed = self.sigmoid(self.fc2(h))\n", + " return x_reconstructed" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Define the VAE\n", + "\n", + "The VAE model combines the encoder and decoder, and includes a sampling layer to sample from the latent distribution." + ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "class VAE(nn.Module):\n", + " def __init__(self, encoder, decoder):\n", + " super(VAE, self).__init__()\n", + " self.encoder = encoder\n", + " self.decoder = decoder\n", + " \n", + " def forward(self, x):\n", + " mean, log_var = self.encoder(x)\n", + " std = torch.exp(0.5 * log_var)\n", + " epsilon = torch.randn_like(std)\n", + " z = mean + std * epsilon\n", + " x_reconstructed = self.decoder(z)\n", + " return x_reconstructed, mean, log_var" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Define the Loss Function\n", + "\n", + "The loss function consists of the reconstruction loss and the KL divergence." 
+ ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "def vae_loss(x, x_reconstructed, mean, log_var):\n", + " reconstruction_loss = nn.functional.binary_cross_entropy(x_reconstructed, x, reduction='sum')\n", + " kl_divergence = -0.5 * torch.sum(1 + log_var - mean.pow(2) - log_var.exp())\n", + " return reconstruction_loss + kl_divergence" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Train the VAE\n", + "\n", + "Let's train the VAE on a simple dataset, such as the MNIST dataset." + ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "# Load the MNIST dataset\n", + "transform = transforms.Compose([transforms.ToTensor(), transforms.Lambda(lambda x: x.view(-1))])\n", + "train_dataset = datasets.MNIST(root='./data', train=True, transform=transform, download=True)\n", + "test_dataset = datasets.MNIST(root='./data', train=False, transform=transform, download=True)\n", + "train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=64, shuffle=True)\n", + "test_loader = torch.utils.data.DataLoader(dataset=test_dataset, batch_size=64, shuffle=False)\n", + "\n", + "# Define the VAE model\n", + "input_dim = 28 * 28\n", + "hidden_dim = 256\n", + "latent_dim = 2\n", + "encoder = Encoder(input_dim, hidden_dim, latent_dim)\n", + "decoder = Decoder(latent_dim, hidden_dim, input_dim)\n", + "vae = VAE(encoder, decoder)\n", + "\n", + "# Define the optimizer\n", + "optimizer = optim.Adam(vae.parameters(), lr=1e-3)\n", + "\n", + "# Train the model\n", + "num_epochs = 10\n", + "for epoch in range(num_epochs):\n", + " vae.train()\n", + " train_loss = 0\n", + " for x, _ in train_loader:\n", + " optimizer.zero_grad()\n", + " x_reconstructed, mean, log_var = vae(x)\n", + " loss = vae_loss(x, x_reconstructed, mean, log_var)\n", + " loss.backward()\n", + " train_loss += loss.item()\n", + " optimizer.step()\n", + " print(f'Epoch {epoch + 1}, Loss: {train_loss / 
len(train_loader.dataset)}')" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Visualize the Results\n", + "\n", + "Let's visualize the latent space learned by the VAE and generate some new data points." + ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "# Encode the test data to the latent space\n", + "vae.eval()\n", + "with torch.no_grad():\n", + " z_mean, z_log_var = [], []\n", + " for x, _ in test_loader:\n", + " mean, log_var = vae.encoder(x)\n", + " z_mean.append(mean)\n", + " z_log_var.append(log_var)\n", + " z_mean = torch.cat(z_mean)\n", + " z_log_var = torch.cat(z_log_var)\n", + " z = z_mean + torch.exp(0.5 * z_log_var) * torch.randn_like(z_mean)\n", + "\n", + "# Plot the latent space\n", + "plt.figure(figsize=(8, 6))\n", + "plt.scatter(z[:, 0].numpy(), z[:, 1].numpy(), c='blue', alpha=0.5)\n", + "plt.xlabel('z1')\n", + "plt.ylabel('z2')\n", + "plt.title('Latent Space')\n", + "plt.show()\n", + "\n", + "# Generate new data points\n", + "z_new = torch.randn(10, latent_dim)\n", + "with torch.no_grad():\n", + " generated = vae.decoder(z_new)\n", + "\n", + "# Plot the generated data points\n", + "plt.figure(figsize=(10, 2))\n", + "for i in range(10):\n", + " plt.subplot(1, 10, i + 1)\n", + " plt.imshow(generated[i].view(28, 28).numpy(), cmap='gray')\n", + " plt.axis('off')\n", + "plt.show()" + ], + "outputs": [], + "execution_count": null + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.x" + } }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Code Example\n", - "\n", - "Let's start with a simple implementation of a VAE using PyTorch." 
- ] - }, - { - "cell_type": "code", - "metadata": {}, - "source": [ - "import torch\n", - "import torch.nn as nn\n", - "import torch.optim as optim\n", - "from torchvision import datasets, transforms\n", - "import matplotlib.pyplot as plt" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Define the Encoder\n", - "\n", - "The encoder network takes the input data and outputs the parameters of the latent distribution (mean and log variance)." - ] - }, - { - "cell_type": "code", - "metadata": {}, - "source": [ - "class Encoder(nn.Module):\n", - " def __init__(self, input_dim, hidden_dim, latent_dim):\n", - " super(Encoder, self).__init__()\n", - " self.fc1 = nn.Linear(input_dim, hidden_dim)\n", - " self.fc2_mean = nn.Linear(hidden_dim, latent_dim)\n", - " self.fc2_log_var = nn.Linear(hidden_dim, latent_dim)\n", - " self.relu = nn.ReLU()\n", - " \n", - " def forward(self, x):\n", - " h = self.relu(self.fc1(x))\n", - " mean = self.fc2_mean(h)\n", - " log_var = self.fc2_log_var(h)\n", - " return mean, log_var" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Define the Decoder\n", - "\n", - "The decoder network takes samples from the latent distribution and generates new data points." 
- ] - }, - { - "cell_type": "code", - "metadata": {}, - "source": [ - "class Decoder(nn.Module):\n", - " def __init__(self, latent_dim, hidden_dim, output_dim):\n", - " super(Decoder, self).__init__()\n", - " self.fc1 = nn.Linear(latent_dim, hidden_dim)\n", - " self.fc2 = nn.Linear(hidden_dim, output_dim)\n", - " self.relu = nn.ReLU()\n", - " self.sigmoid = nn.Sigmoid()\n", - " \n", - " def forward(self, z):\n", - " h = self.relu(self.fc1(z))\n", - " x_reconstructed = self.sigmoid(self.fc2(h))\n", - " return x_reconstructed" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Define the VAE\n", - "\n", - "The VAE model combines the encoder and decoder, and includes a sampling layer to sample from the latent distribution." - ] - }, - { - "cell_type": "code", - "metadata": {}, - "source": [ - "class VAE(nn.Module):\n", - " def __init__(self, encoder, decoder):\n", - " super(VAE, self).__init__()\n", - " self.encoder = encoder\n", - " self.decoder = decoder\n", - " \n", - " def forward(self, x):\n", - " mean, log_var = self.encoder(x)\n", - " std = torch.exp(0.5 * log_var)\n", - " epsilon = torch.randn_like(std)\n", - " z = mean + std * epsilon\n", - " x_reconstructed = self.decoder(z)\n", - " return x_reconstructed, mean, log_var" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Define the Loss Function\n", - "\n", - "The loss function consists of the reconstruction loss and the KL divergence." 
- ] - }, - { - "cell_type": "code", - "metadata": {}, - "source": [ - "def vae_loss(x, x_reconstructed, mean, log_var):\n", - " reconstruction_loss = nn.functional.binary_cross_entropy(x_reconstructed, x, reduction='sum')\n", - " kl_divergence = -0.5 * torch.sum(1 + log_var - mean.pow(2) - log_var.exp())\n", - " return reconstruction_loss + kl_divergence" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Train the VAE\n", - "\n", - "Let's train the VAE on a simple dataset, such as the MNIST dataset." - ] - }, - { - "cell_type": "code", - "metadata": {}, - "source": [ - "# Load the MNIST dataset\n", - "transform = transforms.Compose([transforms.ToTensor(), transforms.Lambda(lambda x: x.view(-1))])\n", - "train_dataset = datasets.MNIST(root='./data', train=True, transform=transform, download=True)\n", - "test_dataset = datasets.MNIST(root='./data', train=False, transform=transform, download=True)\n", - "train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=64, shuffle=True)\n", - "test_loader = torch.utils.data.DataLoader(dataset=test_dataset, batch_size=64, shuffle=False)\n", - "\n", - "# Define the VAE model\n", - "input_dim = 28 * 28\n", - "hidden_dim = 256\n", - "latent_dim = 2\n", - "encoder = Encoder(input_dim, hidden_dim, latent_dim)\n", - "decoder = Decoder(latent_dim, hidden_dim, input_dim)\n", - "vae = VAE(encoder, decoder)\n", - "\n", - "# Define the optimizer\n", - "optimizer = optim.Adam(vae.parameters(), lr=1e-3)\n", - "\n", - "# Train the model\n", - "num_epochs = 10\n", - "for epoch in range(num_epochs):\n", - " vae.train()\n", - " train_loss = 0\n", - " for x, _ in train_loader:\n", - " optimizer.zero_grad()\n", - " x_reconstructed, mean, log_var = vae(x)\n", - " loss = vae_loss(x, x_reconstructed, mean, log_var)\n", - " loss.backward()\n", - " train_loss += loss.item()\n", - " optimizer.step()\n", - " print(f'Epoch {epoch + 1}, Loss: {train_loss / len(train_loader.dataset)}')" - ] - }, - { - 
"cell_type": "markdown", - "metadata": {}, - "source": [ - "### Visualize the Results\n", - "\n", - "Let's visualize the latent space learned by the VAE and generate some new data points." - ] - }, - { - "cell_type": "code", - "metadata": {}, - "source": [ - "# Encode the test data to the latent space\n", - "vae.eval()\n", - "with torch.no_grad():\n", - " z_mean, z_log_var = [], []\n", - " for x, _ in test_loader:\n", - " mean, log_var = vae.encoder(x)\n", - " z_mean.append(mean)\n", - " z_log_var.append(log_var)\n", - " z_mean = torch.cat(z_mean)\n", - " z_log_var = torch.cat(z_log_var)\n", - " z = z_mean + torch.exp(0.5 * z_log_var) * torch.randn_like(z_mean)\n", - "\n", - "# Plot the latent space\n", - "plt.figure(figsize=(8, 6))\n", - "plt.scatter(z[:, 0].numpy(), z[:, 1].numpy(), c='blue', alpha=0.5)\n", - "plt.xlabel('z1')\n", - "plt.ylabel('z2')\n", - "plt.title('Latent Space')\n", - "plt.show()\n", - "\n", - "# Generate new data points\n", - "z_new = torch.randn(10, latent_dim)\n", - "with torch.no_grad():\n", - " generated = vae.decoder(z_new)\n", - "\n", - "# Plot the generated data points\n", - "plt.figure(figsize=(10, 2))\n", - "for i in range(10):\n", - " plt.subplot(1, 10, i + 1)\n", - " plt.imshow(generated[i].view(28, 28).numpy(), cmap='gray')\n", - " plt.axis('off')\n", - "plt.show()" - ] - } - ] -} + "nbformat": 4, + "nbformat_minor": 5 +} \ No newline at end of file diff --git a/docs/90_variational_auto_encoders/04_vae_applications.ipynb b/docs/90_variational_auto_encoders/04_vae_applications.ipynb index 22f0d70c..6119b61c 100644 --- a/docs/90_variational_auto_encoders/04_vae_applications.ipynb +++ b/docs/90_variational_auto_encoders/04_vae_applications.ipynb @@ -1,275 +1,323 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Applications of Variational Auto-Encoders (VAEs)\n", - "\n", - "In this notebook, we will explore some practical applications of Variational Auto-Encoders (VAEs). 
We will provide examples of using VAEs for image generation and other practical applications." - ] + "cells": [ + { + "cell_type": "markdown", + "id": "directed-compensation", + "metadata": {}, + "source": [ + "
This notebook may contain text, code and images generated by artificial intelligence. Used model: gpt-4o-2024-08-06, vision model: gpt-4o-2024-08-06, endpoint: None, bia-bob version: 0.21.1. It is good scientific practice to check the code and results it produces carefully. Read more about code generation using bia-bob
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Applications of Variational Auto-Encoders (VAEs)\n", + "\n", + "In this notebook, we will explore some practical applications of Variational Auto-Encoders (VAEs). We will provide examples of using VAEs for image generation and other practical applications." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Image Generation\n", + "\n", + "One of the most popular applications of VAEs is image generation. VAEs can be used to generate new images that are similar to the training data. Let's see how we can use a trained VAE to generate new images." + ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "import torch\n", + "from torch import nn, optim\n", + "from torchvision import datasets, transforms\n", + "import matplotlib.pyplot as plt\n", + "import numpy as np" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Define the Encoder\n", + "\n", + "The encoder network takes the input data and outputs the parameters of the latent distribution (mean and log variance)." + ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "class Encoder(nn.Module):\n", + " def __init__(self, input_dim, hidden_dim, latent_dim):\n", + " super(Encoder, self).__init__()\n", + " self.fc1 = nn.Linear(input_dim, hidden_dim)\n", + " self.fc2_mean = nn.Linear(hidden_dim, latent_dim)\n", + " self.fc2_log_var = nn.Linear(hidden_dim, latent_dim)\n", + " self.relu = nn.ReLU()\n", + " \n", + " def forward(self, x):\n", + " h = self.relu(self.fc1(x))\n", + " mean = self.fc2_mean(h)\n", + " log_var = self.fc2_log_var(h)\n", + " return mean, log_var" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Define the Decoder\n", + "\n", + "The decoder network takes samples from the latent distribution and generates new data points." 
+ ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "class Decoder(nn.Module):\n", + " def __init__(self, latent_dim, hidden_dim, output_dim):\n", + " super(Decoder, self).__init__()\n", + " self.fc1 = nn.Linear(latent_dim, hidden_dim)\n", + " self.fc2 = nn.Linear(hidden_dim, output_dim)\n", + " self.relu = nn.ReLU()\n", + " self.sigmoid = nn.Sigmoid()\n", + " \n", + " def forward(self, z):\n", + " h = self.relu(self.fc1(z))\n", + " x_reconstructed = self.sigmoid(self.fc2(h))\n", + " return x_reconstructed" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Define the VAE\n", + "\n", + "The VAE model combines the encoder and decoder, and includes a sampling layer to sample from the latent distribution." + ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "class VAE(nn.Module):\n", + " def __init__(self, encoder, decoder):\n", + " super(VAE, self).__init__()\n", + " self.encoder = encoder\n", + " self.decoder = decoder\n", + " \n", + " def forward(self, x):\n", + " mean, log_var = self.encoder(x)\n", + " std = torch.exp(0.5 * log_var)\n", + " epsilon = torch.randn_like(std)\n", + " z = mean + std * epsilon\n", + " x_reconstructed = self.decoder(z)\n", + " return x_reconstructed, mean, log_var" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Define the Loss Function\n", + "\n", + "The loss function consists of the reconstruction loss and the KL divergence." 
+ ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "def vae_loss(x, x_reconstructed, mean, log_var):\n", + " reconstruction_loss = nn.functional.binary_cross_entropy(x_reconstructed, x, reduction='sum')\n", + " kl_divergence = -0.5 * torch.sum(1 + log_var - mean.pow(2) - log_var.exp())\n", + " return reconstruction_loss + kl_divergence" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Train the VAE\n", + "\n", + "Let's train the VAE on a simple dataset, such as the MNIST dataset." + ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "# Load the MNIST dataset\n", + "transform = transforms.Compose([transforms.ToTensor(), transforms.Lambda(lambda x: x.view(-1))])\n", + "train_dataset = datasets.MNIST(root='./data', train=True, transform=transform, download=True)\n", + "test_dataset = datasets.MNIST(root='./data', train=False, transform=transform, download=True)\n", + "train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=64, shuffle=True)\n", + "test_loader = torch.utils.data.DataLoader(dataset=test_dataset, batch_size=64, shuffle=False)\n", + "\n", + "# Define the VAE model\n", + "input_dim = 28 * 28\n", + "hidden_dim = 256\n", + "latent_dim = 2\n", + "encoder = Encoder(input_dim, hidden_dim, latent_dim)\n", + "decoder = Decoder(latent_dim, hidden_dim, input_dim)\n", + "vae = VAE(encoder, decoder)\n", + "\n", + "# Define the optimizer\n", + "optimizer = optim.Adam(vae.parameters(), lr=1e-3)\n", + "\n", + "# Train the model\n", + "num_epochs = 10\n", + "for epoch in range(num_epochs):\n", + " vae.train()\n", + " train_loss = 0\n", + " for x, _ in train_loader:\n", + " optimizer.zero_grad()\n", + " x_reconstructed, mean, log_var = vae(x)\n", + " loss = vae_loss(x, x_reconstructed, mean, log_var)\n", + " loss.backward()\n", + " train_loss += loss.item()\n", + " optimizer.step()\n", + " print(f'Epoch {epoch + 1}, Loss: {train_loss / 
len(train_loader.dataset)}')" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Generate New Images\n", + "\n", + "Now that we have trained the VAE, we can use it to generate new images." + ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "# Generate new data points\n", + "z_new = torch.randn(10, latent_dim)\n", + "with torch.no_grad():\n", + " generated = vae.decoder(z_new)\n", + "\n", + "# Plot the generated data points\n", + "plt.figure(figsize=(10, 2))\n", + "for i in range(10):\n", + " plt.subplot(1, 10, i + 1)\n", + " plt.imshow(generated[i].view(28, 28).numpy(), cmap='gray')\n", + " plt.axis('off')\n", + "plt.show()" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Anomaly Detection\n", + "\n", + "VAEs can also be used for anomaly detection. By training a VAE on normal data, we can use the reconstruction error to detect anomalies. Data points with high reconstruction error are likely to be anomalies." + ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "# Calculate reconstruction error for test data\n", + "vae.eval()\n", + "x_test = next(iter(test_loader))[0] # Get the test data from DataLoader\n", + "with torch.no_grad():\n", + " reconstructed, _, _ = vae(x_test)\n", + " reconstruction_error = torch.mean((x_test - reconstructed) ** 2, axis=1)\n", + "\n", + "# Plot reconstruction error\n", + "plt.figure(figsize=(8, 6))\n", + "plt.hist(reconstruction_error.numpy(), bins=50)\n", + "plt.xlabel('Reconstruction Error')\n", + "plt.ylabel('Frequency')\n", + "plt.title('Reconstruction Error Histogram')\n", + "plt.show()" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Data Compression\n", + "\n", + "VAEs can be used for data compression by encoding the data into a lower-dimensional latent space. 
The latent representation can then be used to reconstruct the original data." + ] + }, + { + "cell_type": "code", + "metadata": {}, + "source": [ + "# Encode the test data to the latent space\n", + "vae.eval()\n", + "with torch.no_grad():\n", + " mean, log_var = vae.encoder(x_test)\n", + " std = torch.exp(0.5 * log_var)\n", + " z = mean + std * torch.randn_like(std)\n", + "\n", + "# Decode the latent representation to reconstruct the data\n", + "with torch.no_grad():\n", + " reconstructed = vae.decoder(z)\n", + "\n", + "# Plot original and reconstructed data\n", + "plt.figure(figsize=(10, 4))\n", + "for i in range(10):\n", + " # Original data\n", + " plt.subplot(2, 10, i + 1)\n", + " plt.imshow(x_test[i].reshape(28, 28), cmap='gray')\n", + " plt.axis('off')\n", + " \n", + " # Reconstructed data\n", + " plt.subplot(2, 10, i + 11)\n", + " plt.imshow(reconstructed[i].view(28, 28).numpy(), cmap='gray')\n", + " plt.axis('off')\n", + "plt.show()" + ], + "outputs": [], + "execution_count": null + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.5" + } }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Image Generation\n", - "\n", - "One of the most popular applications of VAEs is image generation. VAEs can be used to generate new images that are similar to the training data. Let's see how we can use a trained VAE to generate new images." 
- ] - }, - { - "cell_type": "code", - "metadata": {}, - "source": [ - "import torch\n", - "from torch import nn, optim\n", - "from torchvision import datasets, transforms\n", - "import matplotlib.pyplot as plt\n", - "import numpy as np" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Define the Encoder\n", - "\n", - "The encoder network takes the input data and outputs the parameters of the latent distribution (mean and log variance)." - ] - }, - { - "cell_type": "code", - "metadata": {}, - "source": [ - "class Encoder(nn.Module):\n", - " def __init__(self, input_dim, hidden_dim, latent_dim):\n", - " super(Encoder, self).__init__()\n", - " self.fc1 = nn.Linear(input_dim, hidden_dim)\n", - " self.fc2_mean = nn.Linear(hidden_dim, latent_dim)\n", - " self.fc2_log_var = nn.Linear(hidden_dim, latent_dim)\n", - " self.relu = nn.ReLU()\n", - " \n", - " def forward(self, x):\n", - " h = self.relu(self.fc1(x))\n", - " mean = self.fc2_mean(h)\n", - " log_var = self.fc2_log_var(h)\n", - " return mean, log_var" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Define the Decoder\n", - "\n", - "The decoder network takes samples from the latent distribution and generates new data points." 
- ] - }, - { - "cell_type": "code", - "metadata": {}, - "source": [ - "class Decoder(nn.Module):\n", - " def __init__(self, latent_dim, hidden_dim, output_dim):\n", - " super(Decoder, self).__init__()\n", - " self.fc1 = nn.Linear(latent_dim, hidden_dim)\n", - " self.fc2 = nn.Linear(hidden_dim, output_dim)\n", - " self.relu = nn.ReLU()\n", - " self.sigmoid = nn.Sigmoid()\n", - " \n", - " def forward(self, z):\n", - " h = self.relu(self.fc1(z))\n", - " x_reconstructed = self.sigmoid(self.fc2(h))\n", - " return x_reconstructed" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Define the VAE\n", - "\n", - "The VAE model combines the encoder and decoder, and includes a sampling layer to sample from the latent distribution." - ] - }, - { - "cell_type": "code", - "metadata": {}, - "source": [ - "class VAE(nn.Module):\n", - " def __init__(self, encoder, decoder):\n", - " super(VAE, self).__init__()\n", - " self.encoder = encoder\n", - " self.decoder = decoder\n", - " \n", - " def forward(self, x):\n", - " mean, log_var = self.encoder(x)\n", - " std = torch.exp(0.5 * log_var)\n", - " epsilon = torch.randn_like(std)\n", - " z = mean + std * epsilon\n", - " x_reconstructed = self.decoder(z)\n", - " return x_reconstructed, mean, log_var" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Define the Loss Function\n", - "\n", - "The loss function consists of the reconstruction loss and the KL divergence." 
- ] - }, - { - "cell_type": "code", - "metadata": {}, - "source": [ - "def vae_loss(x, x_reconstructed, mean, log_var):\n", - " reconstruction_loss = nn.functional.binary_cross_entropy(x_reconstructed, x, reduction='sum')\n", - " kl_divergence = -0.5 * torch.sum(1 + log_var - mean.pow(2) - log_var.exp())\n", - " return reconstruction_loss + kl_divergence" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Train the VAE\n", - "\n", - "Let's train the VAE on a simple dataset, such as the MNIST dataset." - ] - }, - { - "cell_type": "code", - "metadata": {}, - "source": [ - "# Load the MNIST dataset\n", - "transform = transforms.Compose([transforms.ToTensor(), transforms.Lambda(lambda x: x.view(-1))])\n", - "train_dataset = datasets.MNIST(root='./data', train=True, transform=transform, download=True)\n", - "test_dataset = datasets.MNIST(root='./data', train=False, transform=transform, download=True)\n", - "train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=64, shuffle=True)\n", - "test_loader = torch.utils.data.DataLoader(dataset=test_dataset, batch_size=64, shuffle=False)\n", - "\n", - "# Define the VAE model\n", - "input_dim = 28 * 28\n", - "hidden_dim = 256\n", - "latent_dim = 2\n", - "encoder = Encoder(input_dim, hidden_dim, latent_dim)\n", - "decoder = Decoder(latent_dim, hidden_dim, input_dim)\n", - "vae = VAE(encoder, decoder)\n", - "\n", - "# Define the optimizer\n", - "optimizer = optim.Adam(vae.parameters(), lr=1e-3)\n", - "\n", - "# Train the model\n", - "num_epochs = 10\n", - "for epoch in range(num_epochs):\n", - " vae.train()\n", - " train_loss = 0\n", - " for x, _ in train_loader:\n", - " optimizer.zero_grad()\n", - " x_reconstructed, mean, log_var = vae(x)\n", - " loss = vae_loss(x, x_reconstructed, mean, log_var)\n", - " loss.backward()\n", - " train_loss += loss.item()\n", - " optimizer.step()\n", - " print(f'Epoch {epoch + 1}, Loss: {train_loss / len(train_loader.dataset)}')" - ] - }, - { - 
"cell_type": "markdown", - "metadata": {}, - "source": [ - "### Generate New Images\n", - "\n", - "Now that we have trained the VAE, we can use it to generate new images." - ] - }, - { - "cell_type": "code", - "metadata": {}, - "source": [ - "# Generate new data points\n", - "z_new = torch.randn(10, latent_dim)\n", - "with torch.no_grad():\n", - " generated = vae.decoder(z_new)\n", - "\n", - "# Plot the generated data points\n", - "plt.figure(figsize=(10, 2))\n", - "for i in range(10):\n", - " plt.subplot(1, 10, i + 1)\n", - " plt.imshow(generated[i].view(28, 28).numpy(), cmap='gray')\n", - " plt.axis('off')\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Anomaly Detection\n", - "\n", - "VAEs can also be used for anomaly detection. By training a VAE on normal data, we can use the reconstruction error to detect anomalies. Data points with high reconstruction error are likely to be anomalies." - ] - }, - { - "cell_type": "code", - "metadata": {}, - "source": [ - "# Calculate reconstruction error for test data\n", - "vae.eval()\n", - "with torch.no_grad():\n", - " reconstructed, _, _ = vae(torch.tensor(x_test, dtype=torch.float32))\n", - " reconstruction_error = torch.mean((torch.tensor(x_test, dtype=torch.float32) - reconstructed) ** 2, axis=1)\n", - "\n", - "# Plot reconstruction error\n", - "plt.figure(figsize=(8, 6))\n", - "plt.hist(reconstruction_error.numpy(), bins=50)\n", - "plt.xlabel('Reconstruction Error')\n", - "plt.ylabel('Frequency')\n", - "plt.title('Reconstruction Error Histogram')\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Data Compression\n", - "\n", - "VAEs can be used for data compression by encoding the data into a lower-dimensional latent space. The latent representation can then be used to reconstruct the original data." 
- ] - }, - { - "cell_type": "code", - "metadata": {}, - "source": [ - "# Encode the test data to the latent space\n", - "vae.eval()\n", - "with torch.no_grad():\n", - " mean, log_var = vae.encoder(torch.tensor(x_test, dtype=torch.float32))\n", - " std = torch.exp(0.5 * log_var)\n", - " z = mean + std * torch.randn_like(std)\n", - "\n", - "# Decode the latent representation to reconstruct the data\n", - "with torch.no_grad():\n", - " reconstructed = vae.decoder(z)\n", - "\n", - "# Plot original and reconstructed data\n", - "plt.figure(figsize=(10, 4))\n", - "for i in range(10):\n", - " # Original data\n", - " plt.subplot(2, 10, i + 1)\n", - " plt.imshow(x_test[i].reshape(28, 28), cmap='gray')\n", - " plt.axis('off')\n", - " \n", - " # Reconstructed data\n", - " plt.subplot(2, 10, i + 11)\n", - " plt.imshow(reconstructed[i].view(28, 28).numpy(), cmap='gray')\n", - " plt.axis('off')\n", - "plt.show()" - ] - } - ] -} + "nbformat": 4, + "nbformat_minor": 5 +} \ No newline at end of file From 2c446debdc3a63e5498817de86424ea0061d3577 Mon Sep 17 00:00:00 2001 From: Robert Haase Date: Mon, 12 Aug 2024 10:52:46 +0200 Subject: [PATCH 3/4] removed VAE training example as it was a duplicate of contents in other notebooks --- .../03_vae_training.ipynb | 271 ------------------ docs/_toc.yml | 1 - 2 files changed, 272 deletions(-) delete mode 100644 docs/90_variational_auto_encoders/03_vae_training.ipynb diff --git a/docs/90_variational_auto_encoders/03_vae_training.ipynb b/docs/90_variational_auto_encoders/03_vae_training.ipynb deleted file mode 100644 index 76c7ad1e..00000000 --- a/docs/90_variational_auto_encoders/03_vae_training.ipynb +++ /dev/null @@ -1,271 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "id": "directed-compensation", - "metadata": {}, - "source": [ - "
This notebook may contain text, code and images generated by artificial intelligence. Used model: gpt-4o-2024-08-06, vision model: gpt-4o-2024-08-06, endpoint: None, bia-bob version: 0.21.1. It is good scientific practice to check the code and results it produces carefully. Read more about code generation using bia-bob
" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Training Variational Auto-Encoders (VAEs)\n", - "\n", - "In this notebook, we will focus on training Variational Auto-Encoders (VAEs). We will provide code examples for training VAEs and explain the loss functions and optimization techniques used in the training process." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Code Example\n", - "\n", - "Let's start with a simple implementation of a VAE using PyTorch." - ] - }, - { - "cell_type": "code", - "metadata": {}, - "source": [ - "import torch\n", - "import torch.nn as nn\n", - "import torch.optim as optim\n", - "from torchvision import datasets, transforms\n", - "import matplotlib.pyplot as plt" - ], - "outputs": [], - "execution_count": null - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Define the Encoder\n", - "\n", - "The encoder network takes the input data and outputs the parameters of the latent distribution (mean and log variance)." - ] - }, - { - "cell_type": "code", - "metadata": {}, - "source": [ - "class Encoder(nn.Module):\n", - " def __init__(self, input_dim, hidden_dim, latent_dim):\n", - " super(Encoder, self).__init__()\n", - " self.fc1 = nn.Linear(input_dim, hidden_dim)\n", - " self.fc2_mean = nn.Linear(hidden_dim, latent_dim)\n", - " self.fc2_log_var = nn.Linear(hidden_dim, latent_dim)\n", - " self.relu = nn.ReLU()\n", - " \n", - " def forward(self, x):\n", - " h = self.relu(self.fc1(x))\n", - " mean = self.fc2_mean(h)\n", - " log_var = self.fc2_log_var(h)\n", - " return mean, log_var" - ], - "outputs": [], - "execution_count": null - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Define the Decoder\n", - "\n", - "The decoder network takes samples from the latent distribution and generates new data points." 
- ] - }, - { - "cell_type": "code", - "metadata": {}, - "source": [ - "class Decoder(nn.Module):\n", - " def __init__(self, latent_dim, hidden_dim, output_dim):\n", - " super(Decoder, self).__init__()\n", - " self.fc1 = nn.Linear(latent_dim, hidden_dim)\n", - " self.fc2 = nn.Linear(hidden_dim, output_dim)\n", - " self.relu = nn.ReLU()\n", - " self.sigmoid = nn.Sigmoid()\n", - " \n", - " def forward(self, z):\n", - " h = self.relu(self.fc1(z))\n", - " x_reconstructed = self.sigmoid(self.fc2(h))\n", - " return x_reconstructed" - ], - "outputs": [], - "execution_count": null - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Define the VAE\n", - "\n", - "The VAE model combines the encoder and decoder, and includes a sampling layer to sample from the latent distribution." - ] - }, - { - "cell_type": "code", - "metadata": {}, - "source": [ - "class VAE(nn.Module):\n", - " def __init__(self, encoder, decoder):\n", - " super(VAE, self).__init__()\n", - " self.encoder = encoder\n", - " self.decoder = decoder\n", - " \n", - " def forward(self, x):\n", - " mean, log_var = self.encoder(x)\n", - " std = torch.exp(0.5 * log_var)\n", - " epsilon = torch.randn_like(std)\n", - " z = mean + std * epsilon\n", - " x_reconstructed = self.decoder(z)\n", - " return x_reconstructed, mean, log_var" - ], - "outputs": [], - "execution_count": null - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Define the Loss Function\n", - "\n", - "The loss function consists of the reconstruction loss and the KL divergence." 
- ] - }, - { - "cell_type": "code", - "metadata": {}, - "source": [ - "def vae_loss(x, x_reconstructed, mean, log_var):\n", - " reconstruction_loss = nn.functional.binary_cross_entropy(x_reconstructed, x, reduction='sum')\n", - " kl_divergence = -0.5 * torch.sum(1 + log_var - mean.pow(2) - log_var.exp())\n", - " return reconstruction_loss + kl_divergence" - ], - "outputs": [], - "execution_count": null - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Train the VAE\n", - "\n", - "Let's train the VAE on a simple dataset, such as the MNIST dataset." - ] - }, - { - "cell_type": "code", - "metadata": {}, - "source": [ - "# Load the MNIST dataset\n", - "transform = transforms.Compose([transforms.ToTensor(), transforms.Lambda(lambda x: x.view(-1))])\n", - "train_dataset = datasets.MNIST(root='./data', train=True, transform=transform, download=True)\n", - "test_dataset = datasets.MNIST(root='./data', train=False, transform=transform, download=True)\n", - "train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=64, shuffle=True)\n", - "test_loader = torch.utils.data.DataLoader(dataset=test_dataset, batch_size=64, shuffle=False)\n", - "\n", - "# Define the VAE model\n", - "input_dim = 28 * 28\n", - "hidden_dim = 256\n", - "latent_dim = 2\n", - "encoder = Encoder(input_dim, hidden_dim, latent_dim)\n", - "decoder = Decoder(latent_dim, hidden_dim, input_dim)\n", - "vae = VAE(encoder, decoder)\n", - "\n", - "# Define the optimizer\n", - "optimizer = optim.Adam(vae.parameters(), lr=1e-3)\n", - "\n", - "# Train the model\n", - "num_epochs = 10\n", - "for epoch in range(num_epochs):\n", - " vae.train()\n", - " train_loss = 0\n", - " for x, _ in train_loader:\n", - " optimizer.zero_grad()\n", - " x_reconstructed, mean, log_var = vae(x)\n", - " loss = vae_loss(x, x_reconstructed, mean, log_var)\n", - " loss.backward()\n", - " train_loss += loss.item()\n", - " optimizer.step()\n", - " print(f'Epoch {epoch + 1}, Loss: {train_loss / 
len(train_loader.dataset)}')" - ], - "outputs": [], - "execution_count": null - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Visualize the Results\n", - "\n", - "Let's visualize the latent space learned by the VAE and generate some new data points." - ] - }, - { - "cell_type": "code", - "metadata": {}, - "source": [ - "# Encode the test data to the latent space\n", - "vae.eval()\n", - "with torch.no_grad():\n", - " z_mean, z_log_var = [], []\n", - " for x, _ in test_loader:\n", - " mean, log_var = vae.encoder(x)\n", - " z_mean.append(mean)\n", - " z_log_var.append(log_var)\n", - " z_mean = torch.cat(z_mean)\n", - " z_log_var = torch.cat(z_log_var)\n", - " z = z_mean + torch.exp(0.5 * z_log_var) * torch.randn_like(z_mean)\n", - "\n", - "# Plot the latent space\n", - "plt.figure(figsize=(8, 6))\n", - "plt.scatter(z[:, 0].numpy(), z[:, 1].numpy(), c='blue', alpha=0.5)\n", - "plt.xlabel('z1')\n", - "plt.ylabel('z2')\n", - "plt.title('Latent Space')\n", - "plt.show()\n", - "\n", - "# Generate new data points\n", - "z_new = torch.randn(10, latent_dim)\n", - "with torch.no_grad():\n", - " generated = vae.decoder(z_new)\n", - "\n", - "# Plot the generated data points\n", - "plt.figure(figsize=(10, 2))\n", - "for i in range(10):\n", - " plt.subplot(1, 10, i + 1)\n", - " plt.imshow(generated[i].view(28, 28).numpy(), cmap='gray')\n", - " plt.axis('off')\n", - "plt.show()" - ], - "outputs": [], - "execution_count": null - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.x" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} \ No newline at end of file diff --git a/docs/_toc.yml b/docs/_toc.yml index 04061f49..fb9eb8ce 100644 --- 
a/docs/_toc.yml +++ b/docs/_toc.yml @@ -123,7 +123,6 @@ parts: chapters: - file: 90_variational_auto_encoders/01_intro_to_vae.ipynb - file: 90_variational_auto_encoders/02_vae_architecture.ipynb - - file: 90_variational_auto_encoders/03_vae_training.ipynb - file: 90_variational_auto_encoders/04_vae_applications.ipynb From 8bce9a1a7288e5232c9c497553018bbd23d18da7 Mon Sep 17 00:00:00 2001 From: Robert Haase Date: Mon, 12 Aug 2024 10:57:07 +0200 Subject: [PATCH 4/4] run notebooks --- .gitignore | 3 +- .../01_intro_to_vae.ipynb | 154 +++- .../02_vae_architecture.ipynb | 397 +++++----- .../04_vae_applications.ipynb | 695 ++++++++++-------- 4 files changed, 734 insertions(+), 515 deletions(-) diff --git a/.gitignore b/.gitignore index 16ec5a31..9b76f9b2 100644 --- a/.gitignore +++ b/.gitignore @@ -21,4 +21,5 @@ docs/29_algorithm_validation/ideas.ipynb docs/29_algorithm_validation/solution for exercise - metrics to investigate segmentation results.ipynb docs/22_feature_extraction/blobs_analysis.csv data/S-BIAD634 -docs/71_fine_tuning_hf/haesleinhuepf \ No newline at end of file +docs/71_fine_tuning_hf/haesleinhuepf +docs/90_variational_auto_encoders/data \ No newline at end of file diff --git a/docs/90_variational_auto_encoders/01_intro_to_vae.ipynb b/docs/90_variational_auto_encoders/01_intro_to_vae.ipynb index e6fcfa4c..d75a8be7 100644 --- a/docs/90_variational_auto_encoders/01_intro_to_vae.ipynb +++ b/docs/90_variational_auto_encoders/01_intro_to_vae.ipynb @@ -1,12 +1,5 @@ { "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "
This notebook may contain text, code and images generated by artificial intelligence. Used model: claude-3-5-sonnet-20240620, vision model: claude-3-5-sonnet-20240620, endpoint: None, bia-bob version: 0.21.1. It is good scientific practice to check the code and results it produces carefully. Read more about code generation using bia-bob
" - ] - }, { "cell_type": "markdown", "metadata": {}, @@ -51,7 +44,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 1, "metadata": {}, "outputs": [], "source": [ @@ -73,7 +66,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 2, "metadata": {}, "outputs": [], "source": [ @@ -103,7 +96,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 3, "metadata": {}, "outputs": [], "source": [ @@ -132,7 +125,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 4, "metadata": {}, "outputs": [], "source": [ @@ -162,7 +155,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 5, "metadata": {}, "outputs": [], "source": [ @@ -183,9 +176,110 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 6, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz\n", + "Failed to download (trying next):\n", + "HTTP Error 403: Forbidden\n", + "\n", + "Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-images-idx3-ubyte.gz\n", + "Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-images-idx3-ubyte.gz to ./data\\MNIST\\raw\\train-images-idx3-ubyte.gz\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|██████████████████████████████████████████████████████████| 9912422/9912422 [00:01<00:00, 7150299.78it/s]\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Extracting ./data\\MNIST\\raw\\train-images-idx3-ubyte.gz to ./data\\MNIST\\raw\n", + "\n", + "Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz\n", + "Failed to download (trying next):\n", + "HTTP Error 403: Forbidden\n", + "\n", + "Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-labels-idx1-ubyte.gz\n", + "Downloading 
https://ossci-datasets.s3.amazonaws.com/mnist/train-labels-idx1-ubyte.gz to ./data\\MNIST\\raw\\train-labels-idx1-ubyte.gz\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|███████████████████████████████████████████████████████████████| 28881/28881 [00:00<00:00, 260992.37it/s]\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Extracting ./data\\MNIST\\raw\\train-labels-idx1-ubyte.gz to ./data\\MNIST\\raw\n", + "\n", + "Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz\n", + "Failed to download (trying next):\n", + "HTTP Error 403: Forbidden\n", + "\n", + "Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-images-idx3-ubyte.gz\n", + "Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-images-idx3-ubyte.gz to ./data\\MNIST\\raw\\t10k-images-idx3-ubyte.gz\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|██████████████████████████████████████████████████████████| 1648877/1648877 [00:00<00:00, 2029789.12it/s]\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Extracting ./data\\MNIST\\raw\\t10k-images-idx3-ubyte.gz to ./data\\MNIST\\raw\n", + "\n", + "Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz\n", + "Failed to download (trying next):\n", + "HTTP Error 403: Forbidden\n", + "\n", + "Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-labels-idx1-ubyte.gz\n", + "Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-labels-idx1-ubyte.gz to ./data\\MNIST\\raw\\t10k-labels-idx1-ubyte.gz\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|████████████████████████████████████████████████████████████████| 4542/4542 [00:00<00:00, 2265492.78it/s]\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Extracting ./data\\MNIST\\raw\\t10k-labels-idx1-ubyte.gz to ./data\\MNIST\\raw\n", + "\n", + "Epoch 1, Loss: 
182.6514552001953\n", + "Epoch 2, Loss: 165.24368096516926\n", + "Epoch 3, Loss: 161.51080247395834\n", + "Epoch 4, Loss: 159.12194900716145\n", + "Epoch 5, Loss: 157.41532638346354\n", + "Epoch 6, Loss: 156.18680362141927\n", + "Epoch 7, Loss: 155.20852910970052\n", + "Epoch 8, Loss: 154.32584979654948\n", + "Epoch 9, Loss: 153.60243400065104\n", + "Epoch 10, Loss: 152.96523545735678\n" + ] + } + ], "source": [ "# Load the MNIST dataset\n", "transform = transforms.Compose([transforms.ToTensor(), transforms.Lambda(lambda x: x.view(-1))])\n", @@ -231,9 +325,30 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 7, "metadata": {}, - "outputs": [], + "outputs": [ + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], "source": [ "# Encode the test data to the latent space\n", "vae.eval()\n", @@ -268,6 +383,13 @@ " plt.axis('off')\n", "plt.show()" ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] } ], "metadata": { diff --git a/docs/90_variational_auto_encoders/02_vae_architecture.ipynb b/docs/90_variational_auto_encoders/02_vae_architecture.ipynb index d611d683..b21c0c53 100644 --- a/docs/90_variational_auto_encoders/02_vae_architecture.ipynb +++ b/docs/90_variational_auto_encoders/02_vae_architecture.ipynb @@ -1,188 +1,215 @@ { - "cells": [ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# VAE Architecture\n", + "\n", + "In this notebook, we will delve into the architecture of Variational Auto-Encoders (VAEs). We will explain the components of a VAE, including the encoder and decoder, and provide code examples for building a VAE architecture using PyTorch." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Encoder\n", + "\n", + "The encoder is a neural network that takes the input data and maps it to a latent space. The output of the encoder is the parameters of a probability distribution in the latent space, typically the mean and log variance." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import torch\n", + "import torch.nn as nn\n", + "\n", + "class Encoder(nn.Module):\n", + " def __init__(self, input_dim, hidden_dim, latent_dim):\n", + " super(Encoder, self).__init__()\n", + " self.fc1 = nn.Linear(input_dim, hidden_dim)\n", + " self.fc2_mean = nn.Linear(hidden_dim, latent_dim)\n", + " self.fc2_log_var = nn.Linear(hidden_dim, latent_dim)\n", + " self.relu = nn.ReLU()\n", + " \n", + " def forward(self, x):\n", + " h = self.relu(self.fc1(x))\n", + " mean = self.fc2_mean(h)\n", + " log_var = self.fc2_log_var(h)\n", + " return mean, log_var" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Decoder\n", + "\n", + "The decoder is a neural network that takes samples from the latent distribution and maps them back to the original data space. The output of the decoder is the reconstructed data." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "class Decoder(nn.Module):\n", + " def __init__(self, latent_dim, hidden_dim, output_dim):\n", + " super(Decoder, self).__init__()\n", + " self.fc1 = nn.Linear(latent_dim, hidden_dim)\n", + " self.fc2 = nn.Linear(hidden_dim, output_dim)\n", + " self.relu = nn.ReLU()\n", + " self.sigmoid = nn.Sigmoid()\n", + " \n", + " def forward(self, z):\n", + " h = self.relu(self.fc1(z))\n", + " x_reconstructed = self.sigmoid(self.fc2(h))\n", + " return x_reconstructed" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## VAE Model\n", + "\n", + "The VAE model combines the encoder and decoder, and includes a sampling layer to sample from the latent distribution." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "class VAE(nn.Module):\n", + " def __init__(self, encoder, decoder):\n", + " super(VAE, self).__init__()\n", + " self.encoder = encoder\n", + " self.decoder = decoder\n", + " \n", + " def forward(self, x):\n", + " mean, log_var = self.encoder(x)\n", + " std = torch.exp(0.5 * log_var)\n", + " epsilon = torch.randn_like(std)\n", + " z = mean + std * epsilon\n", + " x_reconstructed = self.decoder(z)\n", + " return x_reconstructed, mean, log_var" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Loss Function\n", + "\n", + "The loss function for a VAE consists of two terms: the reconstruction loss and the KL divergence. The reconstruction loss measures how well the decoder can reconstruct the input data from the latent space, while the KL divergence measures how close the learned distribution is to a prior distribution." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "def vae_loss(x, x_reconstructed, mean, log_var):\n", + " reconstruction_loss = nn.functional.binary_cross_entropy(x_reconstructed, x, reduction='sum')\n", + " kl_divergence = -0.5 * torch.sum(1 + log_var - mean.pow(2) - log_var.exp())\n", + " return reconstruction_loss + kl_divergence" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Training the VAE\n", + "\n", + "Let's train the VAE on a simple dataset, such as the MNIST dataset." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ { - "cell_type": "markdown", - "id": "directed-compensation", - "metadata": {}, - "source": [ - "
This notebook may contain text, code and images generated by artificial intelligence. Used model: gpt-4o-2024-08-06, vision model: gpt-4o-2024-08-06, endpoint: None, bia-bob version: 0.21.1.. It is good scientific practice to check the code and results it produces carefully. Read more about code generation using bia-bob
" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# VAE Architecture\n", - "\n", - "In this notebook, we will delve into the architecture of Variational Auto-Encoders (VAEs). We will explain the components of a VAE, including the encoder and decoder, and provide code examples for building a VAE architecture using PyTorch." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Encoder\n", - "\n", - "The encoder is a neural network that takes the input data and maps it to a latent space. The output of the encoder is the parameters of a probability distribution in the latent space, typically the mean and log variance." - ] - }, - { - "cell_type": "code", - "metadata": {}, - "source": [ - "import torch\n", - "import torch.nn as nn\n", - "\n", - "class Encoder(nn.Module):\n", - " def __init__(self, input_dim, hidden_dim, latent_dim):\n", - " super(Encoder, self).__init__()\n", - " self.fc1 = nn.Linear(input_dim, hidden_dim)\n", - " self.fc2_mean = nn.Linear(hidden_dim, latent_dim)\n", - " self.fc2_log_var = nn.Linear(hidden_dim, latent_dim)\n", - " self.relu = nn.ReLU()\n", - " \n", - " def forward(self, x):\n", - " h = self.relu(self.fc1(x))\n", - " mean = self.fc2_mean(h)\n", - " log_var = self.fc2_log_var(h)\n", - " return mean, log_var" - ], - "outputs": [], - "execution_count": null - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Decoder\n", - "\n", - "The decoder is a neural network that takes samples from the latent distribution and maps them back to the original data space. The output of the decoder is the reconstructed data." 
- ] - }, - { - "cell_type": "code", - "metadata": {}, - "source": [ - "class Decoder(nn.Module):\n", - " def __init__(self, latent_dim, hidden_dim, output_dim):\n", - " super(Decoder, self).__init__()\n", - " self.fc1 = nn.Linear(latent_dim, hidden_dim)\n", - " self.fc2 = nn.Linear(hidden_dim, output_dim)\n", - " self.relu = nn.ReLU()\n", - " self.sigmoid = nn.Sigmoid()\n", - " \n", - " def forward(self, z):\n", - " h = self.relu(self.fc1(z))\n", - " x_reconstructed = self.sigmoid(self.fc2(h))\n", - " return x_reconstructed" - ], - "outputs": [], - "execution_count": null - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## VAE Model\n", - "\n", - "The VAE model combines the encoder and decoder, and includes a sampling layer to sample from the latent distribution." - ] - }, - { - "cell_type": "code", - "metadata": {}, - "source": [ - "class VAE(nn.Module):\n", - " def __init__(self, encoder, decoder):\n", - " super(VAE, self).__init__()\n", - " self.encoder = encoder\n", - " self.decoder = decoder\n", - " \n", - " def forward(self, x):\n", - " mean, log_var = self.encoder(x)\n", - " std = torch.exp(0.5 * log_var)\n", - " epsilon = torch.randn_like(std)\n", - " z = mean + std * epsilon\n", - " x_reconstructed = self.decoder(z)\n", - " return x_reconstructed, mean, log_var" - ], - "outputs": [], - "execution_count": null - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Loss Function\n", - "\n", - "The loss function for a VAE consists of two terms: the reconstruction loss and the KL divergence. The reconstruction loss measures how well the decoder can reconstruct the input data from the latent space, while the KL divergence measures how close the learned distribution is to a prior distribution." 
- ] - }, - { - "cell_type": "code", - "metadata": {}, - "source": [ - "def vae_loss(x, x_reconstructed, mean, log_var):\n", - " reconstruction_loss = nn.functional.binary_cross_entropy(x_reconstructed, x, reduction='sum')\n", - " kl_divergence = -0.5 * torch.sum(1 + log_var - mean.pow(2) - log_var.exp())\n", - " return reconstruction_loss + kl_divergence" - ], - "outputs": [], - "execution_count": null - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Training the VAE\n", - "\n", - "Let's train the VAE on a simple dataset, such as the MNIST dataset." - ] - }, - { - "cell_type": "code", - "metadata": {}, - "source": [ - "from torchvision import datasets, transforms\n", - "import torch.optim as optim\n", - "\n", - "# Load the MNIST dataset\n", - "transform = transforms.Compose([transforms.ToTensor(), transforms.Lambda(lambda x: x.view(-1))])\n", - "train_dataset = datasets.MNIST(root='./data', train=True, transform=transform, download=True)\n", - "test_dataset = datasets.MNIST(root='./data', train=False, transform=transform, download=True)\n", - "train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=64, shuffle=True)\n", - "test_loader = torch.utils.data.DataLoader(dataset=test_dataset, batch_size=64, shuffle=False)\n", - "\n", - "# Define the VAE model\n", - "input_dim = 28 * 28\n", - "hidden_dim = 256\n", - "latent_dim = 2\n", - "encoder = Encoder(input_dim, hidden_dim, latent_dim)\n", - "decoder = Decoder(latent_dim, hidden_dim, input_dim)\n", - "vae = VAE(encoder, decoder)\n", - "\n", - "# Define the optimizer\n", - "optimizer = optim.Adam(vae.parameters(), lr=1e-3)\n", - "\n", - "# Train the model\n", - "num_epochs = 10\n", - "for epoch in range(num_epochs):\n", - " vae.train()\n", - " train_loss = 0\n", - " for x, _ in train_loader:\n", - " optimizer.zero_grad()\n", - " x_reconstructed, mean, log_var = vae(x)\n", - " loss = vae_loss(x, x_reconstructed, mean, log_var)\n", - " loss.backward()\n", - " train_loss += 
loss.item()\n", - " optimizer.step()\n", - " print(f'Epoch {epoch + 1}, Loss: {train_loss / len(train_loader.dataset)}')" - ], - "outputs": [], - "execution_count": null + "name": "stdout", + "output_type": "stream", + "text": [ + "Epoch 1, Loss: 182.64177789713543\n", + "Epoch 2, Loss: 164.48488276367186\n", + "Epoch 3, Loss: 161.2271118815104\n", + "Epoch 4, Loss: 159.09910853678386\n", + "Epoch 5, Loss: 157.53306800944011\n", + "Epoch 6, Loss: 156.28730290527344\n", + "Epoch 7, Loss: 155.26126284179688\n", + "Epoch 8, Loss: 154.44241954752604\n", + "Epoch 9, Loss: 153.73177485351562\n", + "Epoch 10, Loss: 153.18850033365885\n" + ] } - ], - "metadata": {}, - "nbformat": 4, - "nbformat_minor": 2 -} \ No newline at end of file + ], + "source": [ + "from torchvision import datasets, transforms\n", + "import torch.optim as optim\n", + "\n", + "# Load the MNIST dataset\n", + "transform = transforms.Compose([transforms.ToTensor(), transforms.Lambda(lambda x: x.view(-1))])\n", + "train_dataset = datasets.MNIST(root='./data', train=True, transform=transform, download=True)\n", + "test_dataset = datasets.MNIST(root='./data', train=False, transform=transform, download=True)\n", + "train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=64, shuffle=True)\n", + "test_loader = torch.utils.data.DataLoader(dataset=test_dataset, batch_size=64, shuffle=False)\n", + "\n", + "# Define the VAE model\n", + "input_dim = 28 * 28\n", + "hidden_dim = 256\n", + "latent_dim = 2\n", + "encoder = Encoder(input_dim, hidden_dim, latent_dim)\n", + "decoder = Decoder(latent_dim, hidden_dim, input_dim)\n", + "vae = VAE(encoder, decoder)\n", + "\n", + "# Define the optimizer\n", + "optimizer = optim.Adam(vae.parameters(), lr=1e-3)\n", + "\n", + "# Train the model\n", + "num_epochs = 10\n", + "for epoch in range(num_epochs):\n", + " vae.train()\n", + " train_loss = 0\n", + " for x, _ in train_loader:\n", + " optimizer.zero_grad()\n", + " x_reconstructed, mean, log_var = 
vae(x)\n", + " loss = vae_loss(x, x_reconstructed, mean, log_var)\n", + " loss.backward()\n", + " train_loss += loss.item()\n", + " optimizer.step()\n", + " print(f'Epoch {epoch + 1}, Loss: {train_loss / len(train_loader.dataset)}')" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.9" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/docs/90_variational_auto_encoders/04_vae_applications.ipynb b/docs/90_variational_auto_encoders/04_vae_applications.ipynb index 6119b61c..e1b2ef14 100644 --- a/docs/90_variational_auto_encoders/04_vae_applications.ipynb +++ b/docs/90_variational_auto_encoders/04_vae_applications.ipynb @@ -1,323 +1,392 @@ { - "cells": [ - { - "cell_type": "markdown", - "id": "directed-compensation", - "metadata": {}, - "source": [ - "
This notebook may contain text, code and images generated by artificial intelligence. Used model: gpt-4o-2024-08-06, vision model: gpt-4o-2024-08-06, endpoint: None, bia-bob version: 0.21.1.. It is good scientific practice to check the code and results it produces carefully. Read more about code generation using bia-bob
" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Applications of Variational Auto-Encoders (VAEs)\n", - "\n", - "In this notebook, we will explore some practical applications of Variational Auto-Encoders (VAEs). We will provide examples of using VAEs for image generation and other practical applications." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Image Generation\n", - "\n", - "One of the most popular applications of VAEs is image generation. VAEs can be used to generate new images that are similar to the training data. Let's see how we can use a trained VAE to generate new images." - ] - }, - { - "cell_type": "code", - "metadata": {}, - "source": [ - "import torch\n", - "from torch import nn, optim\n", - "from torchvision import datasets, transforms\n", - "import matplotlib.pyplot as plt\n", - "import numpy as np" - ], - "outputs": [], - "execution_count": null - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Define the Encoder\n", - "\n", - "The encoder network takes the input data and outputs the parameters of the latent distribution (mean and log variance)." - ] - }, - { - "cell_type": "code", - "metadata": {}, - "source": [ - "class Encoder(nn.Module):\n", - " def __init__(self, input_dim, hidden_dim, latent_dim):\n", - " super(Encoder, self).__init__()\n", - " self.fc1 = nn.Linear(input_dim, hidden_dim)\n", - " self.fc2_mean = nn.Linear(hidden_dim, latent_dim)\n", - " self.fc2_log_var = nn.Linear(hidden_dim, latent_dim)\n", - " self.relu = nn.ReLU()\n", - " \n", - " def forward(self, x):\n", - " h = self.relu(self.fc1(x))\n", - " mean = self.fc2_mean(h)\n", - " log_var = self.fc2_log_var(h)\n", - " return mean, log_var" - ], - "outputs": [], - "execution_count": null - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Define the Decoder\n", - "\n", - "The decoder network takes samples from the latent distribution and generates new data points." 
- ] - }, - { - "cell_type": "code", - "metadata": {}, - "source": [ - "class Decoder(nn.Module):\n", - " def __init__(self, latent_dim, hidden_dim, output_dim):\n", - " super(Decoder, self).__init__()\n", - " self.fc1 = nn.Linear(latent_dim, hidden_dim)\n", - " self.fc2 = nn.Linear(hidden_dim, output_dim)\n", - " self.relu = nn.ReLU()\n", - " self.sigmoid = nn.Sigmoid()\n", - " \n", - " def forward(self, z):\n", - " h = self.relu(self.fc1(z))\n", - " x_reconstructed = self.sigmoid(self.fc2(h))\n", - " return x_reconstructed" - ], - "outputs": [], - "execution_count": null - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Define the VAE\n", - "\n", - "The VAE model combines the encoder and decoder, and includes a sampling layer to sample from the latent distribution." - ] - }, - { - "cell_type": "code", - "metadata": {}, - "source": [ - "class VAE(nn.Module):\n", - " def __init__(self, encoder, decoder):\n", - " super(VAE, self).__init__()\n", - " self.encoder = encoder\n", - " self.decoder = decoder\n", - " \n", - " def forward(self, x):\n", - " mean, log_var = self.encoder(x)\n", - " std = torch.exp(0.5 * log_var)\n", - " epsilon = torch.randn_like(std)\n", - " z = mean + std * epsilon\n", - " x_reconstructed = self.decoder(z)\n", - " return x_reconstructed, mean, log_var" - ], - "outputs": [], - "execution_count": null - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Define the Loss Function\n", - "\n", - "The loss function consists of the reconstruction loss and the KL divergence." 
- ] - }, - { - "cell_type": "code", - "metadata": {}, - "source": [ - "def vae_loss(x, x_reconstructed, mean, log_var):\n", - " reconstruction_loss = nn.functional.binary_cross_entropy(x_reconstructed, x, reduction='sum')\n", - " kl_divergence = -0.5 * torch.sum(1 + log_var - mean.pow(2) - log_var.exp())\n", - " return reconstruction_loss + kl_divergence" - ], - "outputs": [], - "execution_count": null - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Train the VAE\n", - "\n", - "Let's train the VAE on a simple dataset, such as the MNIST dataset." - ] - }, + "cells": [ + { + "cell_type": "markdown", + "id": "directed-compensation", + "metadata": {}, + "source": [ + "
This notebook may contain text, code and images generated by artificial intelligence. Used model: gpt-4o-2024-08-06, vision model: gpt-4o-2024-08-06, endpoint: None, bia-bob version: 0.21.1. It is good scientific practice to check the code and results it produces carefully. Read more about code generation using bia-bob
" + ] + }, + { + "cell_type": "markdown", + "id": "d63c57fe", + "metadata": {}, + "source": [ + "# Applications of Variational Auto-Encoders (VAEs)\n", + "\n", + "In this notebook, we will explore some practical applications of Variational Auto-Encoders (VAEs). We will provide examples of using VAEs for image generation and other practical applications." + ] + }, + { + "cell_type": "markdown", + "id": "7aa831f4", + "metadata": {}, + "source": [ + "## Image Generation\n", + "\n", + "One of the most popular applications of VAEs is image generation. VAEs can be used to generate new images that are similar to the training data. Let's see how we can use a trained VAE to generate new images." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "97d7eadd", + "metadata": {}, + "outputs": [], + "source": [ + "import torch\n", + "from torch import nn, optim\n", + "from torchvision import datasets, transforms\n", + "import matplotlib.pyplot as plt\n", + "import numpy as np" + ] + }, + { + "cell_type": "markdown", + "id": "0be44b86", + "metadata": {}, + "source": [ + "### Define the Encoder\n", + "\n", + "The encoder network takes the input data and outputs the parameters of the latent distribution (mean and log variance)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "85f2b640", + "metadata": {}, + "outputs": [], + "source": [ + "class Encoder(nn.Module):\n", + " def __init__(self, input_dim, hidden_dim, latent_dim):\n", + " super(Encoder, self).__init__()\n", + " self.fc1 = nn.Linear(input_dim, hidden_dim)\n", + " self.fc2_mean = nn.Linear(hidden_dim, latent_dim)\n", + " self.fc2_log_var = nn.Linear(hidden_dim, latent_dim)\n", + " self.relu = nn.ReLU()\n", + " \n", + " def forward(self, x):\n", + " h = self.relu(self.fc1(x))\n", + " mean = self.fc2_mean(h)\n", + " log_var = self.fc2_log_var(h)\n", + " return mean, log_var" + ] + }, + { + "cell_type": "markdown", + "id": "572eb6c2", + "metadata": {}, + "source": [ + "### Define the Decoder\n", + "\n", + "The decoder network takes samples from the latent distribution and generates new data points." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "38b43173", + "metadata": {}, + "outputs": [], + "source": [ + "class Decoder(nn.Module):\n", + " def __init__(self, latent_dim, hidden_dim, output_dim):\n", + " super(Decoder, self).__init__()\n", + " self.fc1 = nn.Linear(latent_dim, hidden_dim)\n", + " self.fc2 = nn.Linear(hidden_dim, output_dim)\n", + " self.relu = nn.ReLU()\n", + " self.sigmoid = nn.Sigmoid()\n", + " \n", + " def forward(self, z):\n", + " h = self.relu(self.fc1(z))\n", + " x_reconstructed = self.sigmoid(self.fc2(h))\n", + " return x_reconstructed" + ] + }, + { + "cell_type": "markdown", + "id": "563c8846", + "metadata": {}, + "source": [ + "### Define the VAE\n", + "\n", + "The VAE model combines the encoder and decoder, and includes a sampling layer to sample from the latent distribution." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "5f113558", + "metadata": {}, + "outputs": [], + "source": [ + "class VAE(nn.Module):\n", + " def __init__(self, encoder, decoder):\n", + " super(VAE, self).__init__()\n", + " self.encoder = encoder\n", + " self.decoder = decoder\n", + " \n", + " def forward(self, x):\n", + " mean, log_var = self.encoder(x)\n", + " std = torch.exp(0.5 * log_var)\n", + " epsilon = torch.randn_like(std)\n", + " z = mean + std * epsilon\n", + " x_reconstructed = self.decoder(z)\n", + " return x_reconstructed, mean, log_var" + ] + }, + { + "cell_type": "markdown", + "id": "6fad01da", + "metadata": {}, + "source": [ + "### Define the Loss Function\n", + "\n", + "The loss function consists of the reconstruction loss and the KL divergence." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "b35e02c9", + "metadata": {}, + "outputs": [], + "source": [ + "def vae_loss(x, x_reconstructed, mean, log_var):\n", + " reconstruction_loss = nn.functional.binary_cross_entropy(x_reconstructed, x, reduction='sum')\n", + " kl_divergence = -0.5 * torch.sum(1 + log_var - mean.pow(2) - log_var.exp())\n", + " return reconstruction_loss + kl_divergence" + ] + }, + { + "cell_type": "markdown", + "id": "45253ced", + "metadata": {}, + "source": [ + "### Train the VAE\n", + "\n", + "Let's train the VAE on a simple dataset, such as the MNIST dataset." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "20470855", + "metadata": {}, + "outputs": [ { - "cell_type": "code", - "metadata": {}, - "source": [ - "# Load the MNIST dataset\n", - "transform = transforms.Compose([transforms.ToTensor(), transforms.Lambda(lambda x: x.view(-1))])\n", - "train_dataset = datasets.MNIST(root='./data', train=True, transform=transform, download=True)\n", - "test_dataset = datasets.MNIST(root='./data', train=False, transform=transform, download=True)\n", - "train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=64, shuffle=True)\n", - "test_loader = torch.utils.data.DataLoader(dataset=test_dataset, batch_size=64, shuffle=False)\n", - "\n", - "# Define the VAE model\n", - "input_dim = 28 * 28\n", - "hidden_dim = 256\n", - "latent_dim = 2\n", - "encoder = Encoder(input_dim, hidden_dim, latent_dim)\n", - "decoder = Decoder(latent_dim, hidden_dim, input_dim)\n", - "vae = VAE(encoder, decoder)\n", - "\n", - "# Define the optimizer\n", - "optimizer = optim.Adam(vae.parameters(), lr=1e-3)\n", - "\n", - "# Train the model\n", - "num_epochs = 10\n", - "for epoch in range(num_epochs):\n", - " vae.train()\n", - " train_loss = 0\n", - " for x, _ in train_loader:\n", - " optimizer.zero_grad()\n", - " x_reconstructed, mean, log_var = vae(x)\n", - " loss = vae_loss(x, x_reconstructed, mean, log_var)\n", - " loss.backward()\n", - " train_loss += loss.item()\n", - " optimizer.step()\n", - " print(f'Epoch {epoch + 1}, Loss: {train_loss / len(train_loader.dataset)}')" - ], - "outputs": [], - "execution_count": null - }, + "name": "stdout", + "output_type": "stream", + "text": [ + "Epoch 1, Loss: 185.8339251871745\n", + "Epoch 2, Loss: 166.8651765625\n", + "Epoch 3, Loss: 163.44859486490884\n", + "Epoch 4, Loss: 161.4818348063151\n", + "Epoch 5, Loss: 159.99831422526043\n", + "Epoch 6, Loss: 158.78891735026042\n", + "Epoch 7, Loss: 157.77428404947918\n", + "Epoch 8, Loss: 156.8380381347656\n", + "Epoch 9, Loss: 
156.01695650227865\n", + "Epoch 10, Loss: 155.277118351237\n" + ] + } + ], + "source": [ + "# Load the MNIST dataset\n", + "transform = transforms.Compose([transforms.ToTensor(), transforms.Lambda(lambda x: x.view(-1))])\n", + "train_dataset = datasets.MNIST(root='./data', train=True, transform=transform, download=True)\n", + "test_dataset = datasets.MNIST(root='./data', train=False, transform=transform, download=True)\n", + "train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=64, shuffle=True)\n", + "test_loader = torch.utils.data.DataLoader(dataset=test_dataset, batch_size=64, shuffle=False)\n", + "\n", + "# Define the VAE model\n", + "input_dim = 28 * 28\n", + "hidden_dim = 256\n", + "latent_dim = 2\n", + "encoder = Encoder(input_dim, hidden_dim, latent_dim)\n", + "decoder = Decoder(latent_dim, hidden_dim, input_dim)\n", + "vae = VAE(encoder, decoder)\n", + "\n", + "# Define the optimizer\n", + "optimizer = optim.Adam(vae.parameters(), lr=1e-3)\n", + "\n", + "# Train the model\n", + "num_epochs = 10\n", + "for epoch in range(num_epochs):\n", + " vae.train()\n", + " train_loss = 0\n", + " for x, _ in train_loader:\n", + " optimizer.zero_grad()\n", + " x_reconstructed, mean, log_var = vae(x)\n", + " loss = vae_loss(x, x_reconstructed, mean, log_var)\n", + " loss.backward()\n", + " train_loss += loss.item()\n", + " optimizer.step()\n", + " print(f'Epoch {epoch + 1}, Loss: {train_loss / len(train_loader.dataset)}')" + ] + }, + { + "cell_type": "markdown", + "id": "7f330fe4", + "metadata": {}, + "source": [ + "### Generate New Images\n", + "\n", + "Now that we have trained the VAE, we can use it to generate new images." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "35155de6", + "metadata": {}, + "outputs": [ { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Generate New Images\n", - "\n", - "Now that we have trained the VAE, we can use it to generate new images." 
+ "data": { + "image/png": "", + "text/plain": [ + "
" ] - }, - { - "cell_type": "code", - "metadata": {}, - "source": [ - "# Generate new data points\n", - "z_new = torch.randn(10, latent_dim)\n", - "with torch.no_grad():\n", - " generated = vae.decoder(z_new)\n", - "\n", - "# Plot the generated data points\n", - "plt.figure(figsize=(10, 2))\n", - "for i in range(10):\n", - " plt.subplot(1, 10, i + 1)\n", - " plt.imshow(generated[i].view(28, 28).numpy(), cmap='gray')\n", - " plt.axis('off')\n", - "plt.show()" - ], - "outputs": [], - "execution_count": null - }, + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Generate new data points\n", + "z_new = torch.randn(10, latent_dim)\n", + "with torch.no_grad():\n", + " generated = vae.decoder(z_new)\n", + "\n", + "# Plot the generated data points\n", + "plt.figure(figsize=(10, 2))\n", + "for i in range(10):\n", + " plt.subplot(1, 10, i + 1)\n", + " plt.imshow(generated[i].view(28, 28).numpy(), cmap='gray')\n", + " plt.axis('off')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "921feb3d", + "metadata": {}, + "source": [ + "## Anomaly Detection\n", + "\n", + "VAEs can also be used for anomaly detection. By training a VAE on normal data, we can use the reconstruction error to detect anomalies. Data points with high reconstruction error are likely to be anomalies." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "710d9a57", + "metadata": {}, + "outputs": [ { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Anomaly Detection\n", - "\n", - "VAEs can also be used for anomaly detection. By training a VAE on normal data, we can use the reconstruction error to detect anomalies. Data points with high reconstruction error are likely to be anomalies." + "data": { + "image/png": "", + "text/plain": [ + "
" ] - }, - { - "cell_type": "code", - "metadata": {}, - "source": [ - "# Calculate reconstruction error for test data\n", - "vae.eval()\n", - "x_test = next(iter(test_loader))[0] # Get the test data from DataLoader\n", - "with torch.no_grad():\n", - " reconstructed, _, _ = vae(x_test)\n", - " reconstruction_error = torch.mean((x_test - reconstructed) ** 2, axis=1)\n", - "\n", - "# Plot reconstruction error\n", - "plt.figure(figsize=(8, 6))\n", - "plt.hist(reconstruction_error.numpy(), bins=50)\n", - "plt.xlabel('Reconstruction Error')\n", - "plt.ylabel('Frequency')\n", - "plt.title('Reconstruction Error Histogram')\n", - "plt.show()" - ], - "outputs": [], - "execution_count": null - }, + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Calculate reconstruction error for test data\n", + "vae.eval()\n", + "x_test = next(iter(test_loader))[0] # Get the test data from DataLoader\n", + "with torch.no_grad():\n", + " reconstructed, _, _ = vae(x_test)\n", + " reconstruction_error = torch.mean((x_test - reconstructed) ** 2, axis=1)\n", + "\n", + "# Plot reconstruction error\n", + "plt.figure(figsize=(8, 6))\n", + "plt.hist(reconstruction_error.numpy(), bins=50)\n", + "plt.xlabel('Reconstruction Error')\n", + "plt.ylabel('Frequency')\n", + "plt.title('Reconstruction Error Histogram')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "e4f34351", + "metadata": {}, + "source": [ + "## Data Compression\n", + "\n", + "VAEs can be used for data compression by encoding the data into a lower-dimensional latent space. The latent representation can then be used to reconstruct the original data." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "3a983ee3", + "metadata": {}, + "outputs": [ { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Data Compression\n", - "\n", - "VAEs can be used for data compression by encoding the data into a lower-dimensional latent space. 
The latent representation can then be used to reconstruct the original data." + "data": { + "image/png": "", + "text/plain": [ + "
" ] - }, - { - "cell_type": "code", - "metadata": {}, - "source": [ - "# Encode the test data to the latent space\n", - "vae.eval()\n", - "with torch.no_grad():\n", - " mean, log_var = vae.encoder(x_test)\n", - " std = torch.exp(0.5 * log_var)\n", - " z = mean + std * torch.randn_like(std)\n", - "\n", - "# Decode the latent representation to reconstruct the data\n", - "with torch.no_grad():\n", - " reconstructed = vae.decoder(z)\n", - "\n", - "# Plot original and reconstructed data\n", - "plt.figure(figsize=(10, 4))\n", - "for i in range(10):\n", - " # Original data\n", - " plt.subplot(2, 10, i + 1)\n", - " plt.imshow(x_test[i].reshape(28, 28), cmap='gray')\n", - " plt.axis('off')\n", - " \n", - " # Reconstructed data\n", - " plt.subplot(2, 10, i + 11)\n", - " plt.imshow(reconstructed[i].view(28, 28).numpy(), cmap='gray')\n", - " plt.axis('off')\n", - "plt.show()" - ], - "outputs": [], - "execution_count": null - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.8.5" + }, + "metadata": {}, + "output_type": "display_data" } + ], + "source": [ + "# Encode the test data to the latent space\n", + "vae.eval()\n", + "with torch.no_grad():\n", + " mean, log_var = vae.encoder(x_test)\n", + " std = torch.exp(0.5 * log_var)\n", + " z = mean + std * torch.randn_like(std)\n", + "\n", + "# Decode the latent representation to reconstruct the data\n", + "with torch.no_grad():\n", + " reconstructed = vae.decoder(z)\n", + "\n", + "# Plot original and reconstructed data\n", + "plt.figure(figsize=(10, 4))\n", + "for i in range(10):\n", + " # Original data\n", + " plt.subplot(2, 10, i + 1)\n", + " plt.imshow(x_test[i].reshape(28, 28), cmap='gray')\n", + " plt.axis('off')\n", 
+ " \n", + " # Reconstructed data\n", + " plt.subplot(2, 10, i + 11)\n", + " plt.imshow(reconstructed[i].view(28, 28).numpy(), cmap='gray')\n", + " plt.axis('off')\n", + "plt.show()" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" }, - "nbformat": 4, - "nbformat_minor": 5 -} \ No newline at end of file + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.9" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +}