Variational Autoencoders

Learn about variational autoencoders and their essential components.

Autoencoders are effective neural network architectures for unsupervised learning tasks such as dimensionality reduction and feature extraction. However, they are fundamentally limited when it comes to data generation. While autoencoders map similar inputs to clusters in the latent space (e.g., similar MNIST digits group together), parts of that latent space remain unmapped or unregulated. As a result, if we sample a random point from the latent space, there is no guarantee that it corresponds to a valid data point; the decoder might produce an invalid or meaningless output. Because the latent space is not smoothly organized, autoencoders are unsuitable for controlled data generation. This limitation motivates a modification of the architecture: the variational autoencoder.

Variational autoencoders

The variational autoencoder (VAE) is an advanced version of the standard autoencoder, built specifically to fix one major problem: creating new data.

In a standard autoencoder, the compressed code (latent space) is disorganized, meaning large gaps exist where the decoder has no idea what to create. The VAE solves this by using a special rule: it forces the compressed code to follow a simple, well-known probability pattern (like a perfectly organized bell curve). This rule makes the entire latent space smooth and predictable, ensuring that any randomly selected point will produce a valid piece of data.
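Because training pushes latent codes toward a known distribution (typically a standard normal), generating new data reduces to drawing a random point from that distribution and decoding it. A minimal sketch of this idea, where `decode` is a hypothetical stand-in for a trained decoder network (the weights and sizes are illustrative, not from the text):

```python
import numpy as np

rng = np.random.default_rng(42)
latent_dim = 2

def decode(z):
    # Stand-in for a trained decoder network; a fixed linear map is used
    # here only so the example runs end to end.
    W = np.ones((latent_dim, 4))
    return z @ W

# Since the latent space is regularized toward N(0, I), any point drawn
# from that distribution should decode to a plausible data sample.
z = rng.standard_normal((3, latent_dim))   # three random latent points
samples = decode(z)
print(samples.shape)  # (3, 4): three generated samples
```

With a standard autoencoder, the same random `z` could land in an unmapped region of the latent space; the VAE's smoothness constraint is what makes this sampling step trustworthy.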

Major components of VAE

VAEs use three main parts, but the Encoder acts very differently:

  • Encoder ($f_{W_e}$): Instead of producing a single, exact compressed code, the encoder outputs two vectors for every input: the mean ($\boldsymbol{\mu}$), which is the center of the compressed location, and the standard deviation ($\boldsymbol{\sigma}$), which defines how wide or uncertain that location is. This is the step that forces the code to fit the predefined distribution.

  • Latent vector ($Z$): This is the final compressed code that is passed to the decoder. Crucially, the vector $Z$ is randomly sampled from the distribution defined by the mean ($\boldsymbol{\mu}$) and standard deviation ($\boldsymbol{\sigma}$), rather than being produced deterministically.

  • Decoder ($f_{W_d}$): Takes the sampled latent vector $Z$ and reconstructs the original input, just as in a standard autoencoder.
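A minimal NumPy sketch of the encoder and sampling steps above, assuming a single linear layer per output for simplicity (all layer sizes and weight names are illustrative; real VAEs predict the log-variance rather than $\sigma$ directly for numerical stability):

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, latent_dim = 784, 2  # e.g. flattened MNIST images

# Illustrative encoder weights: one linear map for the mean and one
# for the log-variance of the latent Gaussian.
W_mu = rng.normal(scale=0.01, size=(input_dim, latent_dim))
W_logvar = rng.normal(scale=0.01, size=(input_dim, latent_dim))

def encode(x):
    """Encoder f_We: map the input to the parameters (mu, log sigma^2)
    of a Gaussian distribution over the latent space."""
    mu = x @ W_mu
    log_var = x @ W_logvar
    return mu, log_var

def sample_latent(mu, log_var):
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I),
    so z is random but gradients can still flow through mu and sigma."""
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

x = rng.normal(size=(1, input_dim))  # a stand-in input
mu, log_var = encode(x)
z = sample_latent(mu, log_var)
print(z.shape)  # (1, 2): one latent vector of dimension 2
```

Note that the randomness enters only through `eps`; this "reparameterization" is what lets the encoder's outputs $\boldsymbol{\mu}$ and $\boldsymbol{\sigma}$ be trained with ordinary backpropagation despite the sampling step.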
