Variational Inference with Normalizing Flows
Annotated Paper Link: Google Drive
These are my notes for the paper Rezende & Mohamed (2015).
The paper proposes using normalizing flow models as the approximate posterior distribution in variational inference.
The Problem
- Often, the approximate posterior families used in practice are too limited: no member of the family can closely resemble the true posterior \(p(z|x)\)

The authors mention some earlier proposals for more flexible posterior distributions, but these are computationally expensive.
In section 2, they summarize variational inference
Normalizing Flow Models
A normalizing flow describes the transformation of a probability density through a sequence of invertible mappings.
Starting from \(\mathbf{z}_0\), which has a known distribution \(q_0\), and applying \(K\) transformations \(f_k\), the log density of the output is
- \[ \ln q_K(\mathbf{z}_K) = \ln q_0(\mathbf{z}_0) - \sum_{k=1}^K \ln \left| \det \frac{\partial f_k}{\partial \mathbf{z}_{k-1}} \right| \]
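As a quick sanity check of this formula (my own sketch, not from the paper), here is a chain of 1-D affine maps, where every quantity is available in closed form:

```python
import numpy as np

def std_normal_logpdf(x):
    # log density of N(0, 1)
    return -0.5 * np.log(2 * np.pi) - 0.5 * x**2

# Chain of invertible affine maps f_k(z) = a_k * z + b_k;
# each has Jacobian determinant a_k, so the density update is
# ln q_k(z_k) = ln q_{k-1}(z_{k-1}) - ln|a_k|.
scales = [2.0, 0.5, 3.0]
shifts = [1.0, -0.3, 0.7]

z = 0.4                          # a sample z_0 from q_0 = N(0, 1)
log_q = std_normal_logpdf(z)     # ln q_0(z_0)
for a, b in zip(scales, shifts):
    z = a * z + b                # z_k = f_k(z_{k-1})
    log_q -= np.log(abs(a))      # subtract ln|det df_k/dz_{k-1}|

# Cross-check: the composition is itself affine, z_K = A*z_0 + B,
# so z_K ~ N(B, A^2) and its log density is known in closed form.
A = np.prod(scales)
B = shifts[2] + scales[2] * (shifts[1] + scales[1] * shifts[0])
direct = std_normal_logpdf((z - B) / A) - np.log(abs(A))
print(log_q, direct)  # the two values agree
```

The flow-based computation and the closed-form density match exactly, which is the content of the change-of-variables formula above.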
There are also infinitesimal flows, which are not described by a finite sequence of transformations
- The authors discuss two such families: the Langevin flow and the Hamiltonian flow
To allow for scalable inference using finite normalizing flows, two things are required:
- Invertible transformations
- Efficient calculation of the determinant of the Jacobian
The authors then describe a specific family of transformations (planar and radial flows) and define the (negative) ELBO, also known as the free energy bound, under this family
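For the planar family \(f(\mathbf{z}) = \mathbf{z} + \mathbf{u}\, h(\mathbf{w}^\top \mathbf{z} + b)\), the bound takes the form below (my transcription of the paper's eq. (15); worth double-checking against the original):

\[ \mathcal{F}(\mathbf{x}) = \mathbb{E}_{q_0(\mathbf{z}_0)}\!\left[ \ln q_0(\mathbf{z}_0) \right] - \mathbb{E}_{q_0(\mathbf{z}_0)}\!\left[ \ln p(\mathbf{x}, \mathbf{z}_K) \right] - \mathbb{E}_{q_0(\mathbf{z}_0)}\!\left[ \sum_{k=1}^K \ln \left| 1 + \mathbf{u}_k^\top \psi_k(\mathbf{z}_{k-1}) \right| \right] \]

where \(\psi_k(\mathbf{z}) = h'(\mathbf{w}_k^\top \mathbf{z} + b_k)\, \mathbf{w}_k\). The expectations are over the base distribution only, so the bound can be estimated with Monte Carlo samples of \(\mathbf{z}_0\) pushed through the flow.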
The result is an approximate posterior that can resemble the true posterior
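A single planar-flow step \(f(\mathbf{z}) = \mathbf{z} + \mathbf{u}\, h(\mathbf{w}^\top \mathbf{z} + b)\) with \(h = \tanh\) can be sketched in a few lines of NumPy (shapes and variable names are my own assumptions; the paper additionally constrains \(\mathbf{u}\) so that \(\mathbf{w}^\top \mathbf{u} \ge -1\) to guarantee invertibility, which I skip here):

```python
import numpy as np

def planar_flow(z, u, w, b):
    """One planar-flow step f(z) = z + u * tanh(w.z + b).
    Returns the transformed samples and ln|det df/dz|.
    The Jacobian is I + u psi(z)^T (a rank-1 update), so by the
    matrix determinant lemma the log-det costs O(D), not O(D^3)."""
    a = z @ w + b                             # (N,) pre-activations
    f = z + np.outer(np.tanh(a), u)           # (N, D) transformed samples
    psi = (1 - np.tanh(a) ** 2)[:, None] * w  # (N, D): h'(a) * w
    log_det = np.log(np.abs(1 + psi @ u))     # (N,): ln|1 + u^T psi(z)|
    return f, log_det

rng = np.random.default_rng(0)
D = 2
z0 = rng.standard_normal((5, D))  # 5 samples from the base q_0
u = rng.standard_normal(D)
w = rng.standard_normal(D)
b = 0.1
z1, log_det = planar_flow(z0, u, w, b)
# ln q_1(z_1) = ln q_0(z_0) - log_det, per the flow formula above
```

Stacking \(K\) such steps and accumulating the `log_det` terms gives exactly the sum in the log-density formula, which is what makes deep flows tractable.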
They find substantial improvements in approximation quality as the flow length \(K\) increases
They evaluate the model on MNIST and CIFAR-10