Invertible Residual Networks
Annotated Paper Link: Google Drive
These are my notes for Behrmann et al. (2019).
- This paper shows that, through a simple modification, ResNets can be made invertible, which makes them usable as normalizing flow models.
- In short: one ResNet architecture for both discriminative and generative modeling.
- How to make ResNets invertible?
- The modification is to make the Lipschitz constant of the residual block less than 1. This can be achieved by spectral normalization of the weights of the residual block.
- The Lipschitz constant can be thought of as the maximum absolute value of the derivative of a function.
- The spectral norm of a matrix can be thought of as the maximum stretching factor the matrix can apply to a vector.
- There is no analytical solution for the inverse of a ResNet, but it can be computed numerically via a simple fixed-point iteration (x_{k+1} = y - g(x_k)), which converges because the residual branch g is a contraction.
- Check section 2 for more details on the ResNet modification.
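The two ingredients above — constraining the residual branch's Lipschitz constant below 1 via spectral normalization, and inverting the block by fixed-point iteration — can be sketched in NumPy. This is a minimal illustration, not the paper's implementation: the linear residual function `g`, the constant `c = 0.9`, and the iteration counts are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def spectral_norm(W, n_iters=50):
    # Power iteration: estimates the largest singular value of W,
    # i.e. the maximum stretching factor W applies to any vector.
    u = rng.standard_normal(W.shape[0])
    for _ in range(n_iters):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    return u @ W @ v

# Toy linear residual function g(x) = W x, rescaled so Lip(g) <= c < 1.
c = 0.9
W = rng.standard_normal((4, 4))
W = W * (c / spectral_norm(W))

def g(x):
    return W @ x

def f(x):
    return x + g(x)  # the residual block: F(x) = x + g(x)

def invert(y, n_iters=500):
    # Fixed-point iteration x_{k+1} = y - g(x_k); converges
    # linearly at rate Lip(g) < 1 by the Banach fixed-point theorem.
    x = y.copy()
    for _ in range(n_iters):
        x = y - g(x)
    return x

x = rng.standard_normal(4)
y = f(x)
x_rec = invert(y)
print(np.allclose(x, x_rec))  # → True: the input is recovered
```

Note that there is a trade-off in the choice of `c`: smaller values make the inversion converge faster but restrict the expressiveness of each block.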
- How to use it for normalizing flows?
- To use an architecture in a normalizing flow model, we need two things:
- Invertible architecture (done)
- Tractable log-determinant of the Jacobian (Discussed below; section 3.1, 3.2)
- The paper shows that the log-determinant of the Jacobian can be computed through a power series expansion, for which there is a stochastic approximation based on the Hutchinson trace estimator.
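A minimal NumPy sketch of the estimator described above: ln det(I + J_g) = Σ_k (-1)^{k+1} tr(J_g^k)/k, with each trace estimated stochastically via Hutchinson's trick, tr(A) ≈ E[vᵀAv] for random ±1 vectors v. The toy Jacobian `J`, the truncation length, and the sample count are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy Jacobian of g with spectral norm < 1, as required for the
# power series to converge (guaranteed by the Lipschitz constraint).
J = rng.standard_normal((6, 6))
J = J * (0.5 / np.linalg.norm(J, 2))

def log_det_estimate(J, n_terms=20, n_samples=500):
    # ln det(I + J) = sum_{k>=1} (-1)^{k+1} tr(J^k) / k,
    # with tr(J^k) estimated as v^T J^k v for Rademacher v.
    d = J.shape[0]
    total = 0.0
    for _ in range(n_samples):
        v = rng.choice([-1.0, 1.0], size=d)
        w = v
        for k in range(1, n_terms + 1):
            w = J @ w  # w = J^k v, built with matrix-vector products only
            total += (-1) ** (k + 1) * (v @ w) / k
    return total / n_samples

exact = np.linalg.slogdet(np.eye(6) + J)[1]
approx = log_det_estimate(J)
print(exact, approx)  # the stochastic estimate tracks the exact value
```

The point of the trick is that only matrix-vector products with J are needed, never the full Jacobian determinant, which is what makes the density evaluation tractable at scale.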
- Results
- Performs competitively with SOTA methods on both image classification (discriminative; MNIST, CIFAR-10, CIFAR-100) and flow-based generative modeling (MNIST, CIFAR-10).