Invertible Residual Networks
Annotated Paper Link: Google Drive
These are my notes for Behrmann et al. (2019).
- This paper shows that, through a simple modification, ResNets can be made invertible, which makes them usable as normalizing flow models.
- In short: one ResNet architecture for both discriminative and generative modeling.
- How to make ResNets invertible?
- The modification is to make the Lipschitz constant of the residual block less than 1. This can be achieved by spectral normalization of the weights of the residual block.
- The Lipschitz constant can be thought of as the maximum absolute value of the derivative of a function.
- The spectral norm of a matrix can be thought of as the maximum stretching factor the matrix can apply to a vector.
- There is no analytical solution for the inverse of a ResNet, but it can be computed numerically via a simple fixed-point iteration (x_{k+1} = y - g(x_k)), which converges because the residual branch g is a contraction.
- Check section 2 for more details on the ResNet modification.
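The two ingredients above — constraining the residual branch's Lipschitz constant below 1 via spectral normalization, and inverting the block by fixed-point iteration — can be sketched in NumPy. This is a minimal illustration, not the paper's implementation: the linear residual function `g`, the constant `c = 0.9`, and the iteration counts are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def spectral_norm(W, n_iters=50):
    # Power iteration: estimates the largest singular value of W,
    # i.e. the maximum stretching factor W applies to any vector.
    u = rng.standard_normal(W.shape[0])
    for _ in range(n_iters):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    return u @ W @ v

# Toy linear residual function g(x) = W x, rescaled so Lip(g) <= c < 1.
c = 0.9
W = rng.standard_normal((4, 4))
W = W * (c / spectral_norm(W))

def g(x):
    return W @ x

def f(x):
    return x + g(x)  # the residual block: F(x) = x + g(x)

def invert(y, n_iters=500):
    # Fixed-point iteration x_{k+1} = y - g(x_k); converges
    # linearly at rate Lip(g) < 1 by the Banach fixed-point theorem.
    x = y.copy()
    for _ in range(n_iters):
        x = y - g(x)
    return x

x = rng.standard_normal(4)
y = f(x)
x_rec = invert(y)
print(np.allclose(x, x_rec))  # → True: the input is recovered
```

Note that there is a trade-off in the choice of `c`: smaller values make the inversion converge faster but restrict the expressiveness of each block.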
- How to use it for normalizing flows?
- To use an architecture in a normalizing flow model, we need two things:
- Invertible architecture (done)
- Tractable log-determinant of the Jacobian (Discussed below; section 3.1, 3.2)
- The paper shows that the log-determinant of the Jacobian can be computed through a power series expansion, for which there is a stochastic approximation based on the Hutchinson trace estimator.
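A minimal NumPy sketch of the estimator described above: ln det(I + J_g) = Σ_k (-1)^{k+1} tr(J_g^k)/k, with each trace estimated stochastically via Hutchinson's trick, tr(A) ≈ E[vᵀAv] for random ±1 vectors v. The toy Jacobian `J`, the truncation length, and the sample count are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy Jacobian of g with spectral norm < 1, as required for the
# power series to converge (guaranteed by the Lipschitz constraint).
J = rng.standard_normal((6, 6))
J = J * (0.5 / np.linalg.norm(J, 2))

def log_det_estimate(J, n_terms=20, n_samples=500):
    # ln det(I + J) = sum_{k>=1} (-1)^{k+1} tr(J^k) / k,
    # with tr(J^k) estimated as v^T J^k v for Rademacher v.
    d = J.shape[0]
    total = 0.0
    for _ in range(n_samples):
        v = rng.choice([-1.0, 1.0], size=d)
        w = v
        for k in range(1, n_terms + 1):
            w = J @ w  # w = J^k v, built with matrix-vector products only
            total += (-1) ** (k + 1) * (v @ w) / k
    return total / n_samples

exact = np.linalg.slogdet(np.eye(6) + J)[1]
approx = log_det_estimate(J)
print(exact, approx)  # the stochastic estimate tracks the exact value
```

The point of the trick is that only matrix-vector products with J are needed, never the full Jacobian determinant, which is what makes the density evaluation tractable at scale.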
- Results
- Performs competitively with SOTA methods on both image classification (discriminative; MNIST, CIFAR-10, CIFAR-100) and flow-based generative modeling (MNIST, CIFAR-10).