Pixel Recurrent Neural Networks
Annotated Paper Link: Google Drive
These are my notes for Oord et al. (2016).
- This paper introduces two autoregressive generative image models: PixelRNN (with two variants, Row LSTM and Diagonal BiLSTM) and PixelCNN.
- Like in any autoregressive model, the joint distribution of the pixels is factorized into a product of conditional distributions.
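The factorization, with pixels taken in raster-scan order over an $n \times n$ image:

```latex
p(\mathbf{x}) = \prod_{i=1}^{n^2} p(x_i \mid x_1, \ldots, x_{i-1})
```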
- Row LSTM
    - The Row LSTM processes the image row by row (top to bottom), computing the features for a whole row at once.
    - It has a triangular receptive field, so it cannot capture the entire available context for the current pixel.
- Diagonal BiLSTM
    - The Diagonal BiLSTM processes the image along its diagonals (in both directions), which lets it capture the entire available context for the current pixel.
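To scan diagonals efficiently, the paper skews the input so that each diagonal lines up as a column, which can then be processed like the Row LSTM processes rows. A minimal NumPy sketch of the skew operation (the function name is mine, and this handles a single-channel map rather than a full feature tensor):

```python
import numpy as np

def skew(image):
    """Skew an H x W map so each diagonal becomes a column:
    row i is shifted right by i positions, giving an H x (H + W - 1) map.
    The Diagonal BiLSTM then scans the skewed map column by column,
    touching one diagonal of the original image at a time."""
    h, w = image.shape
    out = np.zeros((h, h + w - 1), dtype=image.dtype)
    for i in range(h):
        out[i, i:i + w] = image[i]  # shift row i right by i
    return out
```

For a 2x2 image `[[1, 2], [3, 4]]` this produces `[[1, 2, 0], [0, 3, 4]]`; column 1 now holds the diagonal pixels 2 and 3 together.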
- Both LSTM variants are slow in training because each hidden-state computation depends on the previous one and can’t be parallelized.
- PixelCNN
    - Replaces the LSTM layers with masked convolutional layers, which allows parallelization during training.
    - Still, the generation process is sequential and can’t be parallelized. (It is an autoregressive model, after all.)
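The key ingredient is masking the convolution kernel so each pixel’s features depend only on pixels above it or to its left. A minimal NumPy sketch of the kernel mask (the function name is mine; in practice this mask is multiplied into a conv layer’s weights):

```python
import numpy as np

def causal_mask(kh, kw, mask_type='A'):
    """Binary mask for a PixelCNN convolution kernel: 1 where the kernel
    may look (rows above, or pixels to the left in the centre row).
    Type 'A' (first layer only) also blocks the centre pixel, so a pixel
    never sees itself; type 'B' (later layers) allows the centre."""
    mask = np.ones((kh, kw))
    # zero out the centre pixel (type A) and everything to its right
    mask[kh // 2, kw // 2 + (1 if mask_type == 'B' else 0):] = 0
    # zero out all rows below the centre row
    mask[kh // 2 + 1:, :] = 0
    return mask
```

For a 3x3 type-A kernel this gives `[[1, 1, 1], [1, 0, 0], [0, 0, 0]]` — the triangular “past” of the current pixel.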
- Residual Connections
    - Both PixelRNN and PixelCNN use residual connections between layers, which helps train deeper networks.