Pixel Recurrent Neural Networks
Annotated Paper Link: Google Drive
These are my notes for Oord et al. (2016).
- This paper introduces two autoregressive generative image models: PixelRNN (with two variants, Row LSTM and Diagonal BiLSTM) and PixelCNN.
- Like in any autoregressive model, the joint distribution of the pixels is factorized into a product of conditional distributions.
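The factorization, with pixels taken in raster-scan order over an $n \times n$ image:

```latex
p(\mathbf{x}) = \prod_{i=1}^{n^2} p(x_i \mid x_1, \ldots, x_{i-1})
```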
- Row LSTM
    - The Row LSTM processes the image row by row (top to bottom), computing the features for a whole row at once.
    - It has a triangular receptive field, so it cannot capture the entire available context for the current pixel.
- Diagonal BiLSTM
    - The Diagonal BiLSTM processes the image along its diagonals (in both directions), which lets it capture the entire available context for the current pixel.
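To scan diagonals efficiently, the paper skews the input so that each diagonal lines up as a column, which can then be processed like the Row LSTM processes rows. A minimal NumPy sketch of the skew operation (the function name is mine, and this handles a single-channel map rather than a full feature tensor):

```python
import numpy as np

def skew(image):
    """Skew an H x W map so each diagonal becomes a column:
    row i is shifted right by i positions, giving an H x (H + W - 1) map.
    The Diagonal BiLSTM then scans the skewed map column by column,
    touching one diagonal of the original image at a time."""
    h, w = image.shape
    out = np.zeros((h, h + w - 1), dtype=image.dtype)
    for i in range(h):
        out[i, i:i + w] = image[i]  # shift row i right by i
    return out
```

For a 2x2 image `[[1, 2], [3, 4]]` this produces `[[1, 2, 0], [0, 3, 4]]`; column 1 now holds the diagonal pixels 2 and 3 together.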
- Both LSTM variants are slow in training because each hidden-state computation depends on the previous one and can’t be parallelized.
- PixelCNN
    - Replaces the LSTM layers with masked convolutional layers, which allows parallelization during training.
    - Still, the generation process is sequential and can’t be parallelized. (It is an autoregressive model, after all.)
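The key ingredient is masking the convolution kernel so each pixel’s features depend only on pixels above it or to its left. A minimal NumPy sketch of the kernel mask (the function name is mine; in practice this mask is multiplied into a conv layer’s weights):

```python
import numpy as np

def causal_mask(kh, kw, mask_type='A'):
    """Binary mask for a PixelCNN convolution kernel: 1 where the kernel
    may look (rows above, or pixels to the left in the centre row).
    Type 'A' (first layer only) also blocks the centre pixel, so a pixel
    never sees itself; type 'B' (later layers) allows the centre."""
    mask = np.ones((kh, kw))
    # zero out the centre pixel (type A) and everything to its right
    mask[kh // 2, kw // 2 + (1 if mask_type == 'B' else 0):] = 0
    # zero out all rows below the centre row
    mask[kh // 2 + 1:, :] = 0
    return mask
```

For a 3x3 type-A kernel this gives `[[1, 1, 1], [1, 0, 0], [0, 0, 0]]` — the triangular “past” of the current pixel.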
- Residual Connections
    - Both PixelRNN and PixelCNN use residual connections between layers, which helps train deeper networks.