References

Behrmann, J., Grathwohl, W., Chen, R. T. Q., Duvenaud, D., & Jacobsen, J.-H. (2019). Invertible residual networks. In K. Chaudhuri & R. Salakhutdinov (Eds.), Proceedings of the 36th international conference on machine learning (Vol. 97, pp. 573–582). PMLR. https://proceedings.mlr.press/v97/behrmann19a.html
Bengio, Y., & Bengio, S. (1999). Modeling high-dimensional discrete data with multi-layer neural networks. Advances in Neural Information Processing Systems, 12. https://proceedings.neurips.cc/paper_files/paper/1999/hash/e6384711491713d29bc63fc5eeb5ba4f-Abstract.html
Kingma, D. P., & Dhariwal, P. (2018). Glow: Generative flow with invertible 1x1 convolutions. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, & R. Garnett (Eds.), Advances in neural information processing systems (Vol. 31). Curran Associates, Inc. https://proceedings.neurips.cc/paper_files/paper/2018/file/d139db6a236200b21cc7f752979132d0-Paper.pdf
Kingma, D. P., Salimans, T., Jozefowicz, R., Chen, X., Sutskever, I., & Welling, M. (2016). Improved variational inference with inverse autoregressive flow. Advances in Neural Information Processing Systems, 29. https://proceedings.neurips.cc/paper_files/paper/2016/hash/ddeebdeefdb7e7e7a697e1c3e3d8ef54-Abstract.html
Leskovec, J. (2023, December 7). Graph neural networks. Stanford. https://www.youtube.com/watch?v=ZfK4FDk9uy8
Li, T., Tian, Y., Li, H., Deng, M., & He, K. (2024). Autoregressive image generation without vector quantization. https://arxiv.org/abs/2406.11838
Oord, A. van den, Kalchbrenner, N., & Kavukcuoglu, K. (2016). Pixel recurrent neural networks. https://arxiv.org/abs/1601.06759
Oord, A. van den, Vinyals, O., & Kavukcuoglu, K. (2018). Neural discrete representation learning. https://arxiv.org/abs/1711.00937
Rezende, D., & Mohamed, S. (2015). Variational inference with normalizing flows. In F. Bach & D. Blei (Eds.), Proceedings of the 32nd international conference on machine learning (Vol. 37, pp. 1530–1538). PMLR. https://proceedings.mlr.press/v37/rezende15.html
Tian, K., Jiang, Y., Yuan, Z., Peng, B., & Wang, L. (2024). Visual autoregressive modeling: Scalable image generation via next-scale prediction. https://arxiv.org/abs/2404.02905
Tomczak, J. M. (2024). Deep generative modeling (2nd ed.). Springer International Publishing. https://doi.org/10.1007/978-3-031-64087-2
Ulyanov, D., Vedaldi, A., & Lempitsky, V. (2018). Deep image prior. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Yu, L., Cheng, Y., Sohn, K., Lezama, J., Zhang, H., Chang, H., Hauptmann, A. G., Yang, M.-H., Hao, Y., Essa, I., & Jiang, L. (2023). MAGVIT: Masked generative video transformer. https://arxiv.org/abs/2212.05199
Yu, L., Lezama, J., Gundavarapu, N. B., Versari, L., Sohn, K., Minnen, D., Cheng, Y., Birodkar, V., Gupta, A., Gu, X., Hauptmann, A. G., Gong, B., Yang, M.-H., Essa, I., Ross, D. A., & Jiang, L. (2024). Language model beats diffusion – tokenizer is key to visual generation. https://arxiv.org/abs/2310.05737
Yu, Q., Weber, M., Deng, X., Shen, X., Cremers, D., & Chen, L.-C. (2024). An image is worth 32 tokens for reconstruction and generation. https://arxiv.org/abs/2406.07550
Zhou, C., Yu, L., Babu, A., Tirumala, K., Yasunaga, M., Shamis, L., Kahn, J., Ma, X., Zettlemoyer, L., & Levy, O. (2024). Transfusion: Predict the next token and diffuse images with one multi-modal model. https://arxiv.org/abs/2408.11039
Zoran, D., & Weiss, Y. (2011). From learning models of natural image patches to whole image restoration. 2011 International Conference on Computer Vision, 479–486. https://doi.org/10.1109/ICCV.2011.6126278
Zoran, D., & Weiss, Y. (2012). Natural images, Gaussian mixtures and dead leaves. In F. Pereira, C. J. Burges, L. Bottou, & K. Q. Weinberger (Eds.), Advances in neural information processing systems (Vol. 25). Curran Associates, Inc. https://proceedings.neurips.cc/paper_files/paper/2012/file/e97ee2054defb209c35fe4dc94599061-Paper.pdf