References

Behrmann, J., Grathwohl, W., Chen, R. T. Q., Duvenaud, D., & Jacobsen, J.-H. (2019). Invertible residual networks. In K. Chaudhuri & R. Salakhutdinov (Eds.), Proceedings of the 36th international conference on machine learning (Vol. 97, pp. 573–582). PMLR. https://proceedings.mlr.press/v97/behrmann19a.html
Bengio, Y., & Bengio, S. (1999). Modeling high-dimensional discrete data with multi-layer neural networks. Advances in Neural Information Processing Systems, 12. https://proceedings.neurips.cc/paper_files/paper/1999/hash/e6384711491713d29bc63fc5eeb5ba4f-Abstract.html
Kingma, D. P., & Dhariwal, P. (2018). Glow: Generative flow with invertible 1x1 convolutions. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, & R. Garnett (Eds.), Advances in neural information processing systems (Vol. 31). Curran Associates, Inc. https://proceedings.neurips.cc/paper_files/paper/2018/file/d139db6a236200b21cc7f752979132d0-Paper.pdf
Kingma, D. P., Salimans, T., Jozefowicz, R., Chen, X., Sutskever, I., & Welling, M. (2016). Improved variational inference with inverse autoregressive flow. Advances in Neural Information Processing Systems, 29. https://proceedings.neurips.cc/paper_files/paper/2016/hash/ddeebdeefdb7e7e7a697e1c3e3d8ef54-Abstract.html
Leskovec, J. (2023, December 7). Graph neural networks. Stanford. https://www.youtube.com/watch?v=ZfK4FDk9uy8
Li, T., Tian, Y., Li, H., Deng, M., & He, K. (2024). Autoregressive image generation without vector quantization. https://arxiv.org/abs/2406.11838
Oord, A. van den, Kalchbrenner, N., & Kavukcuoglu, K. (2016). Pixel recurrent neural networks. https://arxiv.org/abs/1601.06759
Oord, A. van den, Vinyals, O., & Kavukcuoglu, K. (2018). Neural discrete representation learning. https://arxiv.org/abs/1711.00937
Rezende, D., & Mohamed, S. (2015). Variational inference with normalizing flows. In F. Bach & D. Blei (Eds.), Proceedings of the 32nd international conference on machine learning (Vol. 37, pp. 1530–1538). PMLR. https://proceedings.mlr.press/v37/rezende15.html
Tian, K., Jiang, Y., Yuan, Z., Peng, B., & Wang, L. (2024). Visual autoregressive modeling: Scalable image generation via next-scale prediction. https://arxiv.org/abs/2404.02905
Tomczak, J. M. (2024). Deep generative modeling (2nd ed.). Springer International Publishing. https://doi.org/10.1007/978-3-031-64087-2
Ulyanov, D., Vedaldi, A., & Lempitsky, V. (2018). Deep image prior. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Yu, L., Cheng, Y., Sohn, K., Lezama, J., Zhang, H., Chang, H., Hauptmann, A. G., Yang, M.-H., Hao, Y., Essa, I., & Jiang, L. (2023). MAGVIT: Masked generative video transformer. https://arxiv.org/abs/2212.05199
Yu, L., Lezama, J., Gundavarapu, N. B., Versari, L., Sohn, K., Minnen, D., Cheng, Y., Birodkar, V., Gupta, A., Gu, X., Hauptmann, A. G., Gong, B., Yang, M.-H., Essa, I., Ross, D. A., & Jiang, L. (2024). Language model beats diffusion – tokenizer is key to visual generation. https://arxiv.org/abs/2310.05737
Yu, Q., Weber, M., Deng, X., Shen, X., Cremers, D., & Chen, L.-C. (2024). An image is worth 32 tokens for reconstruction and generation. https://arxiv.org/abs/2406.07550
Zhou, C., Yu, L., Babu, A., Tirumala, K., Yasunaga, M., Shamis, L., Kahn, J., Ma, X., Zettlemoyer, L., & Levy, O. (2024). Transfusion: Predict the next token and diffuse images with one multi-modal model. https://arxiv.org/abs/2408.11039
Zoran, D., & Weiss, Y. (2011). From learning models of natural image patches to whole image restoration. 2011 International Conference on Computer Vision, 479–486. https://doi.org/10.1109/ICCV.2011.6126278
Zoran, D., & Weiss, Y. (2012). Natural images, Gaussian mixtures and dead leaves. In F. Pereira, C. J. Burges, L. Bottou, & K. Q. Weinberger (Eds.), Advances in neural information processing systems (Vol. 25). Curran Associates, Inc. https://proceedings.neurips.cc/paper_files/paper/2012/file/e97ee2054defb209c35fe4dc94599061-Paper.pdf