Deep Image Prior
These are notes on Ulyanov et al. (2018)
- Main Point: The structure of ConvNets is sufficient to capture a great deal of low-level statistics before any training
- The study focuses on the prior captured by a deep convolutional network, independent of any training
- The Driving Reasoning
- ConvNets are SOTA for many image-related tasks (super-resolution, image reconstruction, denoising, etc.)
- They are usually trained on huge datasets.
- It can be assumed that large training datasets are the reason for the strong performance, but learning alone isn’t a sufficient explanation
- Generalization requires the structure of the network to resonate with the structure of the data
- Their Method
- Basically, the authors fit a randomly initialized ConvNet directly to the single corrupted image and use the fitted network’s output as the restoration.
- The Task
- They consider inverse tasks such as denoising, super-resolution, and inpainting.
- Expressed as an energy-minimization problem: \(x^* = \arg\min_x E(x; x_0) + R(x)\)
- \(E(x; x_0)\): task-dependent data term (e.g., how similar is the reconstructed image to the corrupted one?)
- \(R(x)\): regularization term (e.g., the probability that \(x\) occurs in nature, as determined by the prior of a pretrained model)
- \(x_0\) is the noisy/low-resolution/occluded image
- \(x^*\) is the model’s predicted clean/high-resolution/inpainted image
- Deep networks are applied by mapping a random code \(z\) to an image \(x\): \(x = f_\theta (z)\)
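The energy formulation above can be sketched in a few lines. This is a minimal illustration, not the paper's code: the data term is taken to be a squared L2 distance (a common choice for denoising), and the regularizer defaults to zero, matching the setup the notes describe next.

```python
import numpy as np

def data_term(x, x0):
    # E(x; x0): task-dependent fidelity term; for denoising, a squared
    # L2 distance between the reconstruction x and the corrupted image x0
    return float(np.sum((x - x0) ** 2))

def energy(x, x0, regularizer=lambda x: 0.0):
    # Full objective E(x; x0) + R(x); R defaults to zero,
    # as in the deep-image-prior setup
    return data_term(x, x0) + regularizer(x)

# A perfect reconstruction has zero energy under the default (zero) prior
x0 = np.array([[0.2, 0.8], [0.5, 0.1]])
print(energy(x0.copy(), x0))
```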
- Their method
- Instead of finding the parameters by training on a large dataset, they optimize the parameters to map a fixed random code \(z\) to the given corrupted image \(x_0\)
- \(\theta^* = \arg\min_{\theta} E(f_\theta(z); x_0)\)
- and they set the regularizer to zero; the network structure itself acts as the prior. Thus,
- \(x^* = f_{\theta ^ *}(z)\)
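The optimization \(\theta^* = \arg\min_\theta E(f_\theta(z); x_0)\) can be sketched as plain gradient descent. In this toy, a linear map \(W\) stands in for the ConvNet and a flat vector for the image, so it illustrates only the optimization scheme, not the convolutional prior; the shapes and learning rate are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# x0: the single corrupted observation (a flat vector stands in for an image)
n, m = 32, 16
x0 = rng.standard_normal(n)

# z: a fixed random code; theta = W, a linear map standing in for the ConvNet
z = rng.standard_normal(m)
W = np.zeros((n, m))

def f(W, z):
    # f_theta(z): the "generator" mapping the code to an image
    return W @ z

lr = 0.005
for _ in range(300):
    r = f(W, z) - x0                   # residual of E = ||f_theta(z) - x0||^2
    W -= lr * (2.0 * np.outer(r, z))   # gradient step on theta

x_star = f(W, z)                       # x* = f_{theta*}(z)
```

Run to convergence, the toy simply reproduces \(x_0\); the interesting behavior (signal before noise) depends on the ConvNet architecture, which this linear stand-in does not capture.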
- Why does it work?
- One might expect the model to simply fit the noise in \(x_0\)
- This doesn’t happen right away, because the ConvNet architecture has high resistance to learning noise and low resistance to learning the signal
- => the model fits the signal before it fits the noise
- => training is stopped early, before the model starts fitting the noise
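The signal-before-noise effect can be reproduced in miniature with a hand-picked stand-in for the architecture's bias (my illustration, not the paper's): parameterize the output through a circular Gaussian smoothing operator \(K\) and run gradient descent on \(\|Kc - x_0\|^2\). Smooth (low-frequency) components of \(x_0\) are fitted quickly, high-frequency noise only very slowly, so an early-stopped reconstruction lands closer to the clean signal than \(x_0\) itself.

```python
import numpy as np

rng = np.random.default_rng(1)

# Circular Gaussian smoothing operator K: a toy stand-in for the
# ConvNet's bias toward smooth outputs
n, sigma = 64, 2.0
idx = np.arange(n)
d = np.abs(idx[None, :] - idx[:, None])
d = np.minimum(d, n - d)                    # circular distance
K = np.exp(-0.5 * (d / sigma) ** 2)
K /= K.sum(axis=1, keepdims=True)

clean = np.sin(2 * np.pi * idx / n)         # smooth underlying signal
x0 = clean + 0.3 * rng.standard_normal(n)   # corrupted observation

c = np.zeros(n)
lr = 0.4
for _ in range(50):                         # stop early
    r = K @ c - x0
    c -= lr * (2.0 * K.T @ r)               # gradient of ||K c - x0||^2

x_early = K @ c                             # early-stopped reconstruction

def mse(a, b):
    return float(np.mean((a - b) ** 2))
```

After 50 steps the low-frequency signal is fitted while most of the noise is not, so `mse(x_early, clean)` comes out below `mse(x0, clean)`; run much longer, the fit drifts toward reproducing the noise as well.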
- Applications
- They apply their model to multiple tasks including denoising, super-resolution, inpainting, etc.
- In all tasks, the model outperforms or closely matches the SOTA non-learning methods and comes close to those trained on large datasets
- To summarize, ConvNets are really good image priors even before any training