Models to convert black and white video to color

Jason Antic created DeOldify, a library for colorizing black-and-white images. It draws mainly on techniques from two publications: Self-Attention Generative Adversarial Networks and the Two Time-Scale Update Rule. Antic also developed the NoGAN training technique to overcome some of the key challenges in producing hyper-realistic colorized photos and videos. We will cover all of this, along with a Python implementation of colorizing black-and-white images and videos with the various models, in the next part of this article.

They have also built a public web API so that non-coders can colorize their photos with a simple drag-and-drop interface. The free website is accurate enough but misses some detail. According to Jason, NoGAN has not been formally written up in a paper, and the technique remains something of a black box. His best guess is that it saves time on GAN training, which would otherwise take days, while still giving good colorization.

DeOldify offers three primary models for different use cases, each with its own advantages and drawbacks. The models for converting black-and-white video to color are as follows:

  • Artistic model

This model produces photographs with rich color and detail, but it requires a lot of tweaking to get the best results: you must adjust the rendering resolution and other parameters to get the most realistic colorized image. The model uses a resnet34 backbone on a UNet, with an emphasis on layer depth on the decoder side. It is trained with NoGAN for five critic pretrain/GAN cycle repeats.
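The critic pretrain/GAN repeats mentioned above can be pictured as a simple phase schedule. The sketch below is purely illustrative, not DeOldify's actual training code; the phase names and the structure of the loop are stand-ins for the real NoGAN procedure.

```python
# Minimal sketch of a NoGAN-style training schedule (illustrative only,
# not DeOldify's actual code). The generator is first pretrained with a
# conventional supervised loss; then the critic is pretrained on the
# generator's outputs; then a brief adversarial (GAN) phase runs. The
# critic-pretrain/GAN pair is repeated several times -- five for the
# Artistic model, per the description above.

def nogan_schedule(repeats=5):
    """Yield the sequence of training phases in NoGAN order."""
    yield "generator_pretrain"      # supervised pretraining, no adversarial loss
    for _ in range(repeats):
        yield "critic_pretrain"     # teach the critic to spot generator outputs
        yield "gan"                 # short adversarial fine-tuning phase

phases = list(nogan_schedule(repeats=5))
print(phases)
```

The point of the schedule is that the expensive, unstable adversarial phases stay short, because both networks enter each GAN phase already well initialized.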

  • Stable model

This model gives the best results for portraits and landscapes. It ensures that nothing is overly colorized; the trade-off is that parts of the image, such as faces and limbs, may remain grey. It is less hyper-realistic, but also far less likely to produce stray splashes of color. It uses a resnet101 backbone on a UNet, with an emphasis on layer breadth on the decoder side.
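One way to picture this conservative behavior is as a blend that falls back to greyscale wherever the colorization is uncertain. The sketch below is a toy illustration of that trade-off only; the confidence value, threshold, and blend rule are invented for this example and are not part of DeOldify's API.

```python
# Toy illustration of conservative colorization: keep a predicted color
# only when confidence is high, otherwise fall back to greyscale. The
# confidence knob here is invented for illustration; DeOldify does not
# expose such a parameter.

def blend_pixel(grey, color, confidence, threshold=0.5):
    """Return the predicted color if confident, otherwise keep grey.

    grey: scalar intensity in [0, 1]
    color: (r, g, b) tuple predicted by a colorizer
    confidence: assumed model confidence in [0, 1]
    """
    if confidence >= threshold:
        return color
    return (grey, grey, grey)   # uncertain region stays greyscale

# A confident pixel keeps its predicted color; an uncertain one stays grey.
print(blend_pixel(0.6, (0.8, 0.4, 0.2), confidence=0.9))
print(blend_pixel(0.6, (0.8, 0.4, 0.2), confidence=0.2))
```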

  • Video model

As the name implies, this is the model for colorizing videos, and we will see each of these models in action in Python. It produces video that is smooth, consistent, and flicker-free. Architecturally it is the same as the Stable model, but it differs in training: it is trained once at 192px, using only the first generator/critic pretrain/GAN cycle of NoGAN training, on 2.2 percent of the ImageNet dataset.
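To see concretely what "flicker-free" means, here is a toy exponential-moving-average smoother over per-frame color values. This is purely illustrative: DeOldify's video model achieves temporal consistency through how it is trained, not through a post-hoc filter like this one.

```python
# Toy illustration of flicker: naive per-frame colorization can jitter
# from frame to frame, and an exponential moving average damps that
# jitter. This sketch only shows what "temporally consistent" means;
# it is not how DeOldify removes flicker.

def smooth_frames(values, alpha=0.3):
    """Exponentially smooth a sequence of per-frame values.

    alpha: weight given to the newest frame (smaller = smoother).
    """
    smoothed = [values[0]]
    for v in values[1:]:
        smoothed.append(alpha * v + (1 - alpha) * smoothed[-1])
    return smoothed

# A flickering red-channel value for one region across five frames:
raw = [0.80, 0.20, 0.80, 0.20, 0.80]
out = smooth_frames(raw)
# The frame-to-frame range shrinks after smoothing.
print(max(out) - min(out), "<", max(raw) - min(raw))
```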