While the "zoom and enhance" trope in movies and TV programmes is frequently mocked for being unrealistic, real-world photo enhancement research is increasingly catching up to science fiction. Google's most recent AI photo upscaling technology is a case in point.
Researchers on Google's Brain Team discuss recent advances in image super-resolution in an article titled “High Fidelity Image Generation Using Diffusion Models,” published on the Google AI Blog (and noticed by DPR).
In image super-resolution, a machine learning model is trained to transform a low-resolution photo into a detailed high-resolution one; potential applications range from restoring old family photos to enhancing medical imaging.
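To make the task concrete, here is a minimal sketch of how super-resolution training pairs are typically built: a high-resolution target is downsampled to produce the low-resolution input the model learns from. The file name, scale factor, and use of Pillow here are illustrative assumptions, not details from Google's paper.

```python
from PIL import Image

def make_training_pair(path: str, scale: int = 8):
    """Return an (input, target) pair: a downsampled copy and the original."""
    hr = Image.open(path).convert("RGB")          # high-resolution target
    lr = hr.resize((hr.width // scale, hr.height // scale), Image.BICUBIC)
    return lr, hr

# Hypothetical example: an 8x pair from a 1024x1024 photo.
lr, hr = make_training_pair("family_photo.jpg", scale=8)
print(lr.size, "->", hr.size)                     # (128, 128) -> (1024, 1024)
```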
Google has been experimenting with a concept known as “diffusion models,” first proposed in 2015 but until recently overshadowed by other families of deep generative models such as GANs. In human evaluations, the company's results with this new approach outperform existing technologies.
The first method is SR3, or Super-Resolution via Repeated Refinement. Google's technical explanation is as follows:
“SR3 is a super-resolution diffusion model that takes as input a low-resolution image, and builds a corresponding high-resolution image from pure noise,” Google writes. “The model is trained on an image corruption process in which noise is progressively added to a high-resolution image until only pure noise remains.
“It then learns to reverse this process, beginning from pure noise and progressively removing noise to reach a target distribution through the guidance of the input low-resolution image.”
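The two halves of that description are the forward corruption process and the learned reverse process. Below is a minimal sketch of both, assuming a simple linear noise schedule and 1,000 steps; these hyperparameters, and the stubbed denoising network, are illustrative, not SR3's actual design.

```python
import numpy as np

T = 1000  # assumed number of diffusion steps

def corrupt(image: np.ndarray, t: int) -> np.ndarray:
    """Forward process: return the image at noise step t (t=0: clean, t=T: pure noise)."""
    alpha = 1.0 - t / T                       # fraction of signal that survives
    noise = np.random.randn(*image.shape)     # standard Gaussian noise
    return np.sqrt(alpha) * image + np.sqrt(1.0 - alpha) * noise

def denoise_step(x_t, low_res, t):
    """Hypothetical trained network; in SR3 this would be guided by low_res."""
    return x_t                                # placeholder: identity stub

low_res = np.random.rand(8, 8, 3)             # stand-in for the low-res input
x_t = np.random.randn(64, 64, 3)              # reverse process starts from pure noise
for t in reversed(range(1, T + 1)):
    x_t = denoise_step(x_t, low_res, t)       # progressively remove noise
```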
SR3 has proven adept at upscaling portraits and natural images. When used for 8x upscaling of faces, it achieves a “confusion rate” close to 50%, the rate at which human raters mistake its output for a real photo, whereas previous techniques reach only 34%, suggesting the results are nearly indistinguishable from genuine photographs.
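For clarity, a confusion rate is simply the fraction of forced-choice trials in which raters pick the generated image as the real one, so 50% means raters are guessing at random. The rater responses below are invented for illustration.

```python
def confusion_rate(picked_generated: list[bool]) -> float:
    """Fraction of trials in which raters chose the model output as 'real'."""
    return sum(picked_generated) / len(picked_generated)

trials = [True, False, True, True, False, True, False, True]  # toy responses
print(f"confusion rate: {confusion_rate(trials):.1%}")        # 5/8 = 62.5%
```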
When Google realised how successful SR3 was at upscaling photographs, it developed a second method called CDM, a class-conditional diffusion model.
“CDM is a class-conditional diffusion model trained on ImageNet data to generate high-resolution natural images,” Google writes. “Since ImageNet is a difficult, high-entropy dataset, we built CDM as a cascade of multiple diffusion models. This cascade approach involves chaining together multiple generative models over several spatial resolutions: one diffusion model that generates data at a low resolution, followed by a sequence of SR3 super-resolution diffusion models that gradually increase the resolution of the generated image to the highest resolution.”
Google has provided a series of samples demonstrating low-resolution pictures upscaled in a cascade. A 32×32 image can be upscaled to 64×64 and subsequently to 256×256; likewise, a 64×64 picture can be upscaled to 256×256 and then to 1024×1024.
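Here is a minimal sketch of that cascaded pipeline: a base model samples a low-resolution image, then successive super-resolution stages grow it to the final resolution. The model functions are hypothetical stand-ins for trained diffusion models; a real SR3 stage would synthesize detail rather than repeat pixels.

```python
import numpy as np

def base_model() -> np.ndarray:
    """Stand-in for a class-conditional diffusion model sampling at 64x64."""
    return np.random.rand(64, 64, 3)

def sr_stage(image: np.ndarray, factor: int) -> np.ndarray:
    """Stand-in for an SR3 stage; here just nearest-neighbour repetition."""
    return image.repeat(factor, axis=0).repeat(factor, axis=1)

image = base_model()                  # 64x64 base sample
image = sr_stage(image, factor=4)     # 64x64   -> 256x256
image = sr_stage(image, factor=4)     # 256x256 -> 1024x1024
print(image.shape)                    # (1024, 1024, 3)
```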
The results are impressive, and despite certain flaws (such as gaps in the frames of glasses), most viewers would mistake the finished pictures for genuine original photographs at first glance.
“With SR3 and CDM, we have pushed the performance of diffusion models to the state-of-the-art on super-resolution and class-conditional ImageNet generation benchmarks,” Google researchers write. “We are excited to further test the limits of diffusion models for a wide variety of generative modeling problems.”