About Model
Karlo is a text-conditional image generation model based on OpenAI's unCLIP architecture. Its main change is an improved super-resolution (SR) stage. This stage upscales images from 64px to 256px and recovers high-frequency details in only a small number of denoising steps.
The alpha version of Karlo was trained on 115 million image-text pairs, drawn from the high-quality subset of COYO-100M, CC3M, and CC12M.
The improved SR stage works in two steps. First, a standard SR module trained with the DDPM objective upscales the image from 64px to 256px in the first six denoising steps. It uses timestep respacing to shorten the sampling schedule. Then an additional SR module, fine-tuned with a VQ-GAN-style loss, performs the final reverse step to recover high-frequency details. Together, the two modules upscale low-resolution images in very few reverse steps.
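The two-step schedule above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Karlo's implementation. The following details are all hypothetical:

- the 1000-step linear beta schedule;
- the evenly spaced selection of seven timesteps (six for the DDPM-trained module plus one final step);
- the names `space_timesteps`, `respaced_betas`, and `two_stage_upscale`.

The respacing follows the general idea from improved DDPM. It keeps a subset of timesteps and recomputes the betas so the shortened chain reaches the same cumulative noise level at every kept step. The two SR modules are passed in as callables because the real networks are out of scope here.

```python
from typing import Callable, List

def linear_betas(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    # Assumption: a linear noise schedule; Karlo's actual SR training schedule may differ.
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def alphas_cumprod(betas: List[float]) -> List[float]:
    # alpha_bar_t = product over s <= t of (1 - beta_s)
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def space_timesteps(num_steps: int, count: int) -> List[int]:
    # Hypothetical helper: keep `count` evenly spaced timesteps from 0..num_steps-1,
    # always including both the first and the last timestep.
    if count < 2:
        raise ValueError("count must be at least 2")
    return [(i * (num_steps - 1)) // (count - 1) for i in range(count)]

def respaced_betas(betas: List[float], kept: List[int]) -> List[float]:
    # Recompute betas so the shortened chain reproduces alpha_bar at each kept timestep.
    abar = alphas_cumprod(betas)
    new, last = [], 1.0
    for t in kept:
        new.append(1.0 - abar[t] / last)
        last = abar[t]
    return new

Step = Callable[[object, int], object]

def two_stage_upscale(x_noisy_256: object, kept: List[int],
                      ddpm_sr_step: Step, finetuned_sr_step: Step) -> object:
    # Hypothetical control flow: every respaced step except the last uses the
    # DDPM-trained SR module; the final reverse step uses the fine-tuned module.
    order = list(reversed(kept))  # denoise from the noisiest timestep to the cleanest
    x = x_noisy_256
    for t in order[:-1]:
        x = ddpm_sr_step(x, t)
    return finetuned_sr_step(x, order[-1])

if __name__ == "__main__":
    betas = linear_betas(1000)
    kept = space_timesteps(1000, 7)  # assumed: 6 DDPM steps + 1 final step
    print(kept)  # [0, 166, 333, 499, 666, 832, 999]
    print(respaced_betas(betas, kept))
    trace = []
    # Toy "denoisers" that only record which module handled each timestep.
    two_stage_upscale(None, kept,
                      lambda x, t: trace.append(("ddpm", t)) or x,
                      lambda x, t: trace.append(("finetuned", t)) or x)
    print(trace)
```

Passing the two denoisers as callables keeps the sketch independent of any model code. In the toy run, they only record which module handles each timestep. The trace therefore shows six DDPM steps from t = 999 down to t = 166, followed by one fine-tuned step at t = 0.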
