Rejection Sampling IMLE: Designing Priors for Better Few-Shot Image Synthesis

Simon Fraser University
European Conference on Computer Vision (ECCV) 2024


TLDR: We identify an issue with current IMLE-based methods and propose a novel approach to address it. We achieve state-of-the-art performance on few-shot image generation tasks.

Abstract

An emerging area of research aims to learn deep generative models with limited training data. Implicit Maximum Likelihood Estimation (IMLE), a recent technique, successfully addresses the mode collapse issue of GANs and has been adapted to the few-shot setting, achieving state-of-the-art performance. However, current IMLE-based approaches encounter challenges due to inadequate correspondence between the latent codes selected for training and those drawn during inference. This results in suboptimal test-time performance. We theoretically show a way to address this issue and propose RS-IMLE, a novel approach that changes the prior distribution used for training. This leads to substantially higher quality image generation compared to existing GAN and IMLE-based methods, as validated by comprehensive experiments conducted on nine few-shot image datasets.

Methodology

We identify an issue
We can also analyze the latent space of models trained with the respective objectives. Dots represent the points selected by the model over the course of training; dots of the same colour belong to the same data point.
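The selection step behind this analysis can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the paper's implementation: the toy generator interface, the Euclidean distance (the paper operates in a feature space), and the `eps` threshold value are all illustrative.

```python
import numpy as np

def rs_imle_select(data, generator, n_samples=256, eps=0.5, rng=None):
    """IMLE selects, for each data point, the nearest generated sample.
    RS-IMLE adds a rejection step: candidates that already land within
    eps of the data point are discarded before selection, so the latent
    codes used for training look more like typical draws from the prior."""
    rng = np.random.default_rng(rng)
    z = rng.standard_normal((n_samples, data.shape[1]))   # latent codes
    fake = generator(z)                                   # (n_samples, d)
    # pairwise distances between generated samples and data points
    d = np.linalg.norm(fake[:, None, :] - data[None, :, :], axis=-1)
    d_masked = np.where(d < eps, np.inf, d)               # rejection step
    all_rejected = np.isinf(d_masked).all(axis=0)
    # nearest *accepted* sample per data point; fall back to plain IMLE
    # selection if every candidate for a data point was rejected
    idx = np.where(all_rejected, d.argmin(axis=0), d_masked.argmin(axis=0))
    return z[idx]
```

With `eps=0`, no candidate is ever rejected and the function reduces to the standard IMLE nearest-sample selection.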


Frechet Inception Distance

We present FID scores computed on all datasets across the different methods. A lower FID score indicates that the distribution of generated images is closer to the distribution of real images.
Our method performs significantly better than the baselines, with an average improvement of 45.9% over the best baseline.
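For reference, FID is the Fréchet distance between two Gaussians fitted to image features. A minimal NumPy sketch is below; in practice the features come from a pretrained InceptionV3 network, whereas here they are plain arrays, and the symmetric-form trick for the matrix square root is an implementation choice.

```python
import numpy as np

def frechet_distance(feat_real, feat_fake):
    """Frechet distance between Gaussians fitted to two (n, d) feature
    sets: ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^{1/2})."""
    mu1, mu2 = feat_real.mean(0), feat_fake.mean(0)
    s1 = np.cov(feat_real, rowvar=False)
    s2 = np.cov(feat_fake, rowvar=False)

    def sqrtm_psd(m):
        # square root of a symmetric PSD matrix via eigendecomposition
        vals, vecs = np.linalg.eigh(m)
        vals = np.clip(vals, 0.0, None)
        return (vecs * np.sqrt(vals)) @ vecs.T

    # Tr((S1 S2)^{1/2}) computed via the symmetric form
    # (S1^{1/2} S2 S1^{1/2})^{1/2}, which keeps everything PSD
    a = sqrtm_psd(s1)
    covmean_tr = np.trace(sqrtm_psd(a @ s2 @ a))
    diff = mu1 - mu2
    return diff @ diff + np.trace(s1) + np.trace(s2) - 2.0 * covmean_tr
```

Identical feature sets give a distance of zero, and a pure mean shift of c in every one of d dimensions gives approximately c²·d.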

Dataset      FastGAN  FakeCLR  FreGAN   ReGAN  AdaIMLE  RS-IMLE
Obama           41.1     29.9    33.4    45.7     25.0     14.0
Grumpy Cat      26.6     20.6    24.9    27.3     19.1     11.5
Panda           10.0      8.8     9.0    12.6      7.6      3.5
FFHQ-100        54.2     62.1    50.5    87.4     33.2     12.9
Cat             35.1     27.4    31.0    42.1     24.9     15.9
Dog             50.7     44.4    47.9    57.2     43.0     23.1
Anime           69.8     77.7    59.8   110.8     65.8     35.8
Skulls         109.6    106.5   163.3   130.7     81.9     51.1
Shells         120.9    148.4   169.3   236.1    108.5     55.4

Visual Recall Test

The first column shows the query image from the dataset.
Subsequent columns show the samples produced by each method that are closest to the query image in LPIPS feature space.
The samples produced by our method are closer to the query images than those of the baselines, while remaining sufficiently diverse.
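The retrieval step of this test can be sketched as a feature-space nearest-neighbour search. The sketch below substitutes plain Euclidean distance over generic feature vectors for the LPIPS distance used in the paper; any (n, d) embedding works with the same logic.

```python
import numpy as np

def nearest_samples(query_feats, sample_feats, k=3):
    """For each query feature vector, return the indices of the k
    generated samples with the smallest feature-space distance
    (the paper uses LPIPS features; Euclidean distance here)."""
    # pairwise distances: (n_queries, n_samples)
    d = np.linalg.norm(query_feats[:, None, :] - sample_feats[None, :, :],
                       axis=-1)
    # indices of the k closest samples per query, nearest first
    return np.argsort(d, axis=1)[:, :k]
```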

You can find more examples in the main paper and the supplementary material.

BibTeX
