Obtaining high-quality data for training classification models is challenging when data that sufficiently covers the real manifold is difficult to find in the wild. In this paper, we present Diffusion Inversion, a dataset-agnostic augmentation strategy for training classification models. Diffusion Inversion is a simple yet effective method that leverages the powerful pretrained Stable Diffusion model to generate synthetic datasets that cover the original data manifold while also producing novel samples that extrapolate beyond the training domain, enabling better generalization. We ensure data coverage by inverting each image in the original set to its condition vector in the latent space of Stable Diffusion. We ensure sample diversity by adding noise to the learned embeddings or interpolating between them in the latent space, and using the resulting vector as the conditioning signal. The method produces high-quality, diverse samples, consistently outperforming generic prompt-based steering methods and KNN retrieval baselines across a wide range of common and specialized datasets. Furthermore, we demonstrate the compatibility of our approach with widely used data augmentation techniques, and assess the reliability of the generated data both in supporting various neural architectures and in enhancing few-shot learning performance.
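The diversity step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes two already-learned per-image condition embeddings (stand-ins for the vectors recovered by inverting images into Stable Diffusion's conditioning space) and shows only the noise-perturbation and interpolation operations that produce new conditioning vectors; the embedding dimension (768) and noise scale are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical learned condition embeddings, one per training image.
# In the actual method these would come from inverting each image
# into the conditioning space of a frozen Stable Diffusion model.
emb_a = rng.standard_normal(768)
emb_b = rng.standard_normal(768)

def perturb(embedding, scale=0.1, rng=rng):
    """Add Gaussian noise to a learned embedding to get a novel variant."""
    return embedding + scale * rng.standard_normal(embedding.shape)

def interpolate(e1, e2, alpha=0.5):
    """Linearly interpolate between two learned embeddings."""
    return (1.0 - alpha) * e1 + alpha * e2

new_cond = perturb(emb_a)                       # noise-perturbed vector
mix_cond = interpolate(emb_a, emb_b, alpha=0.3)  # blended vector
# Each resulting vector would then be passed to the frozen diffusion
# model as the conditioning signal to synthesize a new training image.
```

Because the perturbed and interpolated vectors stay near the learned embeddings, the generated images remain on (or close to) the original data manifold while still differing from the training set.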
Author Information
Yongchao Zhou (University of Toronto)
Hshmat Sahak (University of Toronto)
Jimmy Ba (University of Toronto / xAI)
More from the Same Authors
- 2021: On Low Rank Training of Deep Neural Networks
  Siddhartha Kamalakara · Acyr Locatelli · Bharat Venkitesh · Jimmy Ba · Yarin Gal · Aidan Gomez
- 2022: You Can't Count on Luck: Why Decision Transformers Fail in Stochastic Environments
  Keiran Paster · Sheila McIlraith · Jimmy Ba
- 2023: Learning in the Presence of Low-dimensional Structure: A Spiked Random Matrix Perspective
  Jimmy Ba · Murat Erdogdu · Taiji Suzuki · Zhichao Wang · Denny Wu
- 2023: Training on Thin Air: Improve Image Classification with Generated Data
  Yongchao Zhou · Hshmat Sahak · Jimmy Ba
- 2023: A Generative Model for Text Control in Minecraft
  Shalev Lifshitz · Keiran Paster · Harris Chan · Jimmy Ba · Sheila McIlraith
- 2023: Calibrating Language Models via Augmented Prompt Ensembles
  Mingjian Jiang · Yangjun Ruan · Sicong Huang · Saifei Liao · Silviu Pitis · Roger Grosse · Jimmy Ba
- 2023 Poster: TR0N: Translator Networks for 0-Shot Plug-and-Play Conditional Generation
  Zhaoyan Liu · Noël Vouitsis · Satya Krishna Gorti · Jimmy Ba · Gabriel Loaiza-Ganem
- 2021 Poster: LIME: Learning Inductive Bias for Primitives of Mathematical Reasoning
  Yuhuai Wu · Markus Rabe · Wenda Li · Jimmy Ba · Roger Grosse · Christian Szegedy
- 2021 Spotlight: LIME: Learning Inductive Bias for Primitives of Mathematical Reasoning
  Yuhuai Wu · Markus Rabe · Wenda Li · Jimmy Ba · Roger Grosse · Christian Szegedy
- 2020 Poster: Maximum Entropy Gain Exploration for Long Horizon Multi-goal Reinforcement Learning
  Silviu Pitis · Harris Chan · Stephen Zhao · Bradly Stadie · Jimmy Ba
- 2020 Poster: Improving Transformer Optimization Through Better Initialization
  Xiao Shi Huang · Felipe Perez · Jimmy Ba · Maksims Volkovs