Abstract
Massive web datasets play a key role in the success of large vision-language models such as CLIP and Flamingo. However, raw web data is noisy, and existing methods for reducing this noise often come at the expense of data diversity. Our work focuses on caption quality as one major source of noise, and studies the effectiveness of synthetic captions in increasing the utility of web-scraped datapoints with poorly aligned captions. By exploring different mixing strategies for raw and synthetic captions, we achieve state-of-the-art performance at the small and medium scales of the DataComp benchmark (Gadre et al., 2023), improving ImageNet accuracy by 2% and average accuracy (over 38 tasks) by 4% over the previous best baseline, given a candidate pool of 128M image-text pairs. The best-performing approach is also 2x better at retrieval on Flickr and MS-COCO. We then analyze what makes synthetic captions effective, and explore the impact of the image captioning model and the sampling temperature on the resulting training set. Overall, our findings demonstrate the potential of leveraging image captioning models to improve multimodal datasets: (i) we show that progress in image captioning models can translate to better captions and boost accuracy, and (ii) this unlocks a plethora of web images without accompanying captions that can now be used for training.
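To make the mixing idea concrete, the sketch below illustrates one simple variant: score each raw caption's alignment with its image using CLIP, keep the well-aligned raw captions, and replace the rest with captions generated by an off-the-shelf image captioning model. This is a minimal sketch, not the paper's exact pipeline; the model choices (BLIP and OpenAI's ViT-B/32 CLIP via Hugging Face transformers), the 0.3 similarity threshold, and the default sampling temperature are all illustrative assumptions.

import torch
from PIL import Image
from transformers import (
    BlipForConditionalGeneration,
    BlipProcessor,
    CLIPModel,
    CLIPProcessor,
)

# Illustrative model choices; the paper's exact configuration may differ.
captioner = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-base")
caption_processor = BlipProcessor.from_pretrained(
    "Salesforce/blip-image-captioning-base")
clip_model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
clip_processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def clip_alignment(image: Image.Image, text: str) -> float:
    # Cosine similarity between the CLIP image and text embeddings.
    inputs = clip_processor(text=[text], images=image,
                            return_tensors="pt", truncation=True)
    img_emb = clip_model.get_image_features(pixel_values=inputs["pixel_values"])
    txt_emb = clip_model.get_text_features(input_ids=inputs["input_ids"],
                                           attention_mask=inputs["attention_mask"])
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
    return (img_emb @ txt_emb.T).item()

@torch.no_grad()
def synthetic_caption(image: Image.Image, temperature: float = 0.75) -> str:
    # Higher sampling temperature yields more diverse (but noisier) captions.
    inputs = caption_processor(images=image, return_tensors="pt")
    output = captioner.generate(**inputs, do_sample=True,
                                temperature=temperature, max_new_tokens=30)
    return caption_processor.decode(output[0], skip_special_tokens=True)

def mixed_caption(image: Image.Image, raw_caption: str,
                  threshold: float = 0.3) -> str:
    # One mixing strategy: trust the raw caption when it is well aligned
    # with the image, otherwise fall back to a synthetic caption.
    if clip_alignment(image, raw_caption) >= threshold:
        return raw_caption
    return synthetic_caption(image)

The sampling temperature passed to synthetic_caption is one of the knobs whose effect on the resulting training set the paper analyzes: lower values give more literal captions, higher values trade fidelity for diversity.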
Author Information
Thao Nguyen (University of Washington)
Gabriel Ilharco (University of Washington)
Sewoong Oh (University of Washington)
Ludwig Schmidt (University of Washington)
More from the Same Authors
- 2022: How well do contrastively trained models transfer?
  M. Moein Shariatnia · Rahim Entezari · Mitchell Wortsman · Olga Saukh · Ludwig Schmidt
- 2022: On the Connection between Pre-training Data Diversity and Robustness
  Vivek Ramanujan · Thao Nguyen · Ludwig Schmidt · Ali Farhadi
- 2022: Quality Not Quantity: On the Interaction between Dataset Design and Robustness of CLIP
  Thao Nguyen
- 2022 Poster: Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time
  Mitchell Wortsman · Gabriel Ilharco · Samir Gadre · Becca Roelofs · Raphael Gontijo Lopes · Ari Morcos · Hongseok Namkoong · Ali Farhadi · Yair Carmon · Simon Kornblith · Ludwig Schmidt
- 2022 Poster: MAML and ANIL Provably Learn Representations
  Liam Collins · Aryan Mokhtari · Sewoong Oh · Sanjay Shakkottai
- 2022 Spotlight: Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time
  Mitchell Wortsman · Gabriel Ilharco · Samir Gadre · Becca Roelofs · Raphael Gontijo Lopes · Ari Morcos · Hongseok Namkoong · Ali Farhadi · Yair Carmon · Simon Kornblith · Ludwig Schmidt
- 2022 Spotlight: MAML and ANIL Provably Learn Representations
  Liam Collins · Aryan Mokhtari · Sewoong Oh · Sanjay Shakkottai
- 2022 Poster: De novo mass spectrometry peptide sequencing with a transformer model
  Melih Yilmaz · William Fondrie · Wout Bittremieux · Sewoong Oh · William Noble
- 2022 Poster: Data Determines Distributional Robustness in Contrastive Language Image Pre-training (CLIP)
  Alex Fang · Gabriel Ilharco · Mitchell Wortsman · Yuhao Wan · Vaishaal Shankar · Achal Dave · Ludwig Schmidt
- 2022 Spotlight: Data Determines Distributional Robustness in Contrastive Language Image Pre-training (CLIP)
  Alex Fang · Gabriel Ilharco · Mitchell Wortsman · Yuhao Wan · Vaishaal Shankar · Achal Dave · Ludwig Schmidt
- 2022 Spotlight: De novo mass spectrometry peptide sequencing with a transformer model
  Melih Yilmaz · William Fondrie · Wout Bittremieux · Sewoong Oh · William Noble
- 2021 Poster: Defense against backdoor attacks via robust covariance estimation
  Jonathan Hayase · Weihao Kong · Raghav Somani · Sewoong Oh
- 2021 Spotlight: Defense against backdoor attacks via robust covariance estimation
  Jonathan Hayase · Weihao Kong · Raghav Somani · Sewoong Oh
- 2021 Poster: KO codes: inventing nonlinear encoding and decoding for reliable wireless communication via deep-learning
  Ashok Vardhan Makkuva · Xiyang Liu · Mohammad Vahid Jamali · Hessam Mahdavifar · Sewoong Oh · Pramod Viswanath
- 2021 Poster: Accuracy on the Line: on the Strong Correlation Between Out-of-Distribution and In-Distribution Generalization
  John Miller · Rohan Taori · Aditi Raghunathan · Shiori Sagawa · Pang Wei Koh · Vaishaal Shankar · Percy Liang · Yair Carmon · Ludwig Schmidt
- 2021 Spotlight: Accuracy on the Line: on the Strong Correlation Between Out-of-Distribution and In-Distribution Generalization
  John Miller · Rohan Taori · Aditi Raghunathan · Shiori Sagawa · Pang Wei Koh · Vaishaal Shankar · Percy Liang · Yair Carmon · Ludwig Schmidt
- 2021 Spotlight: KO codes: inventing nonlinear encoding and decoding for reliable wireless communication via deep-learning
  Ashok Vardhan Makkuva · Xiyang Liu · Mohammad Vahid Jamali · Hessam Mahdavifar · Sewoong Oh · Pramod Viswanath
- 2020 Poster: Optimal transport mapping via input convex neural networks
  Ashok Vardhan Makkuva · Amirhossein Taghvaei · Sewoong Oh · Jason Lee
- 2020 Poster: InfoGAN-CR and ModelCentrality: Self-supervised Model Training and Selection for Disentangling GANs
  Zinan Lin · Kiran Thekumparampil · Giulia Fanti · Sewoong Oh
- 2020 Poster: Meta-learning for Mixed Linear Regression
  Weihao Kong · Raghav Somani · Zhao Song · Sham Kakade · Sewoong Oh