Vision-language representation learning models (e.g., CLIP) have achieved state-of-the-art performance on various downstream tasks, and they usually require large-scale training data to learn discriminative representations. Recent progress on generative diffusion models (e.g., DALL-E 2) has demonstrated that diverse, high-quality samples can be synthesized by randomly sampling from the generative distribution. Exploiting this generative capability, in this paper we propose a novel vision-language Representation Learning method with diffusion-based Embedding Generation (RLEG), which uses diffusion models to generate feature embeddings online for learning effective vision-language representations. Specifically, we first adopt image and text encoders to extract the corresponding embeddings. Second, pretrained diffusion-based embedding generators are harnessed to transfer the embedding modality online between the vision and language domains. The embeddings produced by the generators then serve as augmented embedding-level samples, which are used for contrastive learning with a variant of the CLIP framework. Experimental results show that the proposed method can learn effective representations and achieve state-of-the-art performance on various tasks, including image classification, image-text retrieval, object detection, semantic segmentation, and text-conditional image generation.
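The contrastive step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a standard symmetric CLIP-style (InfoNCE) loss over L2-normalized embeddings, treats the generated embeddings as given inputs (the diffusion-based generators themselves are out of scope), and combines real and generated pairs with a hypothetical `augmented_loss` that simply averages three contrastive terms. The function names, the pairing scheme, the equal weighting, and the temperature of 0.07 are all assumptions.

```python
import math
from typing import List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_entropy_diag(logits: List[List[float]]) -> float:
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def clip_loss(images: List[Vector], texts: List[Vector], temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings."""
    imgs = [normalize(v) for v in images]
    txts = [normalize(v) for v in texts]
    # logits[i][j] = cosine(image i, text j) / temperature
    logits = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts] for i in imgs]
    transposed = [list(col) for col in zip(*logits)]  # text-to-image direction
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(transposed))


def augmented_loss(
    images: List[Vector],
    texts: List[Vector],
    gen_images: List[Vector],
    gen_texts: List[Vector],
    temperature: float = 0.07,
) -> float:
    """Hypothetical combination of real and generated embedding pairs.

    Assumed convention: gen_texts[i] is a text-like embedding generated from
    images[i], and gen_images[i] is an image-like embedding generated from texts[i].
    The equal-weight average is an illustrative choice, not taken from the paper.
    """
    real = clip_loss(images, texts, temperature)
    img_vs_gen_txt = clip_loss(images, gen_texts, temperature)
    gen_img_vs_txt = clip_loss(gen_images, texts, temperature)
    return (real + img_vs_gen_txt + gen_img_vs_txt) / 3.0


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    texts = [[0.9, 0.1], [0.1, 0.9]]
    # Hypothetical stand-ins for generator outputs: small perturbations of real embeddings.
    gen_images = [[0.95, 0.05], [0.05, 0.95]]
    gen_texts = [[0.85, 0.15], [0.15, 0.85]]
    print(clip_loss(images, texts))
    print(augmented_loss(images, texts, gen_images, gen_texts))
```

Averaging the terms is only one plausible design. The abstract states only that generated embeddings act as augmented samples in a CLIP-variant contrastive objective, so the actual pairing and weighting may differ.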
Author Information
Liming Zhao (Alibaba Group)

Liming Zhao is currently a research scientist at Alibaba DAMO Academy. His research interests are primarily in computer vision and machine learning, especially large-scale multimodal foundation model pretraining. His work has been published in journals and conferences such as TPAMI, TIP, CVPR, ICCV, ECCV, ICML, AAAI, and IJCAI. He serves as a reviewer for top journals (IEEE TIP and ACM TIST) and conferences (CVPR, ICCV, AAAI, and IJCAI).
Kecheng Zheng (Ant Research)
Yun Zheng (Alibaba Group)
Deli Zhao (Alibaba Group)
Jingren Zhou (Alibaba Group)
More from the Same Authors
2023: Latent Space Editing in Transformer-Based Flow Matching
Tao Hu · David Zhang · Meng Tang · Pascal Mettes · Deli Zhao · Cees Snoek

2023 Poster: Cones: Concept Neurons in Diffusion Models for Customized Generation
Zhiheng Liu · Ruili Feng · Kai Zhu · Yifei Zhang · Kecheng Zheng · Yu Liu · Deli Zhao · Jingren Zhou · Yang Cao

2023 Poster: Composer: Creative and Controllable Image Synthesis with Composable Conditions
Lianghua Huang · Di Chen · Yu Liu · Yujun Shen · Deli Zhao · Jingren Zhou

2023 Oral: Cones: Concept Neurons in Diffusion Models for Customized Generation
Zhiheng Liu · Ruili Feng · Kai Zhu · Yifei Zhang · Kecheng Zheng · Yu Liu · Deli Zhao · Jingren Zhou · Yang Cao

2023 Poster: mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video
Haiyang Xu · Qinghao Ye · Ming Yan · Yaya Shi · Jiabo Ye · yuanhong xu · Chenliang Li · Bin Bi · Qi Qian · Wei Wang · Guohai Xu · Ji Zhang · Songfang Huang · Fei Huang · Jingren Zhou

2022 Poster: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
Peng Wang · An Yang · Rui Men · Junyang Lin · Shuai Bai · Zhikang Li · Jianxin Ma · Chang Zhou · Jingren Zhou · Hongxia Yang

2022 Spotlight: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
Peng Wang · An Yang · Rui Men · Junyang Lin · Shuai Bai · Zhikang Li · Jianxin Ma · Chang Zhou · Jingren Zhou · Hongxia Yang

2022 Poster: Principled Knowledge Extrapolation with GANs
Ruili Feng · Jie Xiao · Kecheng Zheng · Deli Zhao · Jingren Zhou · Qibin Sun · Zheng-Jun Zha

2022 Spotlight: Principled Knowledge Extrapolation with GANs
Ruili Feng · Jie Xiao · Kecheng Zheng · Deli Zhao · Jingren Zhou · Qibin Sun · Zheng-Jun Zha

2022 Poster: Region-Based Semantic Factorization in GANs
Jiapeng Zhu · Yujun Shen · Yinghao Xu · Deli Zhao · Qifeng Chen

2022 Spotlight: Region-Based Semantic Factorization in GANs
Jiapeng Zhu · Yujun Shen · Yinghao Xu · Deli Zhao · Qifeng Chen

2021 Poster: Learning to Rehearse in Long Sequence Memorization
Zhu Zhang · Chang Zhou · Jianxin Ma · Zhijie Lin · Jingren Zhou · Hongxia Yang · Zhou Zhao

2021 Spotlight: Learning to Rehearse in Long Sequence Memorization
Zhu Zhang · Chang Zhou · Jianxin Ma · Zhijie Lin · Jingren Zhou · Hongxia Yang · Zhou Zhao

2021 Poster: Understanding Noise Injection in GANs
Ruili Feng · Deli Zhao · Zheng-Jun Zha

2021 Spotlight: Understanding Noise Injection in GANs
Ruili Feng · Deli Zhao · Zheng-Jun Zha

2021 Poster: Uncertainty Principles of Encoding GANs
Ruili Feng · Zhouchen Lin · Jiapeng Zhu · Deli Zhao · Jingren Zhou · Zheng-Jun Zha

2021 Spotlight: Uncertainty Principles of Encoding GANs
Ruili Feng · Zhouchen Lin · Jiapeng Zhu · Deli Zhao · Jingren Zhou · Zheng-Jun Zha