While the general idea of self-supervised learning is identical across modalities, the actual algorithms and objectives differ widely because they were developed with a single modality in mind. To get us closer to general self-supervised learning, we present data2vec, a framework that uses the same learning method for either speech, NLP or computer vision. The core idea is to predict latent representations of the full input data based on a masked view of the input in a self-distillation setup using a standard Transformer architecture. Instead of predicting modality-specific targets such as words, visual tokens or units of human speech which are local in nature, data2vec predicts contextualized latent representations that contain information from the entire input. Experiments on the major benchmarks of speech recognition, image classification, and natural language understanding demonstrate a new state of the art or competitive performance to predominant approaches.
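The abstract's recipe can be summarized as: a teacher network (an exponential moving average of the student) encodes the full input, its top-K Transformer layer outputs are averaged into contextualized targets, and the student regresses those targets from a masked view. Below is a minimal numpy sketch of that recipe under stated assumptions: the function names, the per-layer arrays standing in for Transformer activations, and the beta=1 smooth-L1 form are illustrative choices, not the paper's actual implementation.

```python
import numpy as np

def ema_update(teacher, student, tau=0.999):
    # Self-distillation: teacher parameters track an exponential
    # moving average of the student's parameters.
    return {k: tau * teacher[k] + (1 - tau) * student[k] for k in teacher}

def build_targets(layer_reps, top_k=8):
    # Contextualized targets: average the teacher's top-K layer outputs
    # for each time step. layer_reps: array of shape (num_layers, T, D).
    return np.mean(layer_reps[-top_k:], axis=0)

def masked_regression_loss(student_pred, targets, mask):
    # Regress targets only at masked positions (beta=1 smooth-L1 sketch).
    # student_pred, targets: (T, D); mask: boolean (T,).
    diff = student_pred - targets
    absd = np.abs(diff)
    loss = np.where(absd < 1.0, 0.5 * diff ** 2, absd - 0.5)
    return loss[mask].mean()
```

Because the targets come from the teacher's internal layers rather than from words, visual tokens, or speech units, the same three functions apply unchanged to any modality; only the input featurization and masking strategy differ.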
Author Information
Alexei Baevski (Foundational AI Research (Meta))
Wei-Ning Hsu (FAIR)
Qiantong Xu (Sambanova Systems)
Arun Babu
Jiatao Gu (Facebook AI Research)
Michael Auli (Meta AI)
Related Events (a corresponding poster, oral, or spotlight)
- 2022 Oral: data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language
  Tue. Jul 19th, 06:05 -- 06:25 PM, Room 327 - 329
More from the Same Authors
- 2021: Latency-Aware Neural Architecture Search with Multi-Objective Bayesian Optimization
  David Eriksson · Pierce Chuang · Samuel Daulton · Peng Xia · Akshat Shrivastava · Arun Babu · Shicong Zhao · Ahmed A Aly · Ganesh Venkatesh · Maximilian Balandat
- 2023 Poster: Efficient Self-supervised Learning with Contextualized Target Representations for Vision, Speech and Language
  Alexei Baevski · Arun Babu · Wei-Ning Hsu · Michael Auli
- 2023 Oral: Efficient Self-supervised Learning with Contextualized Target Representations for Vision, Speech and Language
  Alexei Baevski · Arun Babu · Wei-Ning Hsu · Michael Auli
- 2022: Panel Discussion
  Mirco Ravanelli · Chris Donahue · Zhifeng Kong · Wei-Ning Hsu · Rachel Manzelli · Sadie Allen
- 2022: Self-supervised learning for speech generation
  Wei-Ning Hsu
- 2020: Invited Talk: Self-supervised learning of speech representations with wav2vec
  Alexei Baevski
- 2020 Poster: Non-autoregressive Machine Translation with Disentangled Context Transformer
  Jungo Kasai · James Cross · Marjan Ghazvininejad · Jiatao Gu
- 2019 Poster: Mixture Models for Diverse Machine Translation: Tricks of the Trade
  Tianxiao Shen · Myle Ott · Michael Auli · Marc'Aurelio Ranzato
- 2019 Oral: Mixture Models for Diverse Machine Translation: Tricks of the Trade
  Tianxiao Shen · Myle Ott · Michael Auli · Marc'Aurelio Ranzato
- 2018 Poster: Analyzing Uncertainty in Neural Machine Translation
  Myle Ott · Michael Auli · David Grangier · Marc'Aurelio Ranzato
- 2018 Oral: Analyzing Uncertainty in Neural Machine Translation
  Myle Ott · Michael Auli · David Grangier · Marc'Aurelio Ranzato
- 2017 Poster: Convolutional Sequence to Sequence Learning
  Jonas Gehring · Michael Auli · David Grangier · Denis Yarats · Yann Dauphin
- 2017 Poster: Language Modeling with Gated Convolutional Networks
  Yann Dauphin · Angela Fan · Michael Auli · David Grangier
- 2017 Talk: Convolutional Sequence to Sequence Learning
  Jonas Gehring · Michael Auli · David Grangier · Denis Yarats · Yann Dauphin
- 2017 Talk: Language Modeling with Gated Convolutional Networks
  Yann Dauphin · Angela Fan · Michael Auli · David Grangier