Much recent progress in applications of machine learning models to NLP has been driven by benchmarks that evaluate models across a wide variety of tasks. However, these broad-coverage benchmarks have been mostly limited to English, and despite an increasing interest in multilingual models, a benchmark that enables the comprehensive evaluation of such methods on a diverse range of languages and tasks is still missing. To this end, we introduce the Cross-lingual TRansfer Evaluation of Multilingual Encoders (XTREME) benchmark, a multi-task benchmark for evaluating the cross-lingual generalization capabilities of multilingual representations across 40 languages and 9 tasks. We demonstrate that while models tested on English reach human performance on many tasks, there is still a sizable gap in the performance of cross-lingually transferred models, particularly on syntactic and sentence retrieval tasks. There is also a wide spread of results across languages. We will release the benchmark to encourage research on cross-lingual learning methods that transfer linguistic knowledge across a diverse and representative set of languages and tasks.
Author Information
Junjie Hu (Carnegie Mellon University)
Sebastian Ruder (DeepMind)
Aditya Siddhant (Google Research)
Graham Neubig (Carnegie Mellon University)
Orhan Firat (Google)
Melvin Johnson (Google)
More from the Same Authors
- 2023 : Interactive-Chain-Prompting: Ambiguity Resolution for Crosslingual Conditional Generation with Interaction
  Jonathan Pilault · Xavier Garcia · Arthur Brazinskas · Orhan Firat
- 2023 Oral: Mu$^2$SLAM: Multitask, Multilingual Speech and Language Models
  Yong Cheng · Yu Zhang · Melvin Johnson · Wolfgang Macherey · Ankur Bapna
- 2023 Oral: Cross-Modal Fine-Tuning: Align then Refine
  Junhong Shen · Liam Li · Lucio Dery · Corey Staten · Mikhail Khodak · Graham Neubig · Ameet Talwalkar
- 2023 Poster: The Unreasonable Effectiveness of Few-shot Learning for Machine Translation
  Xavier Garcia · Yamini Bansal · Colin Cherry · George Foster · Maxim Krikun · Melvin Johnson · Orhan Firat
- 2023 Poster: Cross-Modal Fine-Tuning: Align then Refine
  Junhong Shen · Liam Li · Lucio Dery · Corey Staten · Mikhail Khodak · Graham Neubig · Ameet Talwalkar
- 2023 Poster: Scaling Laws for Multilingual Neural Machine Translation
  Patrick Fernandes · Behrooz Ghorbani · Xavier Garcia · Markus Freitag · Orhan Firat
- 2023 Poster: Mu$^2$SLAM: Multitask, Multilingual Speech and Language Models
  Yong Cheng · Yu Zhang · Melvin Johnson · Wolfgang Macherey · Ankur Bapna
- 2023 Poster: PAL: Program-aided Language Models
  Luyu Gao · Aman Madaan · Shuyan Zhou · Uri Alon · Pengfei Liu · Yiming Yang · Jamie Callan · Graham Neubig
- 2023 Poster: Why do Nearest Neighbor Language Models Work?
  Frank Xu · Uri Alon · Graham Neubig
- 2022 Poster: Neuro-Symbolic Language Modeling with Automaton-augmented Retrieval
  Uri Alon · Frank Xu · Junxian He · Sudipta Sengupta · Dan Roth · Graham Neubig
- 2022 Poster: GLaM: Efficient Scaling of Language Models with Mixture-of-Experts
  Nan Du · Yanping Huang · Andrew Dai · Simon Tong · Dmitry Lepikhin · Yuanzhong Xu · Maxim Krikun · Yanqi Zhou · Adams Wei Yu · Orhan Firat · Barret Zoph · William Fedus · Maarten Bosma · Zongwei Zhou · Tao Wang · Emma Wang · Kellie Webster · Marie Pellat · Kevin Robinson · Kathleen Meier-Hellstern · Toju Duke · Lucas Dixon · Kun Zhang · Quoc Le · Yonghui Wu · Zhifeng Chen · Claire Cui
- 2022 Poster: Data Scaling Laws in NMT: The Effect of Noise and Architecture
  Yamini Bansal · Behrooz Ghorbani · Ankush Garg · Biao Zhang · Colin Cherry · Behnam Neyshabur · Orhan Firat
- 2022 Spotlight: Data Scaling Laws in NMT: The Effect of Noise and Architecture
  Yamini Bansal · Behrooz Ghorbani · Ankush Garg · Biao Zhang · Colin Cherry · Behnam Neyshabur · Orhan Firat
- 2022 Spotlight: GLaM: Efficient Scaling of Language Models with Mixture-of-Experts
  Nan Du · Yanping Huang · Andrew Dai · Simon Tong · Dmitry Lepikhin · Yuanzhong Xu · Maxim Krikun · Yanqi Zhou · Adams Wei Yu · Orhan Firat · Barret Zoph · William Fedus · Maarten Bosma · Zongwei Zhou · Tao Wang · Emma Wang · Kellie Webster · Marie Pellat · Kevin Robinson · Kathleen Meier-Hellstern · Toju Duke · Lucas Dixon · Kun Zhang · Quoc Le · Yonghui Wu · Zhifeng Chen · Claire Cui
- 2022 Spotlight: Neuro-Symbolic Language Modeling with Automaton-augmented Retrieval
  Uri Alon · Frank Xu · Junxian He · Sudipta Sengupta · Dan Roth · Graham Neubig
- 2022 Poster: Symmetric Machine Theory of Mind
  Melanie Sclar · Graham Neubig · Yonatan Bisk
- 2022 Poster: Examining Scaling and Transfer of Language Model Architectures for Machine Translation
  Biao Zhang · Behrooz Ghorbani · Ankur Bapna · Yong Cheng · Xavier Garcia · Jonathan Shen · Orhan Firat
- 2022 Spotlight: Symmetric Machine Theory of Mind
  Melanie Sclar · Graham Neubig · Yonatan Bisk
- 2022 Spotlight: Examining Scaling and Transfer of Language Model Architectures for Machine Translation
  Biao Zhang · Behrooz Ghorbani · Ankur Bapna · Yong Cheng · Xavier Garcia · Jonathan Shen · Orhan Firat
- 2021 Poster: Examining and Combating Spurious Features under Distribution Shift
  Chunting Zhou · Xuezhe Ma · Paul Michel · Graham Neubig
- 2021 Poster: Few-shot Language Coordination by Modeling Theory of Mind
  Hao Zhu · Graham Neubig · Yonatan Bisk
- 2021 Spotlight: Few-shot Language Coordination by Modeling Theory of Mind
  Hao Zhu · Graham Neubig · Yonatan Bisk
- 2021 Spotlight: Examining and Combating Spurious Features under Distribution Shift
  Chunting Zhou · Xuezhe Ma · Paul Michel · Graham Neubig
- 2020 Poster: Optimizing Data Usage via Differentiable Rewards
  Xinyi Wang · Hieu Pham · Paul Michel · Antonios Anastasopoulos · Jaime Carbonell · Graham Neubig