Attention mechanisms have shown promising results in sequence modeling tasks that require long-term memory. Recent work investigated mechanisms to reduce the computational cost of preserving and storing memories. However, not all content in the past is equally important to remember. We propose Expire-Span, a method that learns to retain the most important information and expire the irrelevant information. This forgetting of memories enables Transformers to scale to attend over tens of thousands of previous timesteps efficiently, as not all states from previous timesteps are preserved. We demonstrate that Expire-Span can help models identify and retain critical information and show it can achieve strong performance on reinforcement learning tasks specifically designed to challenge this functionality. Next, we show that Expire-Span can scale to memories that are tens of thousands in size, setting a new state of the art on incredibly long context tasks such as character-level language modeling and a frame-by-frame moving objects task. Finally, we analyze the efficiency of Expire-Span compared to existing approaches and demonstrate that it trains faster and uses less memory.
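The abstract only outlines the mechanism, so here is a minimal, illustrative PyTorch sketch of the expiration idea it describes: each stored state is assigned a learned expiration span and is masked out once its age exceeds that span, with a short linear ramp so the mask stays differentiable. This is an assumption-laden sketch, not the authors' released implementation; the names (ExpireSpanMask, max_span, ramp) and the linear span predictor are ours for illustration.

```python
import torch
import torch.nn as nn


class ExpireSpanMask(nn.Module):
    """Sketch of expiration-based memory masking (illustrative, not the official code).

    Each memory h_i gets a predicted span e_i = max_span * sigmoid(w.h_i + b).
    A memory held for `age` steps is kept while e_i - age > 0; a ramp of length
    `ramp` turns the hard cutoff into a soft mask in [0, 1].
    """

    def __init__(self, d_model: int, max_span: int = 1024, ramp: int = 128):
        super().__init__()
        self.span_predictor = nn.Linear(d_model, 1)
        self.max_span = max_span
        self.ramp = ramp

    def forward(self, memory: torch.Tensor, ages: torch.Tensor) -> torch.Tensor:
        # memory: (batch, mem_len, d_model); ages: (mem_len,) giving t - i per slot.
        spans = self.max_span * torch.sigmoid(self.span_predictor(memory)).squeeze(-1)
        remaining = spans - ages  # negative once a memory has expired
        # Soft mask: 1 while well within the span, linearly falling to 0 over `ramp` steps.
        return torch.clamp(remaining / self.ramp + 1.0, min=0.0, max=1.0)


if __name__ == "__main__":
    # Toy usage: 8 memory slots of varying age; older slots receive masks near 0.
    torch.manual_seed(0)
    layer = ExpireSpanMask(d_model=16, max_span=256, ramp=64)
    mem = torch.randn(1, 8, 16)
    ages = torch.tensor([500.0, 400.0, 300.0, 200.0, 100.0, 50.0, 10.0, 1.0])
    print(layer(mem, ages))
```

In a full model, attention scores over the memory would be multiplied by this mask and renormalized, and states whose mask has reached zero can be freed, which is what allows attention over tens of thousands of previous timesteps without storing every state.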
Author Information
Sainbayar Sukhbaatar (Facebook AI Research)
Da Ju (Facebook AI Research)
Spencer Poff (Facebook)
Stephen Roller (Facebook)
Arthur Szlam (Facebook)
Jason Weston (Facebook AI Research)
Angela Fan (Facebook AI Research)
Related Events (a corresponding poster, oral, or spotlight)
- 2021 Oral: Not All Memories are Created Equal: Learning to Forget by Expiring »
  Tue. Jul 20th 02:00 -- 02:20 PM
More from the Same Authors
- 2023 Poster: Scaling Laws for Generative Mixed-Modal Language Models »
  Armen Aghajanyan · Lili Yu · Alexis Conneau · Wei-Ning Hsu · Karen Hambardzumyan · Susan Zhang · Stephen Roller · Naman Goyal · Omer Levy · Luke Zettlemoyer
- 2021 Poster: CURI: A Benchmark for Productive Concept Learning Under Uncertainty »
  Shanmukha Ramakrishna Vedantam · Arthur Szlam · Maximilian Nickel · Ari Morcos · Brenden Lake
- 2021 Spotlight: CURI: A Benchmark for Productive Concept Learning Under Uncertainty »
  Shanmukha Ramakrishna Vedantam · Arthur Szlam · Maximilian Nickel · Ari Morcos · Brenden Lake
- 2020: Collaboration in Situated Instruction Following Q&A »
  Yoav Artzi · Arthur Szlam
- 2020: Collaborative Construction and Communication in Minecraft Q&A »
  Julia Hockenmaier · Arthur Szlam
- 2020 Workshop: Workshop on Learning in Artificial Open Worlds »
  Arthur Szlam · Katja Hofmann · Ruslan Salakhutdinov · Noboru Kuno · William Guss · Kavya Srinet · Brandon Houghton
- 2020 Poster: Fast Adaptation to New Environments via Policy-Dynamics Value Functions »
  Roberta Raileanu · Max Goldstein · Arthur Szlam · Rob Fergus
- 2018 Poster: Optimizing the Latent Space of Generative Networks »
  Piotr Bojanowski · Armand Joulin · David Lopez-Paz · Arthur Szlam
- 2018 Poster: Composable Planning with Attributes »
  Amy Zhang · Sainbayar Sukhbaatar · Adam Lerer · Arthur Szlam · Rob Fergus
- 2018 Oral: Composable Planning with Attributes »
  Amy Zhang · Sainbayar Sukhbaatar · Adam Lerer · Arthur Szlam · Rob Fergus
- 2018 Oral: Optimizing the Latent Space of Generative Networks »
  Piotr Bojanowski · Armand Joulin · David Lopez-Paz · Arthur Szlam
- 2017 Poster: Language Modeling with Gated Convolutional Networks »
  Yann Dauphin · Angela Fan · Michael Auli · David Grangier
- 2017 Talk: Language Modeling with Gated Convolutional Networks »
  Yann Dauphin · Angela Fan · Michael Auli · David Grangier