Poster in Workshop: Next Generation of Sequence Modeling Architectures
Associative Recurrent Memory Transformer
Ivan Rodkin · Yuri Kuratov · Aidar Bulatov · Mikhail Burtsev
Abstract:
This paper addresses the challenge of creating a neural architecture for very long sequences that requires constant time for processing new information at each time step. Our approach, Associative Recurrent Memory Transformer (ARMT), is based on transformer self-attention for local context and segment-level recurrence for storage of task-specific information distributed over a long context. We demonstrate that ARMT outperforms existing alternatives in associative retrieval tasks and sets a new performance record on the recent BABILong multi-task long-context benchmark by answering single-fact questions over 50 million tokens with an accuracy of 79.9%.
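To illustrate the segment-level recurrence the abstract describes, below is a minimal, hypothetical sketch (not the authors' implementation): a long input is split into segments, and memory tokens produced while processing one segment are carried into the next, so each step only attends over a fixed-size local context. ARMT additionally augments this with a layer-wise associative memory, which is omitted here; all names, sizes, and the use of `nn.TransformerEncoder` are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SegmentRecurrentMemoryModel(nn.Module):
    """Sketch of memory-token recurrence over segments (assumed design)."""

    def __init__(self, d_model=256, n_heads=4, n_layers=2, n_mem=16):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        # Learnable initial memory tokens (size chosen for illustration).
        self.init_mem = nn.Parameter(torch.randn(1, n_mem, d_model))
        self.n_mem = n_mem

    def forward(self, segments):
        """segments: list of [batch, seg_len, d_model] embedded segments."""
        mem = self.init_mem.expand(segments[0].size(0), -1, -1)
        outputs = []
        for seg in segments:
            # Prepend the current memory state and run local self-attention
            # over memory + segment only (constant cost per segment).
            h = self.encoder(torch.cat([mem, seg], dim=1))
            # Updated memory tokens are passed to the next segment.
            mem = h[:, : self.n_mem]
            outputs.append(h[:, self.n_mem :])
        return torch.cat(outputs, dim=1), mem
```

In this sketch the per-segment cost is fixed by the segment and memory lengths, so processing a new segment takes constant time regardless of how much context has already been consumed, matching the property the abstract emphasizes.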