

Poster in Workshop: Next Generation of Sequence Modeling Architectures

Associative Recurrent Memory Transformer

Ivan Rodkin · Yuri Kuratov · Aidar Bulatov · Mikhail Burtsev


Abstract:

This paper addresses the challenge of creating a neural architecture for very long sequences that requires constant time for processing new information at each time step. Our approach, Associative Recurrent Memory Transformer (ARMT), is based on transformer self-attention for local context and segment-level recurrence for storage of task-specific information distributed over a long context. We demonstrate that ARMT outperforms existing alternatives in associative retrieval tasks and sets a new performance record on the recent BABILong multi-task long-context benchmark by answering single-fact questions over 50 million tokens with an accuracy of 79.9%.
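
To make the core idea concrete, below is a minimal sketch of segment-level recurrence with memory tokens carried across segments, in the spirit of the abstract: self-attention sees only the current segment plus a fixed number of memory slots, so the cost of processing each new segment is constant. All module choices, names, and dimensions (SegmentRecurrentEncoder, d_model, n_mem, etc.) are illustrative assumptions, not the authors' implementation, and the associative memory mechanism of ARMT itself is omitted.

```python
# Hypothetical sketch: segment-level recurrence with memory tokens (not the ARMT code).
import torch
import torch.nn as nn

class SegmentRecurrentEncoder(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_layers=2, n_mem=8, vocab_size=1000):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        # Learnable initial memory tokens, passed from segment to segment.
        self.init_mem = nn.Parameter(torch.randn(1, n_mem, d_model))
        self.n_mem = n_mem

    def forward(self, token_ids, segment_len=64):
        # token_ids: (batch, seq_len) ids of a long input sequence.
        batch = token_ids.size(0)
        mem = self.init_mem.expand(batch, -1, -1)
        outputs = []
        # Process the sequence segment by segment; attention only spans
        # the memory tokens plus one segment, so per-segment cost is constant.
        for start in range(0, token_ids.size(1), segment_len):
            seg = self.embed(token_ids[:, start:start + segment_len])
            x = self.encoder(torch.cat([mem, seg], dim=1))
            mem = x[:, :self.n_mem]           # updated memory for the next segment
            outputs.append(x[:, self.n_mem:]) # token representations for this segment
        return torch.cat(outputs, dim=1), mem

# Usage: encode a 4096-token sequence in 64-token segments.
model = SegmentRecurrentEncoder()
ids = torch.randint(0, 1000, (2, 4096))
out, final_mem = model(ids)
print(out.shape, final_mem.shape)  # (2, 4096, 256), (2, 8, 256)
```

In this sketch the memory tokens play the role of the recurrent state that stores task-specific information distributed over the long context; ARMT additionally equips this recurrence with an associative memory for retrieval, which is what the paper evaluates on associative retrieval tasks and BABILong.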
