

Poster
in
Workshop: Next Generation of Sequence Modeling Architectures

Needle in the Haystack for Memory Based Large Language Models

Elliot Nelson · Soham Dan · Georgios Kollias · Payel Das · Subhajit Chaudhury


Abstract:

In this paper, we test Larimar, a recently proposed language model architecture that uses an external associative memory, on several long-context recall tasks, including passkey and needle-in-the-haystack tests. We demonstrate that the external memory can be adapted at test time to handle contexts much longer than those seen during training, while keeping memory readouts recognizable to the trained model and without increasing the GPU memory footprint. Compared to alternative architectures for long-context recall with models of comparable parameter count, Larimar maintains strong performance without any task-specific training.
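To illustrate the idea of a test-time-writable external associative memory, the following is a minimal Python sketch of a linear key-value memory that stores segment encodings via a least-squares write and recalls them with a matrix read. This is an assumption-laden toy, not the authors' Larimar implementation: the class name, dimensions, and the pseudo-inverse write rule are illustrative only.

# Hypothetical sketch of a linear associative memory with test-time writes,
# loosely in the spirit of an external memory module (not the authors' code).
import numpy as np

class AssociativeMemory:
    def __init__(self, key_dim: int, value_dim: int):
        # Memory matrix maps the key space to the value space.
        self.M = np.zeros((key_dim, value_dim))

    def write(self, keys: np.ndarray, values: np.ndarray) -> None:
        # Test-time write: solve keys @ M ~= values in the least-squares sense,
        # updating only the memory matrix, never the trained model weights.
        self.M = np.linalg.pinv(keys) @ values

    def read(self, query_keys: np.ndarray) -> np.ndarray:
        # Readout stays in the value space the trained decoder expects.
        return query_keys @ self.M

# Toy usage: store 32 random 64-dim segment encodings, then recall one of them.
rng = np.random.default_rng(0)
keys = rng.normal(size=(32, 64))
values = rng.normal(size=(32, 64))
mem = AssociativeMemory(key_dim=64, value_dim=64)
mem.write(keys, values)
recalled = mem.read(keys[10:11])
print(np.max(np.abs(recalled - values[10:11])))  # near zero: exact recall

Because only the memory matrix is rewritten at inference time, such a module can in principle absorb new (key, value) pairs from longer contexts without retraining or growing the model's parameter count, which is the property the abstract highlights.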
