Poster in Workshop: Next Generation of Sequence Modeling Architectures
Needle in the Haystack for Memory Based Large Language Models
Elliot Nelson · Soham Dan · Georgios Kollias · Payel Das · Subhajit Chaudhury
Abstract:
In this paper, we test Larimar, a recently proposed language model architecture that uses an external associative memory, on several long-context recall tasks, including passkey and needle-in-the-haystack tests. We demonstrate that the external memory can be adapted at test time to handle contexts much longer than those seen during training, while keeping memory readouts recognizable to the trained model and without increasing the GPU memory footprint. Compared to alternative architectures for long-context recall with models of comparable parameter count, Larimar maintains strong performance without any task-specific training.
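The sketch below is a minimal, hypothetical illustration (not the authors' code; all names and dimensions are assumed) of the kind of fixed-size linear associative memory the abstract refers to: encoded context segments are written into a K x D memory matrix via a pseudo-inverse and read back by key, so the stored matrix stays the same size however many segments are written, which is consistent with the claim that the GPU memory footprint does not grow with context length.

```python
# Hypothetical sketch of a fixed-size linear associative memory
# (illustrative only; not the Larimar implementation).
import torch

torch.manual_seed(0)

K, D = 64, 128                 # memory slots and encoding dimension (illustrative)
num_segments = 48              # number of encoded context segments to store

keys = torch.randn(num_segments, K)      # one addressing key per segment
values = torch.randn(num_segments, D)    # segment encodings to be stored

# Write: bind keys to values with a least-squares (pseudo-inverse) solution.
M = torch.linalg.pinv(keys) @ values     # shape (K, D), independent of context length

# Read: addressing with a stored key recovers (approximately) its encoding.
readout = keys[3] @ M
similarity = torch.nn.functional.cosine_similarity(readout, values[3], dim=0)
print(f"cosine similarity of readout vs. stored encoding: {similarity.item():.3f}")
```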