Language models (LMs) compute the probability of a text by sequentially computing a representation of the already-seen context and using this representation to predict the next word. Currently, most LMs calculate these representations through a neural network that consumes the immediately preceding context. Recently, however, retrieval-augmented LMs have been shown to improve over standard neural LMs by accessing information retrieved from a large datastore, in addition to their standard, parametric, next-word prediction. In this paper, we set out to understand why retrieval-augmented language models, and specifically k-nearest neighbor language models (kNN-LMs), perform better than standard parametric LMs, even when the k-nearest neighbor component retrieves examples from the same training set that the LM was originally trained on. To this end, we analyze the various dimensions along which kNN-LM diverges from standard LMs, and investigate these dimensions one by one. Empirically, we identify three main reasons why kNN-LM performs better than standard LMs: the use of a different input representation for predicting the next token, approximate kNN search, and the importance of the softmax temperature for the kNN distribution. Further, we incorporate some of these insights into the standard parametric LM, improving its performance without the need for an explicit retrieval component. The code is available at https://github.com/frankxu2004/knnlm-why.
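For readers unfamiliar with the kNN-LM setup discussed in the abstract, the sketch below illustrates the standard kNN-LM interpolation (a retrieval distribution built from nearest-neighbor distances, mixed with the parametric LM's distribution) and where the softmax temperature enters. It is a minimal illustration in plain NumPy, with names such as `knn_lm_probs`, `lmbda`, and `temperature` chosen here for exposition; it is not the authors' implementation, which is available in the linked repository.

```python
import numpy as np

def knn_lm_probs(p_lm, neighbor_dists, neighbor_tokens, vocab_size,
                 lmbda=0.25, temperature=1.0):
    """Illustrative kNN-LM interpolation (not the paper's code).

    p_lm            : parametric LM distribution over the vocabulary, shape (V,)
    neighbor_dists  : distances to the k retrieved datastore entries, shape (k,)
    neighbor_tokens : next-token id stored with each retrieved entry, shape (k,)
    """
    # Turn negative distances into a distribution over the k neighbors;
    # the softmax temperature flattens or sharpens this distribution.
    logits = -neighbor_dists / temperature
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()

    # Scatter neighbor weights onto the vocabulary: each neighbor votes
    # for the token that followed its stored context.
    p_knn = np.zeros(vocab_size)
    np.add.at(p_knn, neighbor_tokens, weights)

    # Interpolate the retrieval distribution with the parametric LM.
    return lmbda * p_knn + (1.0 - lmbda) * p_lm

# Toy usage: vocabulary of 5 tokens, 3 retrieved neighbors.
p_lm = np.array([0.1, 0.2, 0.4, 0.2, 0.1])
p = knn_lm_probs(p_lm,
                 neighbor_dists=np.array([0.5, 1.0, 2.0]),
                 neighbor_tokens=np.array([2, 2, 4]),
                 vocab_size=5)
print(p, p.sum())  # the result is a valid distribution summing to 1
```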
Author Information
Frank Xu (Carnegie Mellon University)
Uri Alon (Carnegie Mellon University)
Graham Neubig (Carnegie Mellon University)
More from the Same Authors
- 2023 Oral: Cross-Modal Fine-Tuning: Align then Refine
  Junhong Shen · Liam Li · Lucio Dery · Corey Staten · Mikhail Khodak · Graham Neubig · Ameet Talwalkar
- 2023 Poster: Cross-Modal Fine-Tuning: Align then Refine
  Junhong Shen · Liam Li · Lucio Dery · Corey Staten · Mikhail Khodak · Graham Neubig · Ameet Talwalkar
- 2023 Poster: PAL: Program-aided Language Models
  Luyu Gao · Aman Madaan · Shuyan Zhou · Uri Alon · Pengfei Liu · Yiming Yang · Jamie Callan · Graham Neubig
- 2022 Poster: Neuro-Symbolic Language Modeling with Automaton-augmented Retrieval
  Uri Alon · Frank Xu · Junxian He · Sudipta Sengupta · Dan Roth · Graham Neubig
- 2022 Spotlight: Neuro-Symbolic Language Modeling with Automaton-augmented Retrieval
  Uri Alon · Frank Xu · Junxian He · Sudipta Sengupta · Dan Roth · Graham Neubig
- 2022 Poster: Symmetric Machine Theory of Mind
  Melanie Sclar · Graham Neubig · Yonatan Bisk
- 2022 Spotlight: Symmetric Machine Theory of Mind
  Melanie Sclar · Graham Neubig · Yonatan Bisk
- 2021 Poster: Examining and Combating Spurious Features under Distribution Shift
  Chunting Zhou · Xuezhe Ma · Paul Michel · Graham Neubig
- 2021 Poster: Few-shot Language Coordination by Modeling Theory of Mind
  Hao Zhu · Graham Neubig · Yonatan Bisk
- 2021 Spotlight: Few-shot Language Coordination by Modeling Theory of Mind
  Hao Zhu · Graham Neubig · Yonatan Bisk
- 2021 Spotlight: Examining and Combating Spurious Features under Distribution Shift
  Chunting Zhou · Xuezhe Ma · Paul Michel · Graham Neubig
- 2020 Poster: Optimizing Data Usage via Differentiable Rewards
  Xinyi Wang · Hieu Pham · Paul Michel · Antonios Anastasopoulos · Jaime Carbonell · Graham Neubig
- 2020 Poster: XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalisation
  Junjie Hu · Sebastian Ruder · Aditya Siddhant · Graham Neubig · Orhan Firat · Melvin Johnson