The Surprising Effectiveness of Test-Time Training for Few-Shot Learning
Abstract
Lay Summary
Language models are good at solving problems they've seen before, but struggle when faced with unfamiliar challenges. Our research shows that we can significantly boost their performance by allowing them to "update" themselves a little during the test itself, without needing to retrain them from scratch.

We apply this idea, called test-time training (TTT), to two especially difficult benchmarks: one that uses visual puzzles (ARC, similar to IQ tests) and another with complex language tasks (BIG-Bench Hard). By letting the model adjust itself slightly using the examples it sees during testing, we achieved up to six times better accuracy on ARC and over 7 percentage points of improvement on BIG-Bench Hard compared to traditional approaches.

Our method helps models adapt to unfamiliar tasks in real time, making them more useful in situations where they encounter something new or unexpected.
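The core idea, taking a few parameter-update steps on a test task's demonstration examples before answering its query, can be illustrated with a toy sketch. This is not the paper's actual model or training setup; the linear model, learning rate, and step count below are illustrative assumptions only.

```python
# Minimal sketch of test-time training (TTT) on a toy linear model
# y = w * x. Hypothetical example, not the paper's architecture.

def predict(w, x):
    return w * x

def ttt_adapt(w, demos, lr=0.05, steps=50):
    """Take a few gradient steps on the test task's demonstration
    pairs before answering the query (the core idea of TTT)."""
    for _ in range(steps):
        # Hand-computed gradient of mean squared error w.r.t. w.
        grad = sum(2 * (predict(w, x) - y) * x for x, y in demos) / len(demos)
        w -= lr * grad
    return w

# A new test task: the demonstrations imply y = 3 * x.
demos = [(1.0, 3.0), (2.0, 6.0)]
w0 = 0.0                        # "pretrained" weight, deliberately wrong here
w_adapted = ttt_adapt(w0, demos)
print(round(predict(w_adapted, 4.0), 1))  # -> 12.0
```

The point of the sketch is only the control flow: adaptation happens per test task, on that task's own few-shot examples, and the updated weights are then used to answer the query.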