Teaching Arithmetic to Small Transformers
Nayoung Lee · Kartik Sreenivasan · Jason Lee · Kangwook Lee · Dimitris Papailiopoulos

Large language models like GPT-4 exhibit emergent general-purpose capabilities, such as basic arithmetic, when trained on extensive text data, even though these tasks are not explicitly encoded by the unsupervised, next-token prediction objective. This study investigates how small transformers, trained from scratch, can efficiently learn arithmetic operations such as addition, multiplication, and elementary functions like square root, using the next-token prediction objective.

We first demonstrate that conventional training data is not the most effective for arithmetic learning, and simple formatting changes can significantly improve accuracy. This leads to sharp phase transitions as a function of training data scale, which, in some cases, can be explained through connections to low-rank matrix completion. Building on prior work, we then train on chain-of-thought style data that include intermediate step results. Even in the complete absence of pretraining, this approach significantly and simultaneously improves accuracy, sample complexity, and convergence speed.

We also study the interplay between arithmetic and text data during training and examine the effects of few-shot prompting, pretraining, and model scale. Additionally, we discuss length generalization challenges. Our work highlights the importance of high-quality, instructive data that considers the particular characteristics of the next-word prediction objective for rapidly eliciting arithmetic capabilities.
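
To make the two data ideas in the abstract concrete, the following is a minimal sketch of how an addition sample might be serialized for next-token training. The abstract does not specify the exact formats, so everything here is an illustrative assumption: the function names, the choice of writing the sum's digits least-significant first as the "formatting change", and the layout of the step-by-step sample are hypothetical, not the authors' actual data format.

```python
# Hypothetical sample formatters for non-negative integers (illustration only).

def plain_sample(a: int, b: int) -> str:
    """Conventional format: the sum is written most-significant digit first."""
    return f"{a}+{b}={a + b}"


def reversed_sample(a: int, b: int) -> str:
    """Assumed formatting change: the sum is written least-significant digit
    first, matching the order in which carries are resolved."""
    return f"{a}+{b}={str(a + b)[::-1]}"


def scratchpad_sample(a: int, b: int) -> str:
    """Assumed chain-of-thought style format: each digit-wise step
    (digit + digit + carry-in = partial sum) is written before the answer."""
    xs, ys = str(a)[::-1], str(b)[::-1]  # least-significant digit first
    carry = 0
    steps = []
    digits = []
    for i in range(max(len(xs), len(ys))):
        x = int(xs[i]) if i < len(xs) else 0
        y = int(ys[i]) if i < len(ys) else 0
        total = x + y + carry
        steps.append(f"{x}+{y}+{carry}={total}")
        digits.append(str(total % 10))
        carry = total // 10
    if carry:
        digits.append(str(carry))
    answer = "".join(reversed(digits))
    return f"{a}+{b}: " + " , ".join(steps) + f" ; answer={answer}"


if __name__ == "__main__":
    print(plain_sample(128, 367))
    print(reversed_sample(128, 367))
    print(scratchpad_sample(128, 367))
```

In the reversed variant, each output digit depends only on input digits and a carry that has already been produced, which is one plausible reason a formatting change of this kind could suit a left-to-right prediction objective.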

Author Information

Nayoung Lee (University of Wisconsin-Madison)
Kartik Sreenivasan (University of Wisconsin-Madison)
Jason Lee (Princeton University)
Kangwook Lee (UW Madison, KRAFTON AI)

I am an Assistant Professor in the Electrical and Computer Engineering department and the Computer Sciences department (by courtesy) at the University of Wisconsin-Madison. Previously, I was a Research Assistant Professor at the Information and Electronics Research Institute of KAIST, working with Prof. Changho Suh. Before that, I was a postdoctoral scholar at the same institute. I received my PhD in May 2016 from the Electrical Engineering and Computer Science department at UC Berkeley and my Master of Science degree from the same department in December 2012, both under the supervision of Prof. Kannan Ramchandran. I was a member of the Berkeley Laboratory of Information and System Sciences (BLISS, aka Wireless Foundation) and the BASiCS Group. I received my Bachelor of Science degree in Electrical Engineering from the Korea Advanced Institute of Science and Technology (KAIST) in May 2010.

Dimitris Papailiopoulos (University of Wisconsin-Madison)
