From Kepler to Newton: Inductive Biases Guide Learned World Models in Transformers
Abstract
Vafa et al. recently showed that a transformer fails to acquire an internal Newtonian world model when trained on synthetic planetary-motion data. How can we fix this problem? We find that inductive biases are key to learning the veridical world model: (1) Spatial smoothness is required for any world model to be learned. However, naive tokenization can break this smoothness: without sufficient training or data, two nearby points in physical space may land far apart in token-embedding space. We fix this by formulating the prediction problem as regression instead of classification. (2) Spatial stability makes predictions robust to noise. It is not guaranteed by default, but can be taught by training the model to correct in-context noise perturbations. (3) With both spatial smoothness and stability built in, further imposing temporal locality induces a Newtonian world model; without this bias, the model instead learns a Keplerian world model -- fitting elliptical orbit parameters rather than computing gravitational forces. Our results suggest that even simple, general inductive biases are powerful enough to induce correct and specific world models. The inductive biases need not encode much about the underlying law, yet without them the law cannot be learned.
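To make the first bias concrete, here is a minimal sketch of the regression formulation, assuming a PyTorch transformer encoder over continuous 2-D orbital positions; the module names, sizes, and architecture are illustrative assumptions, not the paper's implementation. The key contrast is the output head: a linear regression head trained with MSE on continuous coordinates keeps nearby physical states nearby in prediction space, whereas a softmax over discretized position tokens need not.

```python
import torch
import torch.nn as nn

# Hypothetical sketch: a world model with a *regression* head over
# continuous (x, y) positions, instead of a classification head over
# discretized position tokens. All names and sizes are assumptions.

class ContinuousWorldModel(nn.Module):
    def __init__(self, state_dim=2, d_model=64, nhead=4, nlayers=2):
        super().__init__()
        # A linear embedding of raw coordinates is smooth by construction:
        # nearby physical states map to nearby embeddings.
        self.embed = nn.Linear(state_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, nlayers)
        # Regression head: predict the next continuous state directly.
        self.head = nn.Linear(d_model, state_dim)

    def forward(self, states):  # states: (batch, time, state_dim)
        h = self.embed(states)
        # Causal mask so each step only attends to its past.
        mask = nn.Transformer.generate_square_subsequent_mask(states.size(1)).to(states.device)
        h = self.encoder(h, mask=mask)
        return self.head(h)

model = ContinuousWorldModel()
trajectory = torch.randn(8, 16, 2)                      # toy batch of 2-D positions
pred = model(trajectory[:, :-1])                        # predict step t+1 from steps <= t
loss = nn.functional.mse_loss(pred, trajectory[:, 1:])  # MSE regression, not cross-entropy over tokens
loss.backward()
```

Under this formulation the loss itself is smooth in physical space: a prediction that is slightly off incurs a slightly larger loss, whereas a tokenized classification loss penalizes adjacent and distant bins alike.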