Modern large language model pretraining is governed by complex heuristics — from cosine learning-rate decay to batch-size scheduling. Yet, a growing body of work suggests that an analytically simple quadratic model can accurately predict much of this large-scale optimization behavior. In this talk, I will argue that the quadratic model is not merely a convenient theoretical toy, but a useful lens for pretraining practice — both for compute efficiency and for serial runtime.
We will begin with exact computations of critical batch size and time-dependent learning rates in linear systems, establishing a principled foundation. From there, we will see how the same analysis yields batch-size scaling laws (where we estimate batch-size exponents in LLMs) and motivates two pretraining improvements: SeeSaw, a scheduler that trades learning-rate decay for batch-size growth and matches loss at lower serial runtime; and Horizon-Free Pretraining, which shows how anytime schedules with weight averaging can match carefully tuned cosine decay without committing to a horizon in advance. We will close with lower bounds on the interaction between momentum and batch size, which suggest the quadratic model captures fundamental limits on what any first-order method can achieve.
Taken together, these results make the case that quadratics deserve a more central place in how we think about pretraining.
Lab-in-the-Loop for Drug R&D with AI
Making effective medicines is challenging: more than 90 percent of drug candidates fail in pre-clinical research or clinical trials. A major contributor to this low success rate is the enormous space of biological and therapeutic possibilities. In the underlying biology of disease, there are thousands of different cell types and states, about 20,000 genes in our genome, more than 10^5 disease-associated loci, and perhaps 10^13 or more ways in which they could meaningfully combine. To make medicines targeting this biology, one could consider at least 10^60 possible small molecules with medicine-like properties, approximately 20^32 relevant antibodies, billions of people, and about 10,000 different diseases. Now, however, we are at a major inflection point: we can collect large-scale data, at high resolution, from human biology and, crucially, combine these large datasets with AI to represent, reason, and generate over these enormous spaces, yielding testable predictions of missing or nonexistent information and iteratively improving our models. Although it is not possible to test every possibility in a lab, clinical trial, or even an entire population, with the scale of data it is currently possible to generate, we can use AI to bridge different layers of biology, determine the impact of combinations of genetic mutations or drug perturbations, predict disease progression, and generate therapeutic molecules de novo or through optimization. Key to the success of this approach is an integrated interplay between data and AI, or a “Lab in the Loop,” where experimental or clinical data are used to train models, the models are used to help predict and design the next set of experiments, and the process is iterated, at scale, both to yield key predictions in any specific project and to improve the model for all projects.
In this talk, I will describe how we built such a Lab in the Loop of experiments and AI at Genentech across our target discovery, drug discovery, and drug development efforts to serve patients across therapeutic areas.