Fri 12:00 a.m. - 12:02 a.m. | Opening Remarks (Intro) | Atish Agarwala | SlidesLive Video
Fri 12:00 a.m. - 12:30 a.m. | Spectral alignment for high-dimensional SGD (Plenary Speaker) | Aukosh Jagannath | SlidesLive Video
Fri 12:30 a.m. - 1:00 a.m. | Misleading Endpoints – Lessons from LLM Training Dynamics (Plenary Speaker) | Angelica Chen | SlidesLive Video
Fri 1:00 a.m. - 2:00 a.m. | Poster Session 1 (Break/Poster Session)
Fri 2:00 a.m. - 2:30 a.m. | Learning Representations and Associations with Gradient Descent (Plenary Speaker) | Jason Lee | SlidesLive Video
Fri 2:30 a.m. - 3:00 a.m. | When is theory useful in practice? A guide to pitching your work to LLM trainers (Plenary Speaker) | Stella Biderman
Fri 3:30 a.m. - 5:00 a.m. | Lunch
Fri 5:00 a.m. - 5:30 a.m. | Phase transition in high-dimensional learning (Plenary Speaker) | Lenka Zdeborová | SlidesLive Video
Fri 5:30 a.m. - 6:00 a.m. | Generalization Error of min-norm interpolators in transfer learning (Plenary Speaker) | Pragya Sur | SlidesLive Video
Fri 6:00 a.m. - 6:30 a.m. | Brain-Wide Compositionality and Learning Dynamics in Biological Agents (Plenary Speaker) | Kanaka Rajan | SlidesLive Video
Fri 6:30 a.m. - 7:45 a.m. | Poster Session 2 (Break/Poster Session)
Fri 6:30 a.m. - 6:45 a.m. | Best Paper Awards | SlidesLive Video
Fri 7:59 a.m. - 8:00 a.m. | Closing Remarks (End) | Atish Agarwala
- u-μP: The Unit-Scaled Maximal Update Parametrization (Poster) | Charlie Blake · Constantin Eichenberg · Josef Dean · Lukas Balles · Luke Prince · Björn Deiseroth · Andres Felipe Cruz Salinas · Carlo Luschi · Samuel Weinbach · Douglas Orr
- All Roads Lead to Rome? Exploring Representational Similarities Between Latent Spaces of Generative Image Models (Poster) | Charumathi Badrinath · Usha Bhalla · Alex Oesterling · Suraj Srinivas · Himabindu Lakkaraju
- The Hidden Pitfalls of the Cosine Similarity Loss (Poster) | Andrew Draganov · Sharvaree Vadgama · Erik Bekkers
- Toward Global Convergence of Gradient EM for Over-Parameterized Gaussian Mixture Models (Poster) | Weihang Xu · Maryam Fazel · Simon Du
- Asymptotic Dynamics for Delayed Feature Learning in a Toy Model (Poster) | Blake Bordelon · Tanishq Kumar · Samuel Gershman · Cengiz Pehlevan
- Gradient Dissent in Language Model Training and Saturation (Poster) | Andrei Mircea · Ekaterina Lobacheva · Irina Rish
- An exactly solvable model for emergence and scaling laws (Poster) | Yoonsoo Nam · Nayara Fonseca · Seok Hyeong Lee · Chris Mingard · Ard Louis
- Gradient descent induces alignment between weights and the pre-activation tangents for deep non-linear networks (Poster) | Daniel Beaglehole · Ioannis Mitliagkas · Atish Agarwala
- Probability Tools for Sequential Random Projection (Poster) | Yingru Li
- Early Period of Training Impacts Out-of-Distribution Generalization (Poster) | Chen Cecilia Liu · Iryna Gurevych
- The Implicit Bias of Adam on Separable Data (Poster) | Chenyang Zhang · Difan Zou · Yuan Cao
- ReLU Characteristic Activation Analysis (Poster) | Wenlin Chen · Hong Ge
- How Do Transformers Fill in the Blanks? A Case Study on Matrix Completion (Poster) | Pulkit Gopalani · Ekdeep Singh Lubana · Wei Hu
- Why Pruning and Conditional Computation Work: A High-Dimensional Perspective (Poster) | Erdem Koyuncu
- Effect of Random Learning Rate: Theoretical Analysis of SGD Dynamics in Non-Convex Optimization via Stationary Distribution (Poster) | Naoki Yoshida · Shogo Nakakita · Masaaki Imaizumi
- Does SGD really happen in tiny subspaces? (Poster) | Minhak Song · Kwangjun Ahn · Chulhee Yun
- Latent functional maps (Poster) | Marco Fumero · Marco Pegoraro · Valentino Maiorca · Francesco Locatello · Emanuele Rodola
- Looking at Deep Learning Phenomena Through a Telescoping Lens (Poster) | Alan Jeffares · Alicia Curth · M van der Schaar
- The Empirical Impact of Neural Parameter Symmetries, or Lack Thereof (Poster) | Derek Lim · Theo Putterman · Robin Walters · Haggai Maron · Stefanie Jegelka
- Feature Learning Dynamics under Grokking in a Sparse Parity Task (Poster) | Javier Sanguino Baustiste · Gregor Bachmann · Bobby He · Lorenzo Noci · Thomas Hofmann
- Closed form of the Hessian spectrum for some Neural Networks (Poster) | Sidak Pal Singh · Thomas Hofmann
- Decomposing and Editing Predictions by Modeling Model Computation (Poster) | Harshay Shah · Andrew Ilyas · Aleksander Madry
- Three Mechanisms of Feature Learning in an Analytically Solvable Model (Poster) | Yizhou Xu · Liu Ziyin
- A Unified Approach to Feature Learning in Bayesian Neural Networks (Poster) | Noa Rubin · Zohar Ringel · Inbar Seroussi · Moritz Helias
- Exploring the development of complexity over depth and time in deep neural networks (Poster) | Hannah Pinson · Aurélien Boland · Vincent Ginis · Mykola Pechenizkiy
- Gradient Descent Robustly Learns the Intrinsic Dimension of Data in Training Convolutional Neural Networks (Poster) | Chenyang Zhang · Peifeng Gao · Difan Zou · Yuan Cao
- Emergent representations in networks trained with the Forward-Forward algorithm (Poster) | Niccolo Tosato · Lorenzo Basile · Emanuele Ballarin · Giuseppe De Alteriis · Alberto Cazzaniga · Alessio Ansuini
- Learning Multi-Index Models with Neural Networks via Mean-Field Langevin Dynamics (Poster) | Alireza Mousavi-Hosseini · Denny Wu · Murat Erdogdu
- The Butterfly Effect: Tiny Perturbations Cause Neural Network Training to Diverge (Poster) | Gül Sena Altintas · Devin Kwok · David Rolnick
- How Truncating Weights Improves Reasoning in Language Models (Poster) | Lei Chen · Joan Bruna · Alberto Bietti
- A Random Matrix Analysis of Learning with Noisy Labels (Poster) | Aymane Firdoussi · Mohamed El Amine Seddik
- Analyzing & Eliminating Learning Rate Warmup in GPT Pre-Training (Poster) | Atli Kosson · Bettina Messmer · Martin Jaggi
- Linear Weight Interpolation Leads to Transient Performance Gains (Poster) | Gaurav Iyer · Gintare Karolina Dziugaite · David Rolnick
- Provable Tempered Overfitting of Minimal Nets and Typical Nets (Poster) | Itamar Harel · William Hoza · Gal Vardi · Itay Evron · Nati Srebro · Daniel Soudry
- Effective Sharpness Aware Minimization Requires Layerwise Perturbation Scaling (Poster) | Moritz Haas · Jin Xu · Volkan Cevher · Leena Chennuru Vankadara
- Neural collapse versus low-rank bias: Is deep neural collapse really optimal? (Poster) | Peter Súkeník · Marco Mondelli · Christoph Lampert
- Analysing feature learning of gradient descent using periodic functions (Poster) | Jaehui Hwang · Taeyoung Kim · Hongseok Yang
- A Universal Class of Sharpness-Aware Minimization Algorithms (Poster) | Behrooz Tahmasebi · Ashkan Soleymani · Dara Bahri · Stefanie Jegelka · Patrick Jaillet
- Boundary between noise and information applied to filtering neural network weight matrices (Poster) | Max Staats · Matthias Thamm · Bernd Rosenow
- Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit (Poster) | Kazusato Oko · Denny Wu · Jason Lee · Taiji Suzuki
- Interpolated-MLPs: Controllable Inductive Bias (Poster) | Sean Wu · Jordan Hong · Keyu Bai · Gregor Bachmann
- Hidden Learning Dynamics of Capability before Behavior in Diffusion Models (Poster) | Core Francisco Park · Maya Okawa · Andrew Lee · Ekdeep Singh Lubana · Hidenori Tanaka
- Landscaping Linear Mode Connectivity (Poster) | Sidak Pal Singh · Linara Adilova · Michael Kamp · Asja Fischer · Bernhard Schölkopf · Thomas Hofmann
- Deep Networks Always Grok and Here is Why (Poster) | Ahmed Imtiaz Humayun · Randall Balestriero · Richard Baraniuk
- Adam Exploits ℓ∞-geometry of Loss Landscape via Coordinate-wise Adaptivity (Poster) | Shuo Xie · Mohamad Amin Mohamadi · Zhiyuan Li
- Understanding Nonlinear Implicit Bias via Region Counts in Input Space (Poster) | Jingwei Li · Jing Xu · Zifan Wang · Huishuai Zhang · Jingzhao Zhang
- On the metastability of learning algorithms in physics-informed neural networks: a case study on Schrödinger operators (Poster) | Alessandro Selvitella
- Simple, unified analysis of Johnson-Lindenstrauss with applications (Poster) | Yingru Li
- Fundamental limits of weak learnability in high-dimensional multi-index models (Poster) | Emanuele Troiani · Yatin Dandi · Leonardo Defilippis · Lenka Zdeborová · Bruno Loureiro · Florent Krzakala
- When Are Bias-Free ReLU Networks Like Linear Networks? (Poster) | Yedi Zhang · Andrew Saxe · Peter Latham
- Provable Benefit of Cutout and CutMix for Feature Learning (Poster) | Junsoo Oh · Chulhee Yun
- Where Do Large Learning Rates Lead Us? A Feature Learning Perspective (Poster) | Ildus Sadrtdinov · Maxim Kodryan · Eduard Pokonechny · Ekaterina Lobacheva · Dmitry Vetrov
- Fine-grained Analysis of In-context Linear Estimation (Poster) | Yingcong Li · Ankit Singh Rawat · Samet Oymak
- A Hessian-Aware Stochastic Differential Equation for Modelling SGD (Poster) | Xiang Li · Zebang Shen · Liang Zhang · Niao He
- Expressivity of Neural Networks with Fixed Weights and Learned Biases (Poster) | Ezekiel Williams · Hee-Woon Ryoo · Thomas Jiralerspong · Alexandre Payeur · Matthew Perich · Luca Mazzucato · Guillaume Lajoie
- Repetita Iuvant: Data Repetition Allows SGD to Learn High-Dimensional Multi-Index Functions (Poster) | Luca Arnaboldi · Yatin Dandi · Florent Krzakala · Luca Pesce · Ludovic Stephan
- Nonconvex Meta-optimization for Deep Learning (Poster) | Xinyi Chen · Evan Dogariu · Zhou Lu · Elad Hazan
- The optimization landscape of Spectral neural network (Poster) | Chenghui Li · Rishi Sonthalia · Nicolas Garcia Trillos
- Gradient Descent with Polyak’s Momentum Finds Flatter Minima via Large Catapults (Poster) | Prin Phunyaphibarn · Junghyun Lee · Bohan Wang · Huishuai Zhang · Chulhee Yun
- Merging Text Transformer Models from Different Initializations (Poster) | Neha Verma · Maha Elbayad
- How Do Nonlinear Transformers Acquire Generalization-Guaranteed CoT Ability? (Poster) | Hongkang Li · Meng Wang · Songtao Lu · Xiaodong Cui · Pin-Yu Chen
- InfoNCE: Identifying the Gap Between Theory and Practice (Poster) | Roland S. Zimmermann · Evgenia Rusak · Wieland Brendel · Attila Juhos · Patrik Reizinger · Oliver Bringmann
- Do Parameters Reveal More than Loss for Membership Inference? (Poster) | Anshuman Suri · Xiao Zhang · David Evans
- Neural Symmetry Detection for Learning Neural Network Constraints (Poster) | Alex Gabel · Rick Quax · Efstratios Gavves
- Understanding Adversarially Robust Generalization via Weight-Curvature Index (Poster) | Yuelin Xu · Xiao Zhang
- Random matrix theory analysis of neural network weight matrices (Poster) | Matthias Thamm · Max Staats · Bernd Rosenow
- Loss landscape geometry reveals stagewise development of transformers (Poster) | George Wang · Matthew Farrugia-Roberts · Jesse Hoogland · Liam Carroll · Susan Wei · Daniel Murfet
- Get rich quick: exact solutions reveal how unbalanced initializations promote rapid feature learning (Poster) | Daniel Kunin · Allan Raventos · Clémentine Dominé · Feng Chen · David Klindt · Andrew Saxe · Surya Ganguli
- Correlated Noise in Epoch-Based Stochastic Gradient Descent: Implications for Weight Variances (Poster) | Marcel Kühn · Bernd Rosenow
- Progress Measures for Grokking on Real-world Tasks (Poster) | Satvik Golechha
- A Phase Transition between Positional and Semantic Learning in a Solvable Model of Dot-Product Attention (Poster) | Hugo Cui · Freya Behrens · Florent Krzakala · Lenka Zdeborová
- SGD vs GD: Rank Deficiency in Linear Networks (Poster) | Aditya Vardhan Varre · Margarita Sagitova · Nicolas Flammarion
- Rank Minimization, Alignment and Weight Decay in Neural Networks (Poster) | David Yunis · Kumar Kshitij Patel · Samuel Wheeler · Pedro Henrique Pamplona Savarese · Gal Vardi · Karen Livescu · Michael Maire · Matthew Walter