Fri 12:00 p.m. - 12:01 p.m. | Opening Remarks (Remarks)
Fri 12:00 p.m. - 12:45 p.m. | Feature Learning in Two-layer Neural Networks under Structured Data (Plenary Speaker) | Murat A. Erdogdu
Fri 12:45 p.m. - 1:15 p.m. | Contributed talks 1 (Contributed talks) | Mengqi Lou · Zhichao Wang
Fri 1:15 p.m. - 2:15 p.m. | Poster Session/Coffee Break
Fri 2:15 p.m. - 3:00 p.m. | High-dimensional Optimization in the Age of ChatGPT (Plenary Speaker) | Sanjeev Arora
Fri 3:00 p.m. - 4:30 p.m. | Lunch
Fri 4:30 p.m. - 5:15 p.m. | Multi-level theory of neural representations: Capacity of neural manifolds in biological and artificial neural networks (Plenary Speaker) | SueYeon Chung
Fri 5:15 p.m. - 6:00 p.m. | Contributed talks 2 (Contributed talks) | Simon Du · Wei Huang · Yuandong Tian
Fri 6:00 p.m. - 6:30 p.m. | Coffee Break
Fri 6:30 p.m. - 7:15 p.m. | A strong implicit bias in SGD dynamics towards much simpler subnetworks through stochastic collapse to invariant sets (Plenary Speaker) | Surya Ganguli
Fri 7:15 p.m. - 8:00 p.m. | Solving overparametrized systems of random equations (Plenary Speaker) | Andrea Montanari
Fri 7:59 p.m. - 8:00 p.m. | Closing remarks (Remarks)
Learning to Plan in Multi-dimensional Stochastic Differential Equations (Poster) | Mohamad Sadegh Shirani Faradonbeh · Mohamad Kazem Shirani Faradonbeh
Elephant Neural Networks: Born to Be a Continual Learner (Poster) | Qingfeng Lan · Rupam Mahmood
Investigating the Edge of Stability Phenomenon in Reinforcement Learning (Poster) | Rares Iordan · Mihaela Rosca · Marc Deisenroth
Deep Neural Networks Extrapolate Cautiously in High Dimensions (Poster) | Katie Kang · Amrith Setlur · Claire Tomlin · Sergey Levine
Implicit regularisation in stochastic gradient descent: from single-objective to two-player games (Poster) | Mihaela Rosca · Marc Deisenroth
How to escape sharp minima (Poster) | Kwangjun Ahn · Ali Jadbabaie · Suvrit Sra
Adapting to Gradual Distribution Shifts with Continual Weight Averaging (Poster) | Jared Fernandez · Saujas Vaduguru · Sanket Vaibhav Mehta · Yonatan Bisk · Emma Strubell
On the Problem of Transferring Learning Trajectories Between Neural Networks (Poster) | Daiki Chijiwa
Neural Collapse in the Intermediate Hidden Layers of Classification Neural Networks (Poster) | Liam Parker
Flatter, Faster: Scaling Momentum for Optimal Speedup of SGD (Poster) | Aditya Cowsik · Tankut Can · Paolo Glorioso
Which Features are Learned by Contrastive Learning? On the Role of Simplicity Bias in Class Collapse and Feature Suppression (Poster) | Yihao Xue · Siddharth Joshi · Eric Gan · Pin-Yu Chen · Baharan Mirzasoleiman
An improved residual based random forest for robust prediction (Poster) | Mingyan Li
How Does Adaptive Optimization Impact Local Neural Network Geometry? (Poster) | Kaiqi Jiang · Dhruv Malik · Yuanzhi Li
Effects of Overparameterization on Sharpness-Aware Minimization: A Preliminary Investigation (Poster) | Sungbin Shin · Dongyeop Lee · Namhoon Lee
High-dimensional Learning Dynamics of Deep Neural Nets in the Neural Tangent Regime (Poster) | Yongqi Du · Zenan Ling · Robert Qiu · Zhenyu Liao
On the Equivalence Between Implicit and Explicit Neural Networks: A High-dimensional Viewpoint (Poster) | Zenan Ling · Zhenyu Liao · Robert Qiu
Hyperparameter Tuning using Loss Landscape (Poster) | Jianlong Chen · Qinxue Cao · Yefan Zhou · Konstantin Schürholt · Yaoqing Yang
Sharpness-Aware Minimization Leads to Low-Rank Features (Poster) | Maksym Andriushchenko · Dara Bahri · Hossein Mobahi · Nicolas Flammarion
Layerwise Linear Mode Connectivity (Poster) | Linara Adilova · Asja Fischer · Martin Jaggi
Does Double Descent Occur in Self-Supervised Learning? (Poster) | Alisia Lupidi · Yonatan Gideoni · Dulhan Jayalath
On the Universality of Linear Recurrences Followed by Nonlinear Projections (Poster) | Antonio Orvieto · Soham De · Razvan Pascanu · Caglar Gulcehre · Samuel Smith
Network Degeneracy as an Indicator of Training Performance: Comparing Finite and Infinite Width Angle Predictions (Poster) | Cameron Jakub · Mihai Nica
Implicitly Learned Invariance and Equivariance in Linear Regression (Poster) | Yonatan Gideoni
Latent State Transitions in Training Dynamics (Poster) | Michael Hu · Angelica Chen · Naomi Saphra · Kyunghyun Cho
Hessian Inertia in Neural Networks (Poster) | Xuchan Bao · Alberto Bietti · Aaron Defazio · Vivien Cabannes
Generalization and Stability of Interpolating Neural Networks with Minimal Width (Poster) | Hossein Taheri · Christos Thrampoulidis
The Training Process of Many Deep Networks Explores the Same Low-Dimensional Manifold (Poster) | Jialin Mao · Han Kheng Teoh · Itay Griniasty · Rahul Ramesh · Rubing Yang · Mark Transtrum · James Sethna · Pratik Chaudhari
An Adaptive Method for Minimizing Non-negative Losses (Poster) | Antonio Orvieto · Lin Xiao
The Marginal Value of Momentum for Small Learning Rate SGD (Poster) | Runzhe Wang · Sadhika Malladi · Tianhao Wang · Kaifeng Lyu · Zhiyuan Li
Spectral Evolution and Invariance in Linear-width Neural Networks (Poster) | Zhichao Wang · Andrew Engel · Anand Sarwate · Ioana Dumitriu · Tony Chiang
On the Joint Interaction of Models, Data, and Features (Poster) | YiDing Jiang · Christina Baek · Zico Kolter
Predictive Sparse Manifold Transform (Poster) | Yujia Xie · Xinhui Li · Vince Calhoun
Margin Maximization in Attention Mechanism (Poster) | Davoud Ataee Tarzanagh · Yingcong Li · Xuechen Zhang · Samet Oymak
Supervised-Contrastive Loss Learns Orthogonal Frames and Batching Matters (Poster) | Ganesh Ramachandra Kini · Vala Vakilian · Tina Behnia · Jaidev Gill · Christos Thrampoulidis
Characterizing and Improving Transformer Solutions for Dyck Grammars (Poster) | Kaiyue Wen · Yuchen Li · Bingbin Liu · Andrej Risteski
Benign Overfitting of Two-Layer Neural Networks under Inputs with Intrinsic Dimension (Poster) | Shunta Akiyama · Kazusato Oko · Taiji Suzuki
Implicit regularization of multi-task learning and finetuning in overparameterized neural networks (Poster) | Samuel Lippl · Jack Lindsey
Sharpness Minimization Algorithms Do Not Only Minimize Sharpness To Achieve Better Generalization (Poster) | Kaiyue Wen · Tengyu Ma · Zhiyuan Li
The phases of large learning rate gradient descent through effective parameters (Poster) | Lawrence Wang · Stephen Roberts
On Privileged and Convergent Bases in Neural Network Representations (Poster) | Davis Brown · Nikhil Vyas · Yamini Bansal
On the Effectiveness of Sharpness-Aware Minimization with Large Mini-batches (Poster) | Jinseok Chung · Seonghwan Park · Jaeho Lee · Namhoon Lee
Fast Test Error Rates for Gradient-based Algorithms on Separable Data (Poster) | Puneesh Deora · Bhavya Vasudeva · Vatsal Sharan · Christos Thrampoulidis
On the Advantage of Lion Compared to signSGD with Momentum (Poster) | Alessandro Noiato · Luca Biggio · Antonio Orvieto
On the Training and Generalization Dynamics of Multi-head Attention (Poster) | Puneesh Deora · Rouzbeh Ghaderi · Hossein Taheri · Christos Thrampoulidis
Learning Stochastic Dynamical Systems as an Implicit Regularization with Graph Neural Network (Poster) | Jin Guo · Ting Gao · Yufu Lan · Peng Zhang · Sikun Yang · Jinqiao Duan
Graph Neural Networks Provably Benefit from Structural Information: A Feature Learning Perspective (Oral) | Wei Huang · Yuan Cao · Haonan Wang · Xin Cao · Taiji Suzuki
Over-Parameterization Exponentially Slows Down Gradient Descent for Learning a Single Neuron (Oral) | Weihang Xu · Simon Du
Sharp predictions for mini-batched prox-linear iterations in rank one matrix sensing (Oral) | Mengqi Lou · Kabir Chandrasekher · Ashwin Pananjady
Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer (Oral) | Yuandong Tian · Yiping Wang · Beidi Chen · Simon Du
Learning in the Presence of Low-dimensional Structure: A Spiked Random Matrix Perspective (Oral) | Jimmy Ba · Murat Erdogdu · Taiji Suzuki · Zhichao Wang · Denny Wu