Workshop on Theoretical Foundations of Foundation Models (TF2M)
Berivan Isik · Ziteng Sun · Banghua Zhu · Enric Boix-Adserà · Nezihe Merve Gürel · Bo Li · Ahmad Beirami · Sanmi Koyejo
Straus 2
Sat 27 Jul, midnight PDT
Recent advances in generative foundation models (FMs), such as large language models (LLMs) and diffusion models, have propelled the capabilities of deep neural models to seemingly magical heights. Yet the soaring growth in model size and capability has also raised pressing concerns about such modern AI systems. Scaling up models significantly increases their energy consumption and deployment cost. Overreliance on AI may perpetuate existing inequalities and widen discrimination against certain groups of people. The gap between our understanding of the internal workings of FMs and their empirical success has also reached an unprecedented level, hindering accountability and transparency.

For decades, theoretical tools from statistics, information theory, and optimization have played a pivotal role in extracting information from unstructured data. Currently, however, the rapid pace of FM development has outstripped theoretical investigation, creating a potential gap between theoretical researchers and the challenges surrounding FMs. This workshop provides a platform for bringing together researchers and practitioners from the foundation model and theory communities (including statistics, information theory, optimization, and learning theory) to discuss advances and challenges in addressing these concerns, with a focus on responsible AI, efficiency, and principled foundations.
Schedule
Sat 12:00 a.m. - 12:05 a.m. | Opening Remarks (Berivan Isik)
Sat 12:05 a.m. - 12:35 a.m. | Invited Talk: Yuandong Tian (Meta AI), "Understanding Foundation Models via the Lens of Training Dynamics"
Sat 12:35 a.m. - 1:05 a.m. | Invited Talk: Jason Lee (Princeton), "Learning Representations and Associations with Gradient Descent"
Sat 1:05 a.m. - 1:15 a.m. | Contributed Talk: "Unlocking Tokens as Data Points for Generalization Bounds on Larger Language Models"
Sat 1:15 a.m. - 1:25 a.m. | Contributed Talk: "Fundamental Limits of Prompt Compression: A Rate-Distortion Framework for Black-Box Language Models"
Sat 1:25 a.m. - 1:35 a.m. | Short Break
Sat 1:35 a.m. - 2:05 a.m. | Invited Talk: Dan Alistarh (IST Austria), "Model Compression at GPT Scale by Estimating Second-Order Information"
Sat 2:05 a.m. - 2:35 a.m. | Invited Talk: Ananda Theertha Suresh (Google Research), "Accelerating Language Model Inference Using Optimal Transport: Theory and Algorithms"
Sat 2:35 a.m. - 3:35 a.m. | Poster Session 1
Sat 3:35 a.m. - 5:00 a.m. | Lunch Break
Sat 5:00 a.m. - 6:00 a.m. | Poster Session 2
Sat 6:00 a.m. - 6:30 a.m. | Invited Talk: Kamalika Chaudhuri (UCSD), "Theoretical Foundations of Memorization in Foundation Models"
Sat 6:30 a.m. - 7:00 a.m. | Coffee Break
Sat 7:00 a.m. - 7:10 a.m. | Contributed Talk: "Transformers are Minimax Optimal Nonparametric In-Context Learners"
Sat 7:10 a.m. - 7:20 a.m. | Contributed Talk: "Models That Prove Their Own Correctness"
Sat 7:20 a.m. - 8:00 a.m. | Panel Discussion
Sat 8:00 a.m. | Concluding Remarks and Awards (Ziteng Sun)
Accepted Papers
Mission Impossible: A Statistical Perspective on Jailbreaking LLMs (Poster) | Jingtong Su · Julia Kempe · Karen Ullrich
The Geometry of Categorical and Hierarchical Concepts in Large Language Models (Poster) | Kiho Park · Yo Joong Choe · Yibo Jiang · Victor Veitch
Understanding the Role of Equivariance in Self-supervised Learning (Poster) | Yifei Wang · Kaiwen Hu · Sharut Gupta · Ziyu Ye · Yisen Wang · Stefanie Jegelka
Models That Prove Their Own Correctness (Oral) | Noga Amit · Shafi Goldwasser · Orr Paradise · Guy Rothblum
On Provable Length and Compositional Generalization (Poster) | Kartik Ahuja · Amin Mansouri
Getting More Juice Out of the SFT Data: Reward Learning from Human Demonstration Improves SFT for LLM Alignment (Poster) | Jiaxiang Li · Siliang Zeng · Hoi To Wai · Chenliang Li · Alfredo Garcia · Mingyi Hong
Rethinking Invariance in In-context Learning (Poster) | Lizhe Fang · Yifei Wang · Khashayar Gatmiry · Lei Fang · Yisen Wang
How Transformers Learn Diverse Attention Correlations in Masked Vision Pretraining (Poster) | Yu Huang · Zixin Wen · Yuejie Chi · Yingbin Liang
Active Preference Optimization for Sample Efficient RLHF (Poster) | Nirjhar Das · Souradip Chakraborty · Aldo Pacchiano · Sayak Ray Chowdhury
State Space Models are Comparable to Transformers in Estimating Functions with Dynamic Smoothness (Poster) | Naoki Nishikawa · Taiji Suzuki
Efficient Document Ranking with Learnable Late Interactions (Poster) | Himanshu Jain · Ziwei Ji · Sashank J. Reddi · Ankit Singh Rawat · Felix Xinnan Yu · Aditya Menon · Sadeep Jayasumana
Modeling the Plurality of Human Preferences via Ideal Points (Poster) | Daiwei Chen · Yi Chen · Aniket Rege · Ramya Vinayak
Transformers need glasses! Information over-squashing in language tasks (Poster) | Federico Barbero · Andrea Banino · Steven Kapturowski · Dharshan Kumaran · João Madeira Araujo · Alex Vitvitskyi · Razvan Pascanu · Petar Veličković
Meta-optimization for Deep Learning via Nonstochastic Control (Poster) | Xinyi Chen · Evan Dogariu · Zhou Lu · Elad Hazan
Towards understanding the mechanisms of associative memory in transformers (Poster) | Yibo Jiang · Goutham Rajendran · Pradeep Ravikumar · Bryon Aragam
How Do Transformers Fill in the Blanks? A Case Study on Matrix Completion (Poster) | Pulkit Gopalani · Ekdeep Singh Lubana · Wei Hu
Detrimental Memories in Transfer Learning (Poster) | Amal Alnouri · Timothy Wroge · Bilal Alsallakh
How Transformers Utilize Multi-Head Attention in In-Context Learning? A Case Study on Sparse Linear Regression (Poster) | Xingwu Chen · Lei Zhao · Difan Zou
Beyond Model Collapse: Scaling Up with Synthesized Data Requires Reinforcement (Poster) | Yunzhen Feng · Elvis Dohmatob · Pu Yang · Francois Charton · Julia Kempe
Fast Machine Unlearning via Robust Training (Poster) | Youssef Allouah · Joshua Kazdan · Rachid Guerraoui · Sanmi Koyejo
Transformer Efficiently Learns Low-dimensional Target Functions In-context (Poster) | Yujin Song · Denny Wu · Kazusato Oko · Taiji Suzuki
In-Context Learning with Representations: Contextual Generalization of Trained Transformers (Poster) | Tong Yang · Yu Huang · Yingbin Liang · Yuejie Chi
Understanding and Minimising Outlier Features in Neural Network Training (Poster) | Bobby He · Lorenzo Noci · Daniele Paliotta · Imanol Schlag · Thomas Hofmann
Fundamental Limits of Prompt Compression: A Rate-Distortion Framework for Black-Box Language Models (Oral) | Adway Girish · Alliot Nagle · Ashok Vardhan Makkuva · Marco Bondaschi · Michael Gastpar · Hyeji Kim
Implicit Regularization of Sharpness-Aware Minimization for Scale-Invariant Problems (Poster) | Bingcong Li · Liang Zhang · Niao He
Self-Play Preference Optimization for Language Model Alignment (Poster) | Yue Wu · Zhiqing Sun · Huizhuo Yuan · Kaixuan Ji · Yiming Yang · Quanquan Gu
Unlocking Tokens as Data Points for Generalization Bounds on Larger Language Models (Oral) | Sanae Lotfi · Yilun Kuang · Marc Finzi · Brandon Amos · Micah Goldblum · Andrew Wilson
Transformer Designs for In-Context Learning in Foundation Models for Time Series Forecasting with Covariates (Poster) | Afrin Dange · Raj · Praneeth Kumar Netrapalli · Sunita Sarawagi
Implicit Optimization Bias of Next-token Prediction in Linear Models (Poster) | Christos Thrampoulidis
A Theoretical Understanding of Self-Correction through In-context Alignment (Poster) | Yifei Wang · Yuyang Wu · Zeming Wei · Stefanie Jegelka · Yisen Wang
Transformers are Minimax Optimal Nonparametric In-Context Learners (Oral) | Juno Kim · Tai Nakamaki · Taiji Suzuki
Unified Taxonomy in AI Safety: Watermarks, Adversarial Defenses, and Transferable Attacks (Poster) | Grzegorz Gluch · Sai Ganesh Nagarajan · Berkant Turan
Zero-Shot Generalization of GNNs over Distinct Attribute Domains (Poster) | Yangyi Shen · Beatrice Bevilacqua · Joshua Robinson · Charilaos Kanatsoulis · Jure Leskovec · Bruno Ribeiro
Hallmarks of Optimization Trajectories in Neural Networks and LLMs: Directional Exploration and Redundancy (Poster) | Sidak Pal Singh · Bobby He · Thomas Hofmann · Bernhard Schölkopf
Decoding-Time Language Model Alignment with Multiple Objectives (Poster) | Ruizhe Shi · Yifang Chen · Yushi Hu · Alisa Liu · Hannaneh Hajishirzi · Noah Smith · Simon Du
Implementability of Information Elicitation Mechanisms with Pre-Trained Language Models (Poster) | Zach Robertson · Hannah Cha · Andrew Sheha · Sanmi Koyejo
Attention Is All You Need But You Don't Need All Of It For Inference of Large Language Models (Poster) | Georgy Tyukin · Gbetondji Dovonon · Jean Kaddour · Pasquale Minervini
Setting the Record Straight on Transformer Oversmoothing (Poster) | Gbetondji Dovonon · Michael Bronstein · Matt Kusner
MSAMamba: Adapting Subquadratic Models To Long-Context DNA MSA Analysis (Poster) | Vishrut Thoutam · Dina Ellsworth
Fine-Tuning Large Language Models with User-Level Differential Privacy (Poster) | Zachary Charles · Arun Ganesh · Ryan McKenna · Hugh B McMahan · Nicole Mitchell · Krishna Pillutla · J K Rush
Unveiling Induction Heads: Provable Training Dynamics and Feature Learning in Transformers (Poster) | Siyu Chen · Heejune Sheen · Tianhao Wang · Zhuoran Yang
Understanding and Mitigating Tokenization Bias in Language Models (Poster) | Buu Phan · Marton Havasi · Matthew Muckley · Karen Ullrich
Preference Learning Algorithms Do Not Learn Preference Rankings (Poster) | Angelica Chen · Sadhika Malladi · Lily Zhang · Xinyi Chen · Richard Zhang · Rajesh Ranganath · Kyunghyun Cho
Local to Global: Learning Dynamics and Effect of Initialization for Transformers (Poster) | Ashok Vardhan Makkuva · Marco Bondaschi · Chanakya Ekbote · Adway Girish · Alliot Nagle · Hyeji Kim · Michael Gastpar
On the Power of Convolution Augmented Transformer (Poster) | Mingchen Li · Xuechen Zhang · Yixiao Huang · Samet Oymak
Sparse Neural Architectures and Deterministic Ramanujan Graphs (Poster) | Arindam Biswas · Suryam Arnav Kalra · Pabitra Mitra · Biswajit Basu
Multilingual Compression Parity: How Efficiently Large Language Models Represent Information Across Languages? (Poster) | Alexander Tsvetkov · Alon Kipnis
Unavoidable Learning Constraints Alter the Foundations of Direct Preference Optimization (Poster) | David Wipf
How Do Nonlinear Transformers Acquire Generalization-Guaranteed CoT Ability? (Poster) | Hongkang Li · Meng Wang · Songtao Lu · Xiaodong Cui · Pin-Yu Chen
Do LLM Agents Have Regret? A Case Study in Online Learning and Games (Poster) | Chanwoo Park · Xiangyu Liu · Asuman Ozdaglar · Kaiqing Zhang
PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models (Poster) | Junsong Chen · Yue Wu · Simian Luo · Enze Xie
Importance Weighted Multi-Draft Speculative Sampling (Poster) | Ashish Khisti · Arash Behravesh · Hassan Dbouk · Arash Behboodi · Roland Memisevic · Christos Louizos
SAIL: Self-improving Efficient Online Alignment of Large Language Models (Poster) | Mucong Ding · Souradip Chakraborty · Vibhu Agrawal · Zora Che · Alec Koppel · Mengdi Wang · Amrit Singh Bedi · Furong Huang
A deeper look at depth pruning of LLMs (Poster) | Shoaib Ahmed Siddiqui · Xin Dong · Greg Heinrich · Thomas Breuel · Jan Kautz · David Krueger · Pavlo Molchanov
RLHF from Heterogeneous Feedback via Personalization and Preference Aggregation (Poster) | Chanwoo Park · Mingyang Liu · Dingwen Kong · Kaiqing Zhang · Asuman Ozdaglar
One-Shot Safety Alignment for Large Language Models via Optimal Dualization (Poster) | Xinmeng Huang · Shuo Li · Edgar Dobriban · Osbert Bastani · Hamed Hassani · Dongsheng Ding
Progressive distillation improves feature learning via implicit curriculum (Poster) | Abhishek Panigrahi · Bingbin Liu · Sadhika Malladi · Andrej Risteski · Surbhi Goel
How In-Context Learning Emerges from Training on Unstructured Data: The Role of Co-Occurrence, Positional Information, and Noise Structures (Poster) | Kevin Christian Wibisono · Yixin Wang