

Tutorial

Physics of Language Models

Zeyuan Allen-Zhu

Hall A1
[ Project Page ]
Mon 22 Jul 6:30 a.m. PDT — 8:30 a.m. PDT

Abstract:

We divide "intelligence" into multiple dimensions, such as language structures, knowledge, and reasoning. For each dimension, we create synthetic data for LLM pretraining to understand the underlying theory and push the capabilities of LLMs to the extreme.

Unlike benchmarking, controlling the synthetic data lets us aim to discover universal laws that hold for all LLMs, not just a specific model family such as GPT or Llama. By tweaking hyperparameters such as data amount, type, difficulty, and format, we determine the factors that affect LLM performance and suggest improvements.
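As a rough illustration of what "controlling the synthetic data" can look like, the sketch below generates a toy corpus of templated facts in which the data amount and format diversity are explicit knobs. All names, templates, and parameters here are illustrative assumptions, not the tutorial's actual dataset or code.

```python
# Hypothetical sketch: synthetic pretraining data with controllable knobs.
# Varying num_people (data amount) and formats_per_fact (format diversity)
# independently lets one measure how each affects a model trained on the corpus.
import random

TEMPLATES = [
    "{name} was born in {city} in {year}.",
    "Born in {year}, {name} grew up in {city}.",  # rephrased format of the same fact
]

def make_synthetic_corpus(num_people: int, formats_per_fact: int, seed: int = 0):
    """Generate a toy corpus whose size and format diversity are explicit knobs."""
    rng = random.Random(seed)
    corpus = []
    for i in range(num_people):
        person = {
            "name": f"Person_{i}",
            "city": rng.choice(["Arlen", "Bexley", "Corvan"]),
            "year": rng.randint(1900, 2000),
        }
        # Emit each fact in one or more surface forms.
        for template in rng.sample(TEMPLATES, k=min(formats_per_fact, len(TEMPLATES))):
            corpus.append(template.format(**person))
    return corpus

# e.g. compare a model trained with formats_per_fact=1 vs. 2 to study
# how rephrasing affects what the model stores.
print(make_synthetic_corpus(num_people=2, formats_per_fact=2))
```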

Unlike black-box training, we develop advanced probing techniques to examine the inner workings of LLMs and understand their hidden mental processes. This helps us gain a deeper understanding of how these AI models function and moves us closer to creating more powerful and transparent AI systems.
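One common family of such probing techniques is the linear probe: a small classifier trained on a model's hidden states to test whether a given property is linearly decodable from a layer's representations. The sketch below is a minimal, generic example of that idea, with placeholder data and illustrative names; it is not the authors' probing code.

```python
# Hypothetical sketch of a linear probe on hidden states.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def linear_probe(hidden_states: np.ndarray, labels: np.ndarray) -> float:
    """hidden_states: (num_examples, hidden_dim) activations from one layer
    of a pretrained model; labels: (num_examples,) the property being tested."""
    x_train, x_test, y_train, y_test = train_test_split(
        hidden_states, labels, test_size=0.2, random_state=0
    )
    probe = LogisticRegression(max_iter=1000)
    probe.fit(x_train, y_train)
    # High held-out accuracy suggests the property is linearly encoded
    # in this layer's representations.
    return probe.score(x_test, y_test)

if __name__ == "__main__":
    # Random placeholder data; real use would extract activations from an
    # actual LLM forward pass at chosen token positions.
    rng = np.random.default_rng(0)
    states = rng.normal(size=(500, 768))
    labels = rng.integers(0, 2, size=500)
    print(f"probe accuracy: {linear_probe(states, labels):.2f}")
```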

This talk will cover language structures (Part 1), reasoning (Part 2), and knowledge (Part 3). These parts explain why and how language models succeed or fail on certain AI tasks and provide practical suggestions for necessary changes to (1) model architecture, (2) data preparation, and (3) the training process to move us closer to AGI.
