

Invited Talk
Workshop on Theoretical Foundations of Foundation Models (TF2M)

Dan Alistarh (IST Austria): Model Compression at GPT Scale by Estimating Second-Order Information

Dan Alistarh

Sat 27 Jul 1:35 a.m. PDT — 2:05 a.m. PDT

Abstract:

A key barrier to the wide deployment of highly accurate machine learning models, whether for language or vision, is their high computational and memory overhead. Although we possess the mathematical tools for highly accurate compression of such models, these theoretically elegant techniques require second-order information about the model’s loss function, which is hard even to approximate efficiently at the scale of billion-parameter models. In this talk, I will describe our work on bridging this computational divide, which enables accurate second-order pruning and quantization of models at truly massive scale. Thus, models with billions and even trillions of parameters can be executed efficiently on a few GPUs, with significant speedups and negligible accuracy loss. Models created using our techniques have been downloaded millions of times from open-source repositories such as HuggingFace.
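For context, second-order pruning methods of this kind typically build on the classical Optimal Brain Surgeon rule; a minimal sketch, assuming a local quadratic approximation of the loss with Hessian H and a trained weight vector w:

\Delta L_q = \frac{w_q^2}{2\,[H^{-1}]_{qq}}, \qquad
\delta w^{*} = -\,\frac{w_q}{[H^{-1}]_{qq}}\, H^{-1} e_q,

where e_q is the q-th unit vector: the weight w_q with the smallest saliency \Delta L_q is removed, and the compensating update \delta w^{*} to the remaining weights minimizes the resulting loss increase. The computational bottleneck at billion-parameter scale is forming and inverting (an approximation of) H, which is the divide the talk addresses.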
