ICML Progressive Knowledge Distillation: Balancing Inference Latency and Accuracy at Runtime

Poster in Workshop: ES-FoMo: Efficient Systems for Foundation Models

Don Kurian Dennis · Abhishek Shetty · Anish Sevekari · Kazuhito Koishida · Virginia Smith

Abstract: We study the problem of progressive distillation: given a large, pretrained teacher model $g$, we seek to decompose the model into smaller, low-inference-cost student models $f_i$, such that progressively evaluating additional models in this ensemble results in strict improvements over previous predictions. For user-facing inference applications, this allows us to flexibly trade accuracy for inference latency at runtime. We develop a boosting-based algorithm, B-DISTIL, for progressive distillation, and demonstrate its effectiveness on standard datasets.
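
The abstract does not spell out how B-DISTIL works, so the following is only a rough illustration of the general idea, not the authors' algorithm. It is a minimal sketch under loudly stated assumptions: the teacher is a hypothetical scalar function standing in for $g$, each "student" $f_i$ is a decision stump, and the stumps are fitted greedily to the teacher's residuals in the style of least-squares gradient boosting. The prefix sums $F_k(x) = f_1(x) + \dots + f_k(x)$ play the role of progressive predictions. On the distillation inputs their squared error to the teacher is non-increasing in $k$, which is weaker than the strict improvement the abstract targets. The names `teacher`, `fit_stump`, `distill`, `predict_within_budget` and the per-student latency costs are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# (threshold, left_value, right_value): a tiny stand-in for a "student" model.
Stump = Tuple[float, float, float]


def teacher(x: float) -> float:
    """Hypothetical teacher g: any fixed function standing in for a large model."""
    return math.sin(3.0 * x) + 0.5 * x


def stump_predict(stump: Stump, x: float) -> float:
    thr, left, right = stump
    return left if x <= thr else right


def fit_stump(xs: Sequence[float], residuals: Sequence[float]) -> Stump:
    """Least-squares decision stump fitted to the current residuals."""
    n = len(xs)
    mean = sum(residuals) / n
    # The no-split (constant) stump is always a candidate, so the chosen stump
    # never does worse than predicting 0 for the residuals.
    best_err = sum((r - mean) ** 2 for r in residuals)
    best: Stump = (math.inf, mean, mean)
    order = sorted(range(n), key=lambda i: xs[i])
    for split in range(1, n):
        lo, hi = xs[order[split - 1]], xs[order[split]]
        if lo == hi:  # equal inputs cannot be separated by a threshold
            continue
        left = [residuals[i] for i in order[:split]]
        right = [residuals[i] for i in order[split:]]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if err < best_err:
            best_err, best = err, (0.5 * (lo + hi), lv, rv)
    return best


def distill(xs: Sequence[float], num_students: int) -> List[Stump]:
    """Greedily fit each new student to what the earlier students left unexplained."""
    targets = [teacher(x) for x in xs]
    preds = [0.0] * len(xs)
    students: List[Stump] = []
    for _ in range(num_students):
        residuals = [t - p for t, p in zip(targets, preds)]
        stump = fit_stump(xs, residuals)
        students.append(stump)
        preds = [p + stump_predict(stump, x) for p, x in zip(preds, xs)]
    return students


def progressive_predict(students: Sequence[Stump], x: float, k: int) -> float:
    """Prediction F_k(x) from the first k students only."""
    return sum(stump_predict(s, x) for s in students[:k])


def mse_to_teacher(students: Sequence[Stump], xs: Sequence[float], k: int) -> float:
    return sum((teacher(x) - progressive_predict(students, x, k)) ** 2 for x in xs) / len(xs)


def predict_within_budget(students: Sequence[Stump], x: float,
                          costs: Sequence[float], budget: float) -> Tuple[float, int]:
    """Evaluate students in order until the next one would exceed the latency budget."""
    total, spent, used = 0.0, 0.0, 0
    for stump, cost in zip(students, costs):
        if spent + cost > budget:
            break
        total += stump_predict(stump, x)
        spent += cost
        used += 1
    return total, used


xs = [i / 20 for i in range(-20, 21)]  # 41 distinct inputs in [-1, 1]
students = distill(xs, num_students=8)
errors = [mse_to_teacher(students, xs, k) for k in range(len(students) + 1)]
for k, e in enumerate(errors):
    print(f"first {k} students: MSE to teacher = {e:.4f}")

# At runtime, evaluate only as many students as the (hypothetical) budget allows.
value, used = predict_within_budget(students, 0.3, costs=[1.0] * 8, budget=3.5)
print(f"used {used} students, prediction {value:.4f}, teacher {teacher(0.3):.4f}")
```

Because each stump corrects the residual left by the students before it, stopping after any prefix still yields a usable prediction; a larger latency budget simply buys more correction terms.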