

Poster in Workshop: ES-FoMo II: 2nd Workshop on Efficient Systems for Foundation Models

CLAM: Unifying Finetuning, Quantization, and Pruning by Chaining LLM Adapter Modules

Neelay Velingker · Jason Liu · Amish Sethi · William Dodds · Zhiqiu (Oscar) Xu · Saikat Dutta · Mayur Naik · Eric Wong


Abstract:

As LLMs have grown in size and applicability, so too has the number of methods that adapt them for downstream tasks. Recent work addressing challenges in memory consumption, task performance, and inference efficiency has given rise to the fields of parameter-efficient finetuning (PEFT), quantization, and pruning, among others. While combining their benefits is useful, composing these techniques in flexible ways is challenging due to the changes each method makes to the model and the restrictions each might impose. To address these challenges, we develop an algebraic abstraction called CLAM that enables unlimited chaining of popular resource-efficient methods on nearly every modern LLM with minimal overhead. We demonstrate that CLAM can create new compositions of techniques that achieve SOTA performance on specializing compressed models across multiple benchmarks.
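The sketch below is not CLAM's API; it is a minimal illustration of the kind of manually wired composition the abstract describes and that CLAM aims to abstract over: a 4-bit quantized base model, a LoRA adapter for parameter-efficient finetuning, and magnitude pruning of the adapter weights. It assumes the Hugging Face transformers, peft, and bitsandbytes libraries, and the model name and hyperparameters are purely illustrative.

```python
# Illustrative composition of quantization + PEFT + pruning (not CLAM itself).
import torch
import torch.nn.utils.prune as prune
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 1. Quantization: load the base model in 4-bit precision.
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # hypothetical choice of base model
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)

# 2. PEFT: attach a LoRA adapter to the attention projections.
lora_cfg = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"])
model = get_peft_model(base, lora_cfg)

# 3. Pruning: zero out the 30% smallest-magnitude weights in each LoRA matrix.
for name, module in model.named_modules():
    if ("lora_A" in name or "lora_B" in name) and isinstance(module, torch.nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # make the pruning permanent

model.print_trainable_parameters()
```

Even this fixed three-step pipeline hard-codes the order of operations and which components each step may touch; composing further methods, or reordering these, is exactly where a general chaining abstraction becomes useful.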
