Efficient LLM Pruning with Global Token-Dependency Awareness and Hardware-Adapted Inference

Oshin Dutta · Ritvik Gupta · Sumeet Agarwal

Abstract
