Poster
in
Workshop: ES-FoMo II: 2nd Workshop on Efficient Systems for Foundation Models
AdaInf: Adaptive Inference for Resource-Constrained Foundation Models
Zhuoyan Xu · Khoi Nguyen · Preeti Mukherjee · Somali Chaterji · Yingyu Liang · Yin Li
Foundation models have emerged as a powerful tool in AI, yet they come with substantial computational cost, limiting their deployment on resource-constrained devices. Much recent research has been dedicated to improving the efficiency of foundation models. These prior solutions often yield models with a static accuracy and latency footprint, and thus fall short in responding to runtime perturbations, including varying input characteristics (e.g., a static video vs.\ a dynamic one) or changing resource availability (e.g., contention due to other programs on the device). To bridge this gap, we introduce \textbf{AdaInf}---an adaptive inference framework that treats a foundation model as a collection of execution branches and learns a scheduler to decide which branch to execute, accounting for the input data and a compute budget. We demonstrate preliminary results on CIFAR and ImageNet with vision and vision-language models, spanning both convolutional networks and Transformers. Our results show that AdaInf can achieve a range of accuracy-latency trade-offs. Compared to the latest methods, AdaInf attains a major improvement in accuracy under a wide range of latency budgets.
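The core idea---a scheduler that picks an execution branch given the input and a compute budget---can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Branch` structure, profiled latencies, and per-input scoring function are all hypothetical stand-ins for whatever the learned scheduler predicts.

```python
# Hypothetical sketch of budget-aware branch selection in the spirit of AdaInf.
# Each branch pairs a profiled latency with a (learned, here faked) predictor
# of how well that branch would do on the current input.
from dataclasses import dataclass
from typing import Callable, List, Any


@dataclass
class Branch:
    name: str                          # e.g. "early-exit" or "full-model"
    latency_ms: float                  # profiled execution cost of this branch
    score: Callable[[Any], float]      # scheduler's predicted accuracy on input x


def select_branch(branches: List[Branch], x: Any, budget_ms: float) -> Branch:
    """Pick the highest-scoring branch whose latency fits the budget."""
    feasible = [b for b in branches if b.latency_ms <= budget_ms]
    if not feasible:
        # Degrade gracefully: fall back to the cheapest branch.
        return min(branches, key=lambda b: b.latency_ms)
    return max(feasible, key=lambda b: b.score(x))


# Toy branches with made-up latencies and constant scores for illustration.
branches = [
    Branch("early-exit", 5.0, lambda x: 0.70),
    Branch("mid-exit", 12.0, lambda x: 0.82),
    Branch("full-model", 30.0, lambda x: 0.90),
]

print(select_branch(branches, None, 15.0).name)  # mid-exit fits a 15 ms budget
print(select_branch(branches, None, 3.0).name)   # nothing fits; cheapest wins
```

In practice the score function would be input-dependent (e.g., a static video may score well even on a cheap branch), which is what lets such a scheduler adapt at runtime rather than committing to one fixed accuracy-latency point.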