Poster
in
Workshop: ES-FoMo II: 2nd Workshop on Efficient Systems for Foundation Models
Performance Control in Early Exiting to Deploy Large Models at the Same Cost of Smaller Ones
Mehrnaz Mofakhami · Reza Bayat · Ioannis Mitliagkas · Joao Monteiro · Valentina Zantedeschi
Early Exiting (EE) is a promising technique for speeding up inference at the cost of a limited loss in performance. It adaptively allocates compute to each datapoint based on its difficulty, letting easier inputs exit at earlier layers. In this study, we first present a novel perspective on EE by demonstrating that it should be used to deploy larger models, achieving higher performance while maintaining the low computational cost of smaller ones. Since existing EE approaches rely on confidence estimation at each exit point, we further study the impact of overconfidence on the controllability of the compute/performance trade-off. We introduce PCEE (Performance Control Early Exiting), a method that ensures a lower bound on accuracy, thereby facilitating reliable adaptation of EE methods for practical use. In experiments with MSDNet and Vision Transformer architectures on CIFAR-10, CIFAR-100, and ImageNet, we show that PCEE offers a simple yet computationally efficient approach that in most cases provides better control over performance than standard confidence-based approaches and, interestingly, allows us to scale up model sizes to yield both cost reductions and performance gains.
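The confidence-based exiting baseline the abstract refers to can be sketched as follows. This is a minimal, self-contained illustration of the standard mechanism (the function name, threshold value, and toy logits are all hypothetical); PCEE's accuracy-controlled threshold selection is not shown:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def early_exit_predict(exit_logits, threshold):
    """Return (prediction, exit_index) for one datapoint.

    exit_logits: one logit vector per exit head, ordered shallow to deep.
    The input exits at the first head whose max softmax probability
    (confidence) reaches `threshold`; otherwise it falls through to the
    final head, which always exits.
    """
    for i, logits in enumerate(exit_logits):
        probs = softmax(logits)
        conf = max(probs)
        if conf >= threshold or i == len(exit_logits) - 1:
            return probs.index(conf), i

# Toy example: an "easy" input is already confident at the first exit head,
# while a "hard" one stays uncertain and uses the full depth of the model.
easy = [[4.0, 0.1, 0.2], [5.0, 0.1, 0.1], [6.0, 0.0, 0.0]]
hard = [[1.0, 0.9, 0.8], [1.2, 1.1, 0.3], [0.5, 2.5, 0.4]]
print(early_exit_predict(easy, threshold=0.9))  # exits at head 0
print(early_exit_predict(hard, threshold=0.9))  # falls through to the last head
```

Because the exit decision depends on confidence, miscalibrated (overconfident) heads cause premature exits; this is the controllability issue the abstract studies and that PCEE addresses by tying the threshold to a target accuracy lower bound.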