Poster
in
Workshop: ES-FoMo II: 2nd Workshop on Efficient Systems for Foundation Models
Exploring Monotonicity in Early-Exiting Language Models
Filipe Laitenberger · Max Belitsky · Denys Sheremet
Abstract:
Large Language Models (LLMs) have shown impressive results across the board, but inference can be costly. Early-exiting methods offer a promising solution: they assume that not all tokens require the same amount of computation and allow the model to exit at earlier layers. Several such methods have been proposed, all relying on the implicit assumption that as the network performs more computation, it becomes more confident in its prediction. We investigate this assumption for two early-exiting methods and, based on the resulting insights, propose three new confidence measures for early exiting. We find early evidence that monotonicity benefits the quality of token generation.
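To illustrate the implicit assumption the abstract questions, here is a minimal sketch of confidence-based early exiting: the model stops at the first layer whose prediction confidence (max softmax probability) crosses a threshold. All names (`early_exit`, `layer_logits`, `threshold`) and the toy logits are hypothetical and not taken from the paper.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def early_exit(layer_logits, threshold=0.9):
    """Return (exit_layer, predicted_token).

    Stops at the first layer whose max softmax probability exceeds
    `threshold`. This encodes the monotonicity assumption: an early
    confident prediction is trusted not to be overturned by later layers.
    """
    for i, logits in enumerate(layer_logits):
        probs = softmax(np.asarray(logits, dtype=float))
        if probs.max() >= threshold:
            return i, int(probs.argmax())
    # No layer was confident enough: fall back to the final layer.
    probs = softmax(np.asarray(layer_logits[-1], dtype=float))
    return len(layer_logits) - 1, int(probs.argmax())

# Toy per-layer logits for one token (hypothetical values): confidence
# rises with depth, so the loop exits before the final layer.
logits_per_layer = [
    [1.0, 1.1, 0.9],   # layer 0: nearly uniform, low confidence
    [0.5, 2.5, 0.2],   # layer 1: moderately confident
    [0.1, 6.0, 0.0],   # layer 2: very confident -> exit here
    [0.0, 8.0, 0.0],   # layer 3: never reached
]
layer, token = early_exit(logits_per_layer, threshold=0.95)
```

If confidence were not monotone in depth, such a rule could exit on a spuriously confident intermediate layer, which is the failure mode the paper's monotonicity analysis probes.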