Poster in Workshop: ES-FoMo II: 2nd Workshop on Efficient Systems for Foundation Models

Exploring Monotonicity in Early-Exiting Language Models

Filipe Laitenberger · Max Belitsky · Denys Sheremet


Abstract:

Large Language Models (LLMs) have shown impressive results across the board, but inference can be costly. A promising solution is offered by early-exiting methods, which assume that not all tokens require the same amount of computation and therefore exit the LLM at earlier layers. Several early-exiting methods have been proposed, relying on the implicit assumption that as the network performs more computation, it becomes more confident in its prediction. We investigate this assumption for two early-exiting methods and, based on the resulting insights, propose three new confidence measures for early exiting. We find early evidence that monotonicity benefits the quality of token generation.
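The core mechanism the abstract describes can be sketched as a confidence-thresholded exit loop: project each intermediate hidden state to vocabulary probabilities and stop at the first layer whose maximum probability clears a threshold. This is a minimal illustration, not the authors' method; the softmax-confidence measure, the `unembed` projection, and the `threshold` value are all assumptions for the sketch.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def early_exit_layer(hidden_states, unembed, threshold=0.9):
    """Return (exit_layer, probs): the first layer whose max softmax
    probability reaches `threshold`, falling back to the final layer.
    `hidden_states` is a list of per-layer hidden vectors; `unembed`
    is a (vocab_size, hidden_dim) projection (both hypothetical here)."""
    for i, h in enumerate(hidden_states):
        probs = softmax(unembed @ h)  # map hidden state to vocab distribution
        if probs.max() >= threshold:
            return i, probs           # confident enough: exit early
    return len(hidden_states) - 1, probs  # never confident: use last layer

# Toy example: layer 0 is near-uniform (low confidence), layer 1 is peaked.
unembed = np.eye(3)
states = [np.array([0.1, 0.1, 0.1]), np.array([5.0, 0.0, 0.0])]
layer, probs = early_exit_layer(states, unembed)
```

The monotonicity assumption the paper probes is exactly what makes a fixed threshold sensible here: if confidence can dip after rising, a single threshold may trigger an exit before the model's prediction has settled.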