Poster in Workshop: Neural Compression: From Information Theory to Applications

FusionToken: Enhancing Compression and Efficiency in Language Model Tokenization

Robert Kwiatkowski · Zijian Wang · Robert Giaquinto · Varun Kumar · Xiaofei Ma · Anoop Deoras · Bing Xiang · Ben Athiwaratkun


Abstract:

We propose FusionToken, a novel method that substantially improves on the conventional Byte Pair Encoding (BPE) approach to data encoding for language models. FusionToken employs a more aggressive computational strategy than BPE, expanding the candidate token groups from bi-grams to 10-grams. Remarkably, adding just 1,000 tokens to the vocabulary yields a compression rate that significantly surpasses that of a regular BPE tokenizer with a vocabulary of one million. Overall, FusionToken delivers noticeable performance improvements, since each compute unit covers more data, and faster inference, since fewer tokens are needed per string. By devoting more compute to building the tokenizer, FusionToken maximizes the potential of language models as efficient data compression engines, enabling more effective language modeling systems.

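The abstract does not spell out the algorithm, but the core idea (promoting frequent n-grams of up to 10 base tokens into new vocabulary entries, rather than merging only adjacent pairs as BPE does) can be illustrated with a minimal sketch. The function names, the choice of a greedy longest-match re-encoder, and the toy whitespace "tokenizer" below are all assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of n-gram fusion on top of an existing tokenization:
# count frequent n-grams (2..10 tokens), promote the top ones to fused
# vocabulary entries, then greedily re-encode with longest-match lookup.
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def count_ngrams(tokens: Sequence[str], max_n: int = 10) -> Counter:
    """Count all contiguous token n-grams with 2 <= n <= max_n."""
    counts: Counter = Counter()
    for n in range(2, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i : i + n])] += 1
    return counts


def build_fused_vocab(tokens: Sequence[str], num_new_tokens: int = 1000,
                      max_n: int = 10) -> Dict[Tuple[str, ...], str]:
    """Pick the most frequent n-grams and register them as fused tokens."""
    counts = count_ngrams(tokens, max_n)
    fused: Dict[Tuple[str, ...], str] = {}
    for ngram, _ in counts.most_common(num_new_tokens):
        fused[ngram] = "".join(ngram)  # fused token = concatenated surface form
    return fused


def encode_with_fusions(tokens: Sequence[str],
                        fused: Dict[Tuple[str, ...], str],
                        max_n: int = 10) -> List[str]:
    """Greedy longest-match re-encoding using the fused vocabulary."""
    out: List[str] = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_n, len(tokens) - i), 1, -1):
            candidate = tuple(tokens[i : i + n])
            if candidate in fused:
                out.append(fused[candidate])
                i += n
                break
        else:  # no fused n-gram matched; keep the base token
            out.append(tokens[i])
            i += 1
    return out


if __name__ == "__main__":
    # Toy corpus: whitespace tokens stand in for an actual BPE tokenization.
    corpus = ("the quick brown fox jumps over the lazy dog " * 50).split()
    fused_vocab = build_fused_vocab(corpus, num_new_tokens=5, max_n=10)
    encoded = encode_with_fusions(corpus, fused_vocab)
    print(f"base tokens: {len(corpus)}, after fusion: {len(encoded)}")
```

Under these assumptions, a small number of added fused tokens can sharply cut the number of tokens per string, which is the compression effect the abstract reports; the extra cost is paid once, when building the tokenizer, rather than at inference time.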