

Poster

BiE: Bi-Exponent Block Floating-Point for Large Language Models Quantization

Lancheng Zou · Wenqian Zhao · Shuo Yin · Chen Bai · Qi Sun · Bei Yu


Abstract:

Large Language Models (LLMs) typically possess billions of parameters, posing significant challenges for hardware platforms. Although quantization is an efficient approach to reducing computation and memory overhead during inference, we stress that mainstream low-bit quantization approaches still suffer from either data distribution outliers or a lack of hardware efficiency. We also find that low-bit data formats have further untapped expressiveness for covering the atypical distribution of language data. In this paper, we propose a novel numerical representation, Bi-Exponent Block Floating Point (BiE), and a new quantization flow. BiE quantization shows accuracy superiority and hardware friendliness across various models and benchmarks.
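For context, below is a minimal sketch of conventional single-exponent block floating point (BFP), the baseline representation that a bi-exponent format such as BiE extends: each block of values shares one power-of-two exponent and stores only low-bit mantissas. The block size, mantissa width, and function name are illustrative assumptions, not the configuration or implementation from the paper.

```python
# Sketch of conventional single-exponent block floating-point fake-quantization.
# Assumed parameters (block_size, mantissa_bits) are illustrative, not the paper's.
import numpy as np

def bfp_quantize(x: np.ndarray, block_size: int = 16, mantissa_bits: int = 4) -> np.ndarray:
    """Fake-quantize a 1-D tensor: each block shares one power-of-two exponent."""
    qmax = 2 ** (mantissa_bits - 1) - 1          # e.g. 7 for 4-bit signed mantissas
    out = np.zeros_like(x, dtype=np.float32)
    for start in range(0, len(x), block_size):
        block = x[start:start + block_size].astype(np.float32)
        max_abs = float(np.max(np.abs(block)))
        if max_abs == 0.0:
            continue                             # all-zero block stays zero
        # Shared exponent: smallest power-of-two scale that keeps the block
        # maximum inside the signed mantissa range.
        shared_exp = int(np.ceil(np.log2(max_abs / qmax)))
        scale = 2.0 ** shared_exp
        q = np.clip(np.round(block / scale), -qmax - 1, qmax)
        out[start:start + block_size] = q * scale
    return out

weights = np.random.randn(64).astype(np.float32)
print("max abs error:", float(np.max(np.abs(weights - bfp_quantize(weights)))))
```

Because one outlier inflates the shared exponent for its whole block, small values in that block lose precision; this is the outlier sensitivity the abstract points to as a weakness of standard low-bit formats.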
