

Poster

BiE: Bi-Exponent Block Floating-Point for Large Language Models Quantization

Lancheng Zou · Wenqian Zhao · Shuo Yin · Chen Bai · Qi Sun · Bei Yu


Abstract:

Large Language Models (LLMs) typically possess billions of parameters, posing significant challenges for hardware platforms. Although quantization is an efficient approach to reducing computation and memory overhead during inference, we stress that mainstream low-bit quantization approaches still suffer from either data distribution outliers or a lack of hardware efficiency. We also find that low-bit data formats have further untapped expressiveness for covering the atypical distribution of language data. In this paper, we propose a novel numerical representation, Bi-Exponent Block Floating Point (BiE), and a new quantization flow. BiE quantization shows accuracy superiority and hardware friendliness across various models and benchmarks.
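For context, below is a minimal sketch of conventional single-exponent block floating point (BFP), the baseline representation that a bi-exponent format such as BiE extends: each block of values shares one power-of-two exponent and stores only low-bit mantissas. The block size, mantissa width, and function name are illustrative assumptions, not the configuration or implementation from the paper.

```python
# Sketch of conventional single-exponent block floating-point fake-quantization.
# Assumed parameters (block_size, mantissa_bits) are illustrative, not the paper's.
import numpy as np

def bfp_quantize(x: np.ndarray, block_size: int = 16, mantissa_bits: int = 4) -> np.ndarray:
    """Fake-quantize a 1-D tensor: each block shares one power-of-two exponent."""
    qmax = 2 ** (mantissa_bits - 1) - 1          # e.g. 7 for 4-bit signed mantissas
    out = np.zeros_like(x, dtype=np.float32)
    for start in range(0, len(x), block_size):
        block = x[start:start + block_size].astype(np.float32)
        max_abs = float(np.max(np.abs(block)))
        if max_abs == 0.0:
            continue                             # all-zero block stays zero
        # Shared exponent: smallest power-of-two scale that keeps the block
        # maximum inside the signed mantissa range.
        shared_exp = int(np.ceil(np.log2(max_abs / qmax)))
        scale = 2.0 ** shared_exp
        q = np.clip(np.round(block / scale), -qmax - 1, qmax)
        out[start:start + block_size] = q * scale
    return out

weights = np.random.randn(64).astype(np.float32)
print("max abs error:", float(np.max(np.abs(weights - bfp_quantize(weights)))))
```

Because one outlier inflates the shared exponent for its whole block, small values in that block lose precision; this is the outlier sensitivity the abstract points to as a weakness of standard low-bit formats.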
