

Poster

IM-Unpack: Training and Inference with Arbitrarily Low Precision Integers

Zhanpeng Zeng · Karthikeyan Sankaralingam · Vikas Singh


Abstract:

GEneral Matrix Multiply (GEMM) is a central operation in deep learning and accounts for the largest share of the compute footprint. Therefore, improving its efficiency is an active topic of ongoing research. A popular strategy is the use of low bit-width integers to approximate the original entries in a matrix. This allows efficiency gains, but often requires sophisticated techniques to control the rounding error incurred. In this work, we first verify that if the low bit-width restriction is removed, for a variety of Transformer-based models, integers are, in fact, sufficient for all the GEMM computations needed -- for both training and inference stages -- and achieve parity with their floating point counterparts. No sophisticated techniques are needed. We find that while a large majority of entries in matrices (encountered in such models) can be easily represented by low bit-width integers, the existence of a few heavy hitter entries makes it difficult to achieve efficiency gains via the exclusive use of low bit-width GEMMs alone. To address this issue, we develop a simple algorithm, Integer Matrix Unpacking (IM-Unpack), to unpack a matrix with large integer entries into a larger matrix whose entries all lie within the representable range of arbitrarily low bit-width integers. This preserves equivalence with the original GEMM, i.e., the exact result can be obtained using purely low bit-width integer GEMMs. This comes at the cost of additional operations -- we show that for many popular models, this overhead is quite small.
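The abstract does not spell out the unpacking procedure, so the following is only a minimal sketch of the underlying exactness principle, not the authors' IM-Unpack algorithm (which unpacks rows/columns of the matrix itself so that standard low bit-width GEMM hardware can be used directly). The sketch assumes non-negative integer matrices and uses a base-`base` digit decomposition: each operand is split into digit planes with bounded entries, and the exact product is recovered as a weighted sum of bounded-entry GEMMs. All function names and the `base` parameter are illustrative choices, not from the paper.

```python
# Illustration only: exact integer GEMM recovered from GEMMs whose
# operands are bounded low bit-width digits (assumes non-negative entries).
import numpy as np

def digit_decompose(M, base):
    """Split non-negative integer matrix M into digit planes M_0, M_1, ...
    with entries in [0, base) such that M = sum_i base**i * M_i."""
    M = M.astype(np.int64)
    planes = []
    while np.any(M != 0):
        planes.append(M % base)   # low digits, all strictly less than base
        M = M // base             # shift to the next digit plane
    return planes or [np.zeros_like(M)]

def exact_gemm_low_bit(A, B, base=16):
    """Compute A @ B exactly using only GEMMs whose operand entries are < base."""
    A_planes = digit_decompose(A, base)
    B_planes = digit_decompose(B, base)
    out = np.zeros((A.shape[0], B.shape[1]), dtype=np.int64)
    for i, Ai in enumerate(A_planes):
        for j, Bj in enumerate(B_planes):
            out += (base ** (i + j)) * (Ai @ Bj)   # bounded-entry GEMM
    return out

# Usage: entries far outside the 4-bit range [0, 16) still multiply exactly.
rng = np.random.default_rng(0)
A = rng.integers(0, 1000, size=(8, 16))
B = rng.integers(0, 1000, size=(16, 4))
assert np.array_equal(exact_gemm_low_bit(A, B, base=16), A @ B)
```

The extra GEMMs over the digit planes correspond to the "additional operations" mentioned above; since most entries already fit in the low bit-width range, only the few heavy hitter entries contribute extra work.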
