Towards Structured Sparsity in Transformers for Efficient Inference
Harry Dong · Beidi Chen · Yuejie Chi
Event URL: https://openreview.net/forum?id=c4m0BkO4OL

Transformer models have been critical in accelerating progress in numerous fields, yet scaling these models comes at a high computational cost. In this paper, we explore sparsity properties in transformers and manipulate their existing sparsity to be more structured for efficient training and inference. In particular, we create sparse structures that are block sparse and similar across layers, which have the potential to bypass a significant amount of model loading and computation. We present preliminary results and ideas using a small transformer, which we hope to extend to more complex models.
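To illustrate why block sparsity can bypass loading and computation, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a weight matrix whose zero entries are grouped into square tiles; the helper names `block_mask`, `block_sparse_matvec`, and `mask_overlap`, and the parameters `block_size` and `tol`, are hypothetical and chosen only for this example.

```python
import numpy as np

def block_mask(weight, block_size, tol=0.0):
    """Mark which (block_size x block_size) tiles contain any entry with |w| > tol."""
    rows, cols = weight.shape
    assert rows % block_size == 0 and cols % block_size == 0
    tiles = weight.reshape(rows // block_size, block_size,
                           cols // block_size, block_size)
    return (np.abs(tiles) > tol).any(axis=(1, 3))

def block_sparse_matvec(weight, x, block_size, mask):
    """Compute weight @ x while touching only the tiles marked active in mask."""
    out = np.zeros(weight.shape[0], dtype=weight.dtype)
    for i, j in zip(*np.nonzero(mask)):
        r = slice(i * block_size, (i + 1) * block_size)
        c = slice(j * block_size, (j + 1) * block_size)
        out[r] += weight[r, c] @ x[c]  # inactive tiles are never read
    return out

def mask_overlap(mask_a, mask_b):
    """Jaccard overlap of two block masks (an assumed proxy for inter-layer similarity)."""
    union = np.logical_or(mask_a, mask_b).sum()
    if union == 0:
        return 1.0
    return float(np.logical_and(mask_a, mask_b).sum() / union)
```

Under these assumptions, the work done by `block_sparse_matvec` scales with the number of active tiles rather than the full matrix size, and a high `mask_overlap` between consecutive layers would suggest that one layer's mask could be reused to decide which tiles of the next layer to load.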

Author Information

Harry Dong (Carnegie Mellon University)
Beidi Chen (CMU / FAIR)
Yuejie Chi (CMU)
