

Poster

Approximation to Smooth Functions by Low-Rank Swish Networks

Zimeng Li · Hongjun LI · Jingyuan Wang · Ke Tang

West Exhibition Hall B2-B3 #W-901
Tue 15 Jul 11 a.m. PDT — 1:30 p.m. PDT

Abstract: While deep learning has achieved remarkable success in a wide range of applications, its substantial computational cost limits the scenarios in which neural networks can be deployed. To alleviate this problem, low-rank compression has been proposed as an efficient and hardware-friendly class of network compression methods, which reduce computation by replacing large matrices in neural networks with products of two smaller ones. In this paper, we implement low-rank networks by inserting a sufficiently narrow linear layer without bias between each pair of adjacent nonlinear layers. We prove that low-rank Swish networks with a fixed depth are capable of approximating any function from the Hölder ball $\mathcal{C}^{\beta, R}([0,1]^d)$ to within an arbitrarily small error, where $\beta$ is the smoothness parameter and $R$ is the radius. Our constructive approximation ensures that the width of the linear hidden layers required for approximation is no more than one-third of the width of the nonlinear layers, which implies that the computational cost can be decreased by at least one-third compared with a network of the same depth and nonlinear-layer width but without narrow linear hidden layers. Our finding offers a theoretical basis for low-rank compression from the perspective of universal approximation theory.
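The sketch below illustrates the kind of structure the abstract describes: a narrow, bias-free linear layer inserted between nonlinear (Swish/SiLU) layers, so that a dense width-by-width weight matrix is replaced by the product of two smaller factors. The class name, widths, and rank are illustrative assumptions, not the paper's actual construction or bounds.

```python
import torch
import torch.nn as nn


class LowRankSwishBlock(nn.Module):
    """Illustrative low-rank block: a narrow bias-free linear layer feeding a
    wider linear layer followed by Swish (SiLU), so the dense width x width
    weight is replaced by a width x rank and a rank x width factor.
    (Hypothetical sketch; names and sizes are not from the paper.)"""

    def __init__(self, width: int, rank: int):
        super().__init__()
        assert rank <= width // 3  # mirrors the "at most one-third" width condition
        self.down = nn.Linear(width, rank, bias=False)  # narrow linear layer, no bias
        self.up = nn.Linear(rank, width)                # back to the nonlinear width
        self.act = nn.SiLU()                            # Swish activation

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.up(self.down(x)))


# Parameter comparison for one block (width=300, rank=100):
#   low-rank: 300*100 + 100*300 + 300 = 60,300 weights
#   dense:    300*300 + 300         = 90,300 weights
# i.e. roughly a one-third reduction, consistent with the bound stated above.
block = LowRankSwishBlock(width=300, rank=100)
y = block(torch.randn(8, 300))
print(y.shape)  # torch.Size([8, 300])
```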

Lay Summary:

Deep learning drives breakthroughs in AI, but its massive computational demands hinder real-world use. Cutting network size via low-rank compression shows promise, yet it lacks theoretical guarantees, limiting its reliability. We offer a theoretical basis for low-rank compression from the perspective of universal approximation theory by proving that any function from a broad class can be approximated by a Swish network with low-rank weight matrices. Our findings provide partial assurance that low-rank compression is a viable approach to network compression, as it can maintain performance while reducing model size.
