Fast linear transforms are ubiquitous in machine learning, including the discrete Fourier transform, discrete cosine transform, and other structured transformations such as convolutions. All of these transforms can be represented by dense matrixvector multiplication, yet each has a specialized and highly efficient (subquadratic) algorithm. We ask to what extent handcrafting these algorithms and implementations is necessary, what structural prior they encode, and how much knowledge is required to automatically learn a fast algorithm for a provided structured transform. Motivated by a characterization of fast matrixvector multiplication as products of sparse matrices, we introduce a parameterization of divideandconquer methods that is capable of representing a large class of transforms. This generic formulation can automatically learn an efficient algorithm for many important transforms; for example, it recovers the $O(N \log N)$ CooleyTukey FFT algorithm to machine precision, for dimensions $N$ up to $1024$. Furthermore, our method can be incorporated as a lightweight replacement of generic matrices in machine learning pipelines to learn efficient and compressible transformations. On a standard task of compressing a single hiddenlayer network, our method exceeds the classification accuracy of unconstrained matrices on CIFAR10 by 3.9 pointsthe first time a structured approach has done sowith 4X faster inference speed and 40X fewer parameters.
Author Information
Tri Dao (Stanford)
Albert Gu (Stanford University)
Matthew Eichhorn (University at Buffalo)
Atri Rudra (University at Buffalo, SUNY)
Christopher Re (Stanford)
Related Events (a corresponding poster, oral, or spotlight)

2019 Oral: Learning Fast Algorithms for Linear Transforms Using Butterfly Factorizations »
Thu Jun 13th 04:00  04:20 PM Room Hall A
More from the Same Authors

2019 Poster: Learning Dependency Structures for Weak Supervision Models »
Paroma Varma · Frederic Sala · Ann He · Alexander J Ratner · Christopher Re 
2019 Oral: Learning Dependency Structures for Weak Supervision Models »
Paroma Varma · Frederic Sala · Ann He · Alexander J Ratner · Christopher Re 
2019 Poster: A Kernel Theory of Modern Data Augmentation »
Tri Dao · Albert Gu · Alexander J Ratner · Virginia Smith · Christopher De Sa · Christopher Re 
2019 Oral: A Kernel Theory of Modern Data Augmentation »
Tri Dao · Albert Gu · Alexander J Ratner · Virginia Smith · Christopher De Sa · Christopher Re 
2018 Poster: Representation Tradeoffs for Hyperbolic Embeddings »
Frederic Sala · Christopher De Sa · Albert Gu · Christopher Re 
2018 Oral: Representation Tradeoffs for Hyperbolic Embeddings »
Frederic Sala · Christopher De Sa · Albert Gu · Christopher Re 
2017 Poster: Learning the Structure of Generative Models without Labeled Data »
Stephen Bach · Bryan He · Alexander J Ratner · Christopher Re 
2017 Talk: Learning the Structure of Generative Models without Labeled Data »
Stephen Bach · Bryan He · Alexander J Ratner · Christopher Re