SWALP: Stochastic Weight Averaging in Low-Precision Training
Guandao Yang · Tianyi Zhang · Polina Kirichenko · Junwen Bai · Andrew Wilson · Christopher De Sa

Tue Jun 11 05:05 PM -- 05:10 PM (PDT) @ Hall B

Low-precision operations can provide scalability, memory savings, portability, and energy efficiency. This paper proposes SWALP, an approach to low-precision training that averages low-precision SGD iterates with a modified learning rate schedule. SWALP is easy to implement and can match the performance of full-precision SGD even with all numbers quantized down to 8 bits, including the gradient accumulators. Additionally, we show that SWALP converges arbitrarily close to the optimal solution for quadratic objectives, and to a noise ball asymptotically smaller than that of low-precision SGD in strongly convex settings.
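The abstract describes SWALP only at a high level. The following is a minimal, non-authoritative sketch of the core idea on a toy problem. It rests on several assumptions that do not come from the abstract: a signed 8-bit fixed-point format with 6 fractional bits and stochastic rounding (the paper's own quantizer may differ), a constant learning rate, a burn-in period before averaging starts, and a noisy quadratic objective. The names `quantize` and `swalp_quadratic` and all parameter values are hypothetical. The gradient and every weight update are quantized, while the running average is kept in full precision.

```python
import numpy as np


def quantize(x, rng, word_bits=8, frac_bits=6):
    """Round x onto a signed fixed-point grid using stochastic rounding.

    Hypothetical helper: word_bits total bits, frac_bits of them fractional.
    """
    delta = 2.0 ** -frac_bits                      # spacing between representable values
    upper = 2.0 ** (word_bits - frac_bits - 1) - delta
    lower = -(2.0 ** (word_bits - frac_bits - 1))
    scaled = x / delta
    floor = np.floor(scaled)
    # Round up with probability equal to the fractional part, so E[q(x)] = x
    # whenever x lies inside the representable range.
    rounded = floor + (rng.random(x.shape) < (scaled - floor))
    return np.clip(rounded * delta, lower, upper)


def swalp_quadratic(w_star, steps=5000, lr=0.05, burn_in=1000,
                    cycle=1, noise=0.1, seed=0):
    """Low-precision SGD on f(w) = 0.5 * ||w - w_star||^2 with noisy gradients,
    plus a full-precision average of the quantized iterates after burn_in."""
    rng = np.random.default_rng(seed)
    w = quantize(np.zeros_like(w_star), rng)       # low-precision weights
    w_avg = np.zeros_like(w_star)                  # full-precision running average
    n_avg = 0
    for t in range(steps):
        grad = (w - w_star) + noise * rng.standard_normal(w.shape)
        # Quantize both the gradient and the updated weights (8-bit storage).
        w = quantize(w - lr * quantize(grad, rng), rng)
        if t >= burn_in and (t - burn_in) % cycle == 0:
            n_avg += 1
            w_avg += (w - w_avg) / n_avg           # incremental mean update
    return w, w_avg
```

Calling `swalp_quadratic(np.array([0.3, -0.7, 1.1]))` returns the last quantized iterate and the average. In this toy setting the average is expected to land much closer to `w_star` than the last iterate, which keeps bouncing within a noise ball set by the learning rate and rounding noise, even though `w_star` itself is not on the 8-bit grid.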

Author Information

Guandao Yang (Cornell University)
Tianyi Zhang (Cornell University)
Polina Kirichenko (Cornell University)
Junwen Bai (Cornell University)
Andrew Wilson (Cornell University)

Andrew Gordon Wilson is faculty in the Courant Institute and Center for Data Science at NYU. His interests include probabilistic modelling, Gaussian processes, Bayesian statistics, physics-inspired machine learning, and loss surfaces and generalization in deep learning. His webpage is https://cims.nyu.edu/~andrewgw.

Christopher De Sa (Cornell University)
