Poster in Workshop: New Frontiers in Learning, Control, and Dynamical Systems

Statistics estimation in neural network training: a recursive identification approach

Ruth Crasto · Xuchan Bao · Roger Grosse


Abstract:

A common practice in mini-batch neural network training is to estimate global statistics using exponential moving averages (EMA). However, such methods can be sensitive to the EMA decay parameter, which is typically set by hand. In this paper, we introduce Adaptive Linear State Estimation (ALiSE), an online method for adapting the parameters of a linear estimation model such as an EMA. Our work establishes a connection between parameter estimation methods in deep learning, including ALiSE, and recursive identification techniques in control theory. We apply ALiSE to a range of deep learning scenarios and show that it can learn sensible schedules for the EMA decay parameter. Compared to the naive EMA baseline, ALiSE leads to matching or accelerated convergence during training.
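
To make the role of the decay parameter concrete, the following is a minimal sketch of a fixed-decay EMA tracking a noisy per-batch statistic. It illustrates only the hand-tuned baseline the abstract describes, not ALiSE itself. The function names, the synthetic data, and the choice to initialize the estimate from the first batch are all assumptions made for illustration.

```python
import random


def ema_estimates(batch_stats, decay):
    """Return the EMA estimate after each mini-batch statistic.

    Hypothetical helper: the estimate is initialized from the first
    batch statistic, which is one common convention, not the paper's.
    """
    estimate = batch_stats[0]
    history = [estimate]
    for stat in batch_stats[1:]:
        # Standard EMA update: keep `decay` of the old estimate,
        # blend in (1 - decay) of the new mini-batch statistic.
        estimate = decay * estimate + (1.0 - decay) * stat
        history.append(estimate)
    return history


def make_batch_stats(num_steps=200, noise_std=0.5, seed=0):
    """Synthetic noisy statistics whose true value jumps from 0 to 1 halfway."""
    rng = random.Random(seed)
    return [
        (0.0 if t < num_steps // 2 else 1.0) + rng.gauss(0.0, noise_std)
        for t in range(num_steps)
    ]


if __name__ == "__main__":
    stats = make_batch_stats()
    for decay in (0.5, 0.9, 0.99):
        final = ema_estimates(stats, decay)[-1]
        print(f"decay={decay}: final estimate {final:.3f}")
```

A small decay follows each noisy batch closely, while a decay near 1 smooths the noise but lags behind changes in the underlying statistic. This trade-off is the sensitivity that motivates adapting the decay online rather than fixing it by hand.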
