
Generalization Guarantees for Neural Architecture Search with Train-Validation Split
Samet Oymak · Mingchen Li · Mahdi Soltanolkotabi

Wed Jul 21 05:40 AM -- 05:45 AM (PDT)

Neural Architecture Search (NAS) is a popular method for automatically designing optimized deep-learning architectures. NAS methods commonly use bilevel optimization, where one optimizes the weights over the training data (lower-level problem) and the hyperparameters, such as the architecture, over the validation data (upper-level problem). This paper explores the statistical aspects of such problems with train-validation splits. In practice, the lower-level problem is often overparameterized and can easily achieve zero loss. Thus, a priori, it seems impossible to distinguish the right hyperparameters based on training loss alone, which motivates a better understanding of the train-validation split. To this end, we first show that refined properties of the validation loss, such as risk and hyper-gradients, are indicative of those of the true test loss and help prevent overfitting with a near-minimal validation sample size. Importantly, this is established for continuous search spaces, which are relevant for differentiable search schemes. We then establish generalization bounds for NAS problems, with an emphasis on an activation search problem and gradient-based methods. Finally, we show rigorous connections between NAS and low-rank matrix learning, which lead to algorithmic insights: the solution of the upper-level problem can be accurately learned via spectral methods to achieve near-minimal risk.
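
The bilevel train-validation setup described in the abstract can be sketched in symbols. The notation here is an assumption introduced for illustration and is not taken from the paper: $\alpha$ denotes the hyperparameters (for example, a continuous architecture parameter), $\theta$ the weights, $\hat{\mathcal{L}}_{\mathcal{T}}$ and $\hat{\mathcal{L}}_{\mathcal{V}}$ the empirical losses on the training and validation sets, and $\mathcal{L}$ the true test loss.

```latex
% Lower-level problem: fit the weights on the training data for a fixed \alpha.
\theta_\alpha \in \arg\min_{\theta} \; \hat{\mathcal{L}}_{\mathcal{T}}(\theta, \alpha)

% Upper-level problem: choose the hyperparameters on the validation data.
\hat{\alpha} \in \arg\min_{\alpha} \; \hat{\mathcal{L}}_{\mathcal{V}}(\theta_\alpha, \alpha)

% Informal reading of the first result: over the continuous search space,
% the validation risk and its hyper-gradient track those of the test loss.
\hat{\mathcal{L}}_{\mathcal{V}}(\theta_\alpha, \alpha) \approx \mathcal{L}(\theta_\alpha, \alpha),
\qquad
\nabla_\alpha \hat{\mathcal{L}}_{\mathcal{V}}(\theta_\alpha, \alpha) \approx \nabla_\alpha \mathcal{L}(\theta_\alpha, \alpha)
```

In the overparameterized regime, $\hat{\mathcal{L}}_{\mathcal{T}}(\theta_\alpha, \alpha)$ can be zero for many values of $\alpha$, so the training loss alone carries no signal about which $\alpha$ is preferable. This is why the upper-level problem is posed on held-out validation data.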

Author Information

Samet Oymak (University of California, Riverside)
Mingchen Li (University of California, Riverside)
Mahdi Soltanolkotabi (University of Southern California)

Mahdi Soltanolkotabi is an assistant professor in the Ming Hsieh Department of Electrical and Computer Engineering and Computer Science at the University of Southern California, where he holds an Andrew and Erna Viterbi Early Career Chair. Prior to joining USC, he completed his PhD in electrical engineering at Stanford in 2014. He was a postdoctoral researcher in the EECS department at UC Berkeley during the 2014-2015 academic year. His research focuses on developing the mathematical foundations of data analysis at the confluence of optimization, machine learning, signal processing, high-dimensional statistics, computational imaging, and artificial intelligence. Mahdi is the recipient of the Packard Fellowship in Science and Engineering, a Sloan Research Fellowship, an NSF CAREER Award, an Air Force Office of Scientific Research Young Investigator Award (AFOSR-YIP), and a Google Faculty Research Award.
