Mixture-of-Experts (MoE) is a popular model for ensemble learning and is a basic building block of highly successful modern neural networks, as well as a component in Gated Recurrent Units (GRUs) and attention networks. However, current algorithms for learning MoE, including the EM algorithm and gradient descent, are known to get stuck in local optima. From a theoretical viewpoint, finding an efficient and provably consistent algorithm for learning the parameters has remained an open problem for more than two decades. In this paper, we introduce the first algorithm that learns the true parameters of an MoE model for a wide class of non-linearities with global consistency guarantees. While existing algorithms jointly or iteratively estimate the expert parameters and the gating parameters in the MoE, we propose a novel algorithm that breaks the deadlock and can directly estimate the expert parameters by sensing their echo in a carefully designed cross-moment tensor between the inputs and the output. Once the experts are known, the recovery of gating parameters still requires an EM algorithm; however, we show that the EM algorithm for this simplified problem, unlike the joint EM algorithm, converges to the true parameters. We empirically validate our algorithm on both synthetic and real datasets in a variety of settings, and show superior performance to standard baselines.
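The sketch below is a minimal illustration of the two-stage idea described in the abstract, not the paper's algorithm. It assumes a 2-expert MoE with ReLU experts and a sigmoid gate on Gaussian inputs, uses a plain Stein's-lemma-style moment E[y (x x^T - I)] as a crude stand-in for the paper's carefully designed cross-moment tensors, and then runs EM over the gating parameter alone with the experts held fixed. All dimensions, step sizes, and the specific moment are assumptions made for this sketch.

```python
# Illustrative two-stage sketch (NOT the paper's construction):
#   stage 1: candidate expert directions from a cross-moment of (x, y)
#   stage 2: EM over the gating parameter only, experts held fixed
import numpy as np

rng = np.random.default_rng(0)
d, n, noise = 8, 100_000, 0.05                            # assumed sizes

sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))
relu = lambda t: np.maximum(t, 0.0)

# Ground-truth 2-expert MoE: y = relu(a_z^T x) + eps, z ~ Bern(sigmoid(w^T x))
a_true = np.linalg.qr(rng.normal(size=(d, 2)))[0].T       # two orthonormal expert vectors
w_true = rng.normal(size=d)
x = rng.normal(size=(n, d))                               # Gaussian inputs
z = rng.random(n) < sigmoid(x @ w_true)                   # latent expert choice
y = np.where(z, relu(x @ a_true[0]), relu(x @ a_true[1])) + noise * rng.normal(size=n)

# Stage 1: Stein-style surrogate moment M ~ E[y (x x^T - I)]; for a single
# ReLU expert this is proportional to a a^T, so top eigenvectors serve as
# candidate expert directions (up to sign and gating-induced bias here).
M = (x.T * y) @ x / n - y.mean() * np.eye(d)
eigvals, eigvecs = np.linalg.eigh(M)
a_hat = eigvecs[:, np.argsort(-np.abs(eigvals))[:2]].T    # top-2 directions
for i, a in enumerate(a_hat):
    print(f"direction {i}: best alignment with a true expert = {np.abs(a_true @ a).max():.3f}")

# Stage 2: EM for the gate with experts fixed.  E-step: posterior
# responsibility of expert 1; M-step: weighted logistic regression for w,
# solved here with a few plain gradient-ascent steps.
def em_gating(x, y, experts, n_iters=30, lr=0.5):
    w = np.zeros(x.shape[1])
    m1, m2 = relu(x @ experts[0]), relu(x @ experts[1])   # fixed expert predictions
    for _ in range(n_iters):
        p1 = sigmoid(x @ w)
        lik1 = np.exp(-0.5 * ((y - m1) / noise) ** 2)
        lik2 = np.exp(-0.5 * ((y - m2) / noise) ** 2)
        r = p1 * lik1 / (p1 * lik1 + (1 - p1) * lik2 + 1e-12)   # E-step
        for _ in range(10):                                      # M-step
            w += lr * x.T @ (r - sigmoid(x @ w)) / len(y)
    return w

w_hat = em_gating(x, y, a_true)   # true experts used here just to isolate stage 2
cos = abs(w_hat @ w_true) / (np.linalg.norm(w_hat) * np.linalg.norm(w_true) + 1e-12)
print(f"gating direction cosine similarity: {cos:.3f}")
```

Stage 2 is run with the true experts only to isolate the gating step; in the paper's pipeline the gate is fit after the experts have been estimated from its cross-moment construction, which comes with the consistency guarantees the abstract refers to.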
Author Information
Ashok Vardhan Makkuva (UIUC)
Pramod Viswanath (UIUC)
Sreeram Kannan (University of Washington)
Sewoong Oh (University of Washington)
Related Events (a corresponding poster, oral, or spotlight)
- 2019 Poster: Breaking the gridlock in Mixture-of-Experts: Consistent and Efficient Algorithms
  Fri, Jun 14, 01:30 -- 04:00 AM, Pacific Ballroom #210
More from the Same Authors
- 2021: Robust and Differentially Private Covariance Estimation
  Logan Gnanapragasam · Jonathan Hayase · Sewoong Oh
- 2023 Poster: Private Federated Learning with Autotuned Compression
  Enayat Ullah · Christopher Choquette-Choo · Peter Kairouz · Sewoong Oh
- 2023 Poster: Why Is Public Pretraining Necessary for Private Model Training?
  Arun Ganesh · Mahdi Haghifam · Milad Nasresfahani · Sewoong Oh · Thomas Steinke · Om Thakkar · Abhradeep Guha Thakurta · Lun Wang
- 2023 Poster: CRISP: Curriculum based Sequential neural decoders for Polar code family
  S Ashwin Hebbar · Viraj Nadkarni · Ashok Vardhan Makkuva · Suma Bhat · Sewoong Oh · Pramod Viswanath
- 2021 Poster: KO codes: inventing nonlinear encoding and decoding for reliable wireless communication via deep-learning
  Ashok Vardhan Makkuva · Xiyang Liu · Mohammad Vahid Jamali · Hessam Mahdavifar · Sewoong Oh · Pramod Viswanath
- 2021 Spotlight: KO codes: inventing nonlinear encoding and decoding for reliable wireless communication via deep-learning
  Ashok Vardhan Makkuva · Xiyang Liu · Mohammad Vahid Jamali · Hessam Mahdavifar · Sewoong Oh · Pramod Viswanath
- 2020 Poster: Optimal transport mapping via input convex neural networks
  Ashok Vardhan Makkuva · Amirhossein Taghvaei · Sewoong Oh · Jason Lee
- 2019 Poster: Rate Distortion For Model Compression: From Theory To Practice
  Weihao Gao · Yu-Han Liu · Chong Wang · Sewoong Oh
- 2019 Oral: Rate Distortion For Model Compression: From Theory To Practice
  Weihao Gao · Yu-Han Liu · Chong Wang · Sewoong Oh