

Poster in Workshop: Dynamic Neural Networks

Is a Modular Architecture Enough?

Sarthak Mittal · Yoshua Bengio · Guillaume Lajoie


Abstract:

Inspired by human cognition, machine learning systems are increasingly revealing the advantages of sparser and more modular architectures. Recent work demonstrates that some modular architectures not only generalize well in-distribution but also offer better out-of-distribution generalization, scaling properties, learning speed, and interpretability. A key intuition behind the success of such systems is that the data-generating process for most real-world settings is assumed to consist of sparsely interacting parts, motivating similar inductive biases in the models. However, the field has lacked a rigorous quantitative assessment of such systems because these real-world data distributions are complex and unknown. We therefore provide a thorough assessment of common modular architectures through the lens of simple and known modular data distributions. We highlight the benefits of modularity and sparsity, and reveal insights into the challenges faced when optimizing modular systems. We also propose evaluation metrics that identify the regimes in which these benefits of modularity are substantial, as well as the sub-optimality of current end-to-end learned modular systems relative to their claimed potential.
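To make the "simple and known modular data distributions" concrete, the following is a minimal NumPy sketch of such a data-generating process. It assumes a linear mixture-of-rules form; the specific choices here (K rules, random linear maps, exactly one active rule per sample) are illustrative assumptions, not necessarily the paper's exact setup.

import numpy as np

# Illustrative sketch, not the authors' exact setup: a synthetic
# "modular" data distribution in which every sample is produced by
# exactly one of K sparsely interacting rules. A perfectly modular
# learner would recover one specialist per rule.
rng = np.random.default_rng(0)
K, DIM, N = 4, 8, 1000

# Each ground-truth rule is a distinct random linear map; sparsity
# here means a single rule is active per sample.
rules = rng.normal(size=(K, DIM))

def sample_batch(n):
    # Draw inputs, pick one active rule per sample, apply it.
    x = rng.normal(size=(n, DIM))
    z = rng.integers(0, K, size=n)          # latent module indicator
    y = np.einsum("nd,nd->n", x, rules[z])  # y_i = <x_i, w_{z_i}>
    return x, y, z

x, y, z = sample_batch(N)  # a learner sees (x, y); z stays hidden

Because the latent indicator z is known to the experimenter (though hidden from the model), one can directly measure whether a learned modular architecture routes samples to modules in a way that matches the true rules.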

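Likewise, one plausible form such an evaluation metric could take (an illustrative assumption, not necessarily the metric proposed in the paper) is a specialization score: the agreement between a learned routing and the ground-truth rule assignments, computed under the best one-to-one matching of modules to rules, since module labels are only identified up to permutation.

import numpy as np
from scipy.optimize import linear_sum_assignment

def specialization_score(z_true, z_pred, k):
    # Agreement between learned routing and ground-truth rules under
    # the best one-to-one matching of modules to rules (module labels
    # are only identified up to permutation).
    counts = np.zeros((k, k), dtype=int)
    np.add.at(counts, (z_true, z_pred), 1)  # co-occurrence counts
    rows, cols = linear_sum_assignment(counts, maximize=True)
    return counts[rows, cols].sum() / len(z_true)

# A learner whose routing collapses onto a single module scores
# roughly 1/k, whereas perfectly specialized routing scores 1.0.
rng = np.random.default_rng(0)
z_true = rng.integers(0, 4, size=1000)
print(specialization_score(z_true, np.zeros(1000, dtype=int), 4))  # ~0.25
print(specialization_score(z_true, z_true, 4))                     # 1.0

A metric of this shape makes the abstract's distinction measurable: regimes where modularity helps are those where the score is high, while a low score on end-to-end learned systems quantifies the sub-optimality the abstract refers to.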