

Poster in Workshop: Structured Probabilistic Inference and Generative Modeling

Attention as Implicit Structural Inference

Ryan Singh · Christopher Buckley

Keywords: [ Graphical Models ] [ Predictive Coding ] [ Variational Inference ] [ Attention ] [ Structural Inference ]


Abstract:

Attention mechanisms play a crucial role in cognitive systems by allowing them to flexibly allocate cognitive resources. Transformers, in particular, have become a dominant architecture in machine learning, with attention as their central innovation. However, the underlying intuition and formalism of attention in Transformers are based on ideas of keys and queries in database management systems. In this work, we pursue a structural inference perspective, building upon, and bringing together, previous theoretical descriptions of attention such as Gaussian Mixture Models, alignment mechanisms, and Hopfield Networks. Specifically, we demonstrate that attention can be viewed as inference over an implicitly defined set of possible adjacency structures in a graphical model, revealing the generality of such a mechanism. This perspective unifies different attentional architectures in machine learning and suggests potential modifications and generalizations of attention. We hope that, by providing a new lens on attention architectures, our work can guide the development of new and improved attentional mechanisms.
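To make the Gaussian Mixture Model reading of attention concrete, here is a minimal NumPy sketch. It is our illustration under stated assumptions, not the paper's implementation: keys are treated as component means of a unit-covariance mixture with a uniform prior, and the function and variable names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_as_gmm_posterior(query, keys, values, beta=1.0):
    """Read dot-product attention as posterior inference in a GMM.

    Assumption (ours): each key k_j is the mean of a Gaussian component
    with covariance (1/beta) * I and uniform mixing weights. Then the
    attention weights equal the posterior responsibilities p(z = j | query),
    and the attended output is the posterior expectation of the values.
    """
    # log p(query | z = j) ∝ -(beta / 2) * ||query - k_j||^2
    #                      = beta * (q · k_j) - (beta / 2) * ||k_j||^2 + const
    # (the ||query||^2 term is shared across j and cancels in the softmax)
    logits = beta * (keys @ query) - 0.5 * beta * np.sum(keys ** 2, axis=-1)
    weights = softmax(logits)        # posterior over "which key generated q"
    return weights @ values, weights

# Example usage
rng = np.random.default_rng(0)
q = rng.normal(size=8)
K = rng.normal(size=(5, 8))
V = rng.normal(size=(5, 8))
out, w = attention_as_gmm_posterior(q, K, V)
print(w.round(3), w.sum())           # responsibilities sum to 1
```

Note that when all keys have equal norm, the ||k_j||^2 term is constant and the responsibilities reduce exactly to standard dot-product attention weights; with beta = 1/sqrt(d) this matches the scaled dot-product form.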
