From block-Toeplitz matrices to differential equations on graphs: towards a general theory for scalable masked Transformers
Krzysztof Choromanski · Han Lin · Haoxian Chen · Tianyi Zhang · Arijit Sehanobish · Valerii Likhosherstov · Jack Parker-Holder · Tamas Sarlos · Adrian Weller · Thomas Weingarten

Thu Jul 21 03:00 PM -- 05:00 PM (PDT) @ Hall E #406

In this paper we provide, to the best of our knowledge, the first comprehensive approach for incorporating various masking mechanisms into Transformer architectures in a scalable way. We show that recent results on linear causal attention (Choromanski et al., 2021) and log-linear RPE-attention (Luo et al., 2021) are special cases of this general mechanism. However, by casting the problem as a topological (graph-based) modulation of unmasked attention, we obtain several previously unknown results, including efficient d-dimensional RPE-masking and graph-kernel masking. We leverage many mathematical techniques, ranging from spectral analysis through dynamic programming and random walks to new algorithms for solving Markov processes on graphs. We provide a corresponding empirical evaluation.
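
As a rough illustration of why Toeplitz structure makes RPE-masking log-linear, the sketch below multiplies a Toeplitz mask by a vector with the FFT and uses that to apply a masked, unnormalized linear attention without forming any L x L matrix. It is a minimal sketch under stated assumptions, not the paper's implementation: the names `toeplitz_matvec`, `masked_linear_attention`, `q_feat`, and `k_feat` are hypothetical, the mask is assumed to satisfy M[i, j] = f(i - j), and the feature maps for queries and keys are assumed to be given.

```python
import numpy as np


def toeplitz_matvec(first_col, first_row, x):
    """Multiply a Toeplitz matrix by x in O(L log L) time.

    first_col[k] holds the entry at offset i - j = k, and first_row[k]
    holds the entry at offset i - j = -k (first_row[0] is ignored).
    """
    L = len(x)
    # Embed the Toeplitz matrix in a circulant matrix of size 2L - 1.
    c = np.concatenate([first_col, first_row[:0:-1]])
    x_padded = np.concatenate([x, np.zeros(L - 1)])
    # A circulant product is a circular convolution, computed via the FFT.
    y = np.fft.ifft(np.fft.fft(c) * np.fft.fft(x_padded))
    return y[:L].real


def masked_linear_attention(q_feat, k_feat, v, first_col, first_row):
    """Compute (M * (Q' K'^T)) V, where * is the elementwise product.

    Uses the identity (M * (Q' K'^T)) V = sum_r diag(Q'[:, r]) M (K'[:, r] * V),
    so only Toeplitz matrix-vector products are needed.
    Normalization of the attention weights is omitted in this sketch.
    """
    num_feats = q_feat.shape[1]
    out = np.zeros(v.shape, dtype=float)
    for r in range(num_feats):
        for d in range(v.shape[1]):
            masked = toeplitz_matvec(first_col, first_row, k_feat[:, r] * v[:, d])
            out[:, d] += q_feat[:, r] * masked
    return out
```

With m random features and value dimension d, this costs O(m d L log L) instead of the O(L^2) needed to materialize the masked attention matrix.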

Author Information

Krzysztof Choromanski (Google Brain Robotics)
Han Lin (Columbia University)

Master's student in computer science at Columbia University. Research interests focus on the theory of structured random features for kernel approximation and their application to building efficient Transformers and GNNs.

Haoxian Chen (Columbia University)
Tianyi Zhang (Columbia University)
Arijit Sehanobish (Covera Health)
Valerii Likhosherstov (University of Cambridge)
Jack Parker-Holder (University of Oxford)
Tamas Sarlos (Google)
Adrian Weller (University of Cambridge, Alan Turing Institute)
Adrian Weller is Programme Director for AI at The Alan Turing Institute, the UK national institute for data science and AI, and is a Turing AI Fellow leading work on trustworthy Machine Learning (ML). He is a Principal Research Fellow in ML at the University of Cambridge, and at the Leverhulme Centre for the Future of Intelligence where he is Programme Director for Trust and Society. His interests span AI, its commercial applications and helping to ensure beneficial outcomes for society. Previously, Adrian held senior roles in finance. He received a PhD in computer science from Columbia University, and an undergraduate degree in mathematics from Trinity College, Cambridge.

Thomas Weingarten (Google)
