Recent advances in Transformer models allow for unprecedented sequence lengths, thanks to attention variants with linear space and time complexity. Meanwhile, relative positional encoding (RPE) has proven beneficial for classical Transformers; it exploits lags between tokens rather than their absolute positions. However, RPE is not available for the recent linear variants of the Transformer, because it requires explicit computation of the attention matrix, which is precisely what such methods avoid. In this paper, we bridge this gap and present Stochastic Positional Encoding (SPE), a way to generate positional encodings that can be used as a drop-in replacement for the classical additive (sinusoidal) PE and provably behaves like RPE. The main theoretical contribution is a connection between positional encoding and cross-covariance structures of correlated Gaussian processes. We illustrate the performance of our approach on the Long-Range Arena benchmark and on music generation.
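The key property the abstract alludes to can be illustrated with a toy sketch (this is not the paper's exact algorithm; the filters `a` and `b` and the construction below are illustrative assumptions): if two random sequences are obtained by filtering the same white noise with two different filters, their cross-covariance depends only on the lag between positions, i.e. it is stationary, which is exactly the structure relative positional encoding needs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sketch: share one white-noise source Z between two
# filtered processes Qbar = Z * a and Kbar = Z * b. For unit-variance
# white noise, E[Qbar(m) Kbar(n)] = sum_t a[m-t] b[n-t], which depends
# only on the lag n - m.
L, F = 16, 8                      # sequence length, filter length
a = rng.standard_normal(F)        # illustrative "query" filter
b = rng.standard_normal(F)        # illustrative "key" filter

def cross_cov(m, n, a, b):
    """Analytic E[Qbar(m) Kbar(n)]: sum over noise times t where both
    filter taps a[m-t] and b[n-t] are in range."""
    total = 0.0
    for t in range(-F, L + F):
        i, j = m - t, n - t
        if 0 <= i < F and 0 <= j < F:
            total += a[i] * b[j]
    return total

# Stationarity check: two position pairs with the same lag (here 2)
# yield the same cross-covariance.
assert np.isclose(cross_cov(3, 5, a, b), cross_cov(9, 11, a, b))
```

The point of the sketch is only the stationarity of the cross-covariance; the paper's actual construction additionally matches this covariance to a desired RPE kernel.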
Author Information
Antoine Liutkus (Inria)
Ondřej Cífka (Télécom Paris, Institut Polytechnique de Paris)
Shih-Lun Wu (National Taiwan University)
Umut Simsekli (Inria/ENS)
Yi-Hsuan Yang (Academia Sinica)
Gaël RICHARD (Télécom Paris)
Related Events (a corresponding poster, oral, or spotlight)
- 2021 Oral: Relative Positional Encoding for Transformers with Linear Complexity
  Tue, Jul 20, 12:00-12:20 PM
More from the Same Authors
- 2023 Poster: Algorithmic Stability of Heavy-Tailed SGD with General Loss Functions
  Anant Raj · Lingjiong Zhu · Mert Gurbuzbalaban · Umut Simsekli
- 2023 Poster: Generalization Bounds using Data-Dependent Fractal Dimensions
  Benjamin Dupuis · George Deligiannidis · Umut Simsekli
- 2022 Poster: Generalization Bounds using Lower Tail Exponents in Stochastic Optimizers
  Liam Hodgkinson · Umut Simsekli · Rajiv Khanna · Michael Mahoney
- 2022 Spotlight: Generalization Bounds using Lower Tail Exponents in Stochastic Optimizers
  Liam Hodgkinson · Umut Simsekli · Rajiv Khanna · Michael Mahoney
- 2021 Poster: Asymmetric Heavy Tails and Implicit Bias in Gaussian Noise Injections
  Alexander D Camuto · Xiaoyu Wang · Lingjiong Zhu · Christopher Holmes · Mert Gurbuzbalaban · Umut Simsekli
- 2021 Spotlight: Asymmetric Heavy Tails and Implicit Bias in Gaussian Noise Injections
  Alexander D Camuto · Xiaoyu Wang · Lingjiong Zhu · Christopher Holmes · Mert Gurbuzbalaban · Umut Simsekli
- 2021 Poster: The Heavy-Tail Phenomenon in SGD
  Mert Gurbuzbalaban · Umut Simsekli · Lingjiong Zhu
- 2021 Spotlight: The Heavy-Tail Phenomenon in SGD
  Mert Gurbuzbalaban · Umut Simsekli · Lingjiong Zhu
- 2019 Poster: Non-Asymptotic Analysis of Fractional Langevin Monte Carlo for Non-Convex Optimization
  Thanh Huy Nguyen · Umut Simsekli · Gaël RICHARD
- 2019 Poster: Sliced-Wasserstein Flows: Nonparametric Generative Modeling via Optimal Transport and Diffusions
  Antoine Liutkus · Umut Simsekli · Szymon Majewski · Alain Durmus · Fabian-Robert Stöter
- 2019 Oral: Non-Asymptotic Analysis of Fractional Langevin Monte Carlo for Non-Convex Optimization
  Thanh Huy Nguyen · Umut Simsekli · Gaël RICHARD
- 2019 Oral: Sliced-Wasserstein Flows: Nonparametric Generative Modeling via Optimal Transport and Diffusions
  Antoine Liutkus · Umut Simsekli · Szymon Majewski · Alain Durmus · Fabian-Robert Stöter
- 2018 Poster: Asynchronous Stochastic Quasi-Newton MCMC for Non-Convex Optimization
  Umut Simsekli · Cagatay Yildiz · Thanh Huy Nguyen · Ali Taylan Cemgil · Gaël RICHARD
- 2018 Oral: Asynchronous Stochastic Quasi-Newton MCMC for Non-Convex Optimization
  Umut Simsekli · Cagatay Yildiz · Thanh Huy Nguyen · Ali Taylan Cemgil · Gaël RICHARD