Workshop
Next Generation of Sequence Modeling Architectures
Caglar Gulcehre · Razvan Pascanu · Antonio Orvieto · Carmen Amo Alonso · Maciej Wołczyk
Straus 3
Fri 26 Jul, midnight PDT
This workshop aims to bring together researchers to chart the course for the next generation of sequence models. The focus will be on better understanding the limitations of existing models, including transformer architectures, recurrent neural networks, and state space models (e.g., S4, Mamba), and on describing open problems. We will touch on topics such as memory, long-range context and in-context learning, the optimization stability of these architectures, and their ability to represent different classes of problems. We will also cover interpretability and the pragmatic aspects of making these models efficient and performant: how they should be scaled up, and the trade-offs and limitations imposed by current hardware. We will place additional emphasis on how sequence models should be evaluated and benchmarked at scale, for example in language or in other domains such as vision, audio, or biological signals.
Schedule
Fri 12:00 a.m. - 12:10 a.m. | Opening Remarks | Caglar Gulcehre · Razvan Pascanu · Antonio Orvieto · Carmen Amo Alonso · Maciej Wołczyk
Fri 12:10 a.m. - 12:40 a.m. | Sepp Hochreiter (Invited Talk)
Fri 12:40 a.m. - 1:00 a.m. | Poster Spotlight Talks
Fri 1:00 a.m. - 1:30 a.m. | Albert Gu (Invited Talk)
Fri 1:30 a.m. - 2:00 a.m. | Soham De (Invited Talk)
Fri 2:00 a.m. - 2:30 a.m. | Angela Fan (Invited Talk)
Fri 2:30 a.m. - 3:00 a.m. | Joao Sacramento (Invited Talk)
Fri 3:00 a.m. - 4:30 a.m. | Poster Session
Fri 4:30 a.m. - 5:00 a.m. | Lunch Break
Fri 5:00 a.m. - 5:30 a.m. | Stephanie Chan (Invited Talk)
Fri 5:30 a.m. - 6:00 a.m. | Hava Siegelmann (Invited Talk)
Fri 6:00 a.m. - 6:30 a.m. | Poster Spotlight Talks
Fri 6:30 a.m. - 7:00 a.m. | Coffee Break
Fri 7:00 a.m. - 8:00 a.m. | Panel Discussion
Hierarchical State Space Models for Continuous Sequence-to-Sequence Modeling (Poster) | Raunaq Bhirangi · Chenyu Wang · Venkatesh Pattabiraman · Carmel Majidi · Abhinav Gupta · Tess Hellebrekers · Lerrel Pinto
Fleet of Agents: Coordinated Problem Solving with Large Language Models using Genetic Particle Filtering (Poster) | Akhil Arora · Lars Klein · Nearchos Potamitis · Roland Aydin · Caglar Gulcehre · Robert West
Delay Embedding Theory of Neural Sequence Models (Poster) | Mitchell Ostrow · Adam Eisen · Ila R. Fiete
Pretrained Hybrids with MAD Skills (Poster) | Nicholas Roberts · Samuel Guo · Zhiqi Gao · Satya Sai Srinath Namburi GNVV · Sonia Cromp · Chengjun Wu · Chengyu Duan · Frederic Sala
When can transformers compositionally generalize in-context? (Poster) | Seijin Kobayashi · Simon Schug · Yassir Akram · Florian Redhardt · Johannes Von Oswald · Razvan Pascanu · Guillaume Lajoie · Joao Sacramento
Multi-Task Instruction Training of Text Diffusion Models (Poster) | Changyou Chen · Gargi Balasubramaniam · Rui Meng · Han Zhao · Bunyamin Sisman · Qingjun Cui
HiPPO-Prophecy: State-Space Models can Provably Learn Dynamical Systems in Context (Poster) | Federico Arangath Joseph · Noah Liniger · Kilian Haefeli
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts (Poster) | Maciej Pióro · Kamil Ciebiera · Krystian Król · Jan Ludziejewski · Michał Krutul · Jakub Krajewski · Szymon Antoniak · Piotr Milos · Marek Cygan · Sebastian Jaszczur
Needle in the Haystack for Memory Based Large Language Models (Poster) | Elliot Nelson · Soham Dan · Georgios Kollias · Payel Das · Subhajit Chaudhury
QSMixer: Connecting SSMs with Mixer Models via Quasi-Separable Matrices (Poster) | Ali Behrouz · Michele Santacatterina · Ramin Zabih
Parallelizing Autoregressive Generation with Variational State-Space Models (Poster) | Gaspard Lambrechts · Yann Claes · Pierre Geurts · Damien Ernst
ECG Signal Denoising Using Multi-scale Patch Embedding and Transformers (Poster) | Ding Zhu · Vishnu Chhabra · Mohammad Mahdi Khalili
The Role of State Matrix Initialization in SSMs: A Perspective on the Approximation-Estimation Tradeoff (Poster) | Fusheng Liu · Qianxiao Li
State soup: in-context skill learning, retrieval and mixing (Poster) | Maciej Pióro · Maciej Wołczyk · Razvan Pascanu · Johannes Von Oswald · Joao Sacramento
RotRNN: Modelling Long Sequences with Rotations (Poster) | Rares Dolga · Kai Biegun · Jake Cunningham · David Barber
Towards a theory of learning dynamics in deep state space models (Poster) | Jakub Smekal · Jimmy Smith · Michael Kleinman · Dan Biderman · Scott Linderman
Recurrent VAE with Gaussian Process Decoders for De novo Molecular Generation (Poster) | Vidhi Lalchand · David Lines · Neil Lawrence
DynaGraph: Dynamic Contrastive Graph for Interpretable Multi-label Prediction using Time-Series EHR Data (Poster) | Munib Mesinovic · Soheila Molaei · Peter Watkinson · Tingting Zhu
Repurposing Language Models into Embedding Models: Finding the Compute-Optimal Recipe (Poster) | Albert Jiang · Alicja Ziarko · Bartosz Piotrowski · Wenda Li · Mateja Jamnik · Piotr Milos
Enhancing Transformer RNNs with Multiple Temporal Perspectives (Poster) | Razvan Dumitru · Darius Peteleaza · Mihai Surdeanu
State Space Models for Brain Computer Interfaces? (Poster) | Pablo Soëtard · Miran Özdogan · Oiwi Parker Jones
Reparameterized Multi-Resolution Convolutions for Long Sequence Modelling (Poster) | Jake Cunningham · Giorgio Giannone · Mingtian Zhang · Marc Deisenroth
On the Power of Convolution-Augmented Transformer (Poster) | Mingchen Li · Xuechen Zhang · Yixiao HUANG · Samet Oymak
Vision-LSTM: xLSTM as Generic Vision Backbone (Poster) | Benedikt Alkin · Maximilian Beck · Korbinian Pöppel · Sepp Hochreiter · Johannes Brandstetter
EBBS: An Ensemble with Bi-Level Beam Search for Zero-Shot Machine Translation (Poster) | Yuqiao Wen · Behzad Shayegh · Chenyang Huang · Yanshuai Cao · Lili Mou
Towards Dynamic Feature Acquisition on Medical Time Series by Maximizing Conditional Mutual Information (Poster) | Fedor Sergeev · Paola Malsot · Gunnar Ratsch · Vincent Fortuin
Unlocking the Secrets of Linear Complexity Sequence Model from A Unified Perspective (Poster) | Zhen Qin · Xuyang Shen · Dong Li · Weigao Sun · Stan Birchfield · Richard I Hartley · Yiran Zhong
OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents (Poster) | Zihao Wang · Shaofei Cai · Zhancun Mu · Haowei Lin · Ceyao Zhang · Xuejie Liu · Qing Li · Anji Liu · Xiaojian Ma · Yitao Liang
Reservoir Structured State Space Models (Poster) | Giuseppe Lombardi · Claudio Gallicchio · Andrea Ceni
Recurrent Action Transformer with Memory (Poster) | Egor Cherepanov · Aleksei Staroverov · Dmitry Yudin · Alexey Kovalev · Aleksandr Panov
Rough Transformers: Lightweight Continuous-Time Sequence Modelling with Path Signatures (Poster) | Fernando Moreno-Pino · Alvaro Arroyo · Harrison Waldon · Xiaowen Dong · Alvaro Cartea
Viewing Attention as a Recurrent Neural Network (Poster) | Leo Feng · Frederick Tung · Hossein Hajimirsadeghi · Mohamed Osama Ahmed · Yoshua Bengio · Greg Mori
Latte: Latent Attention for Linear Time Transformers (Poster) | Rares Dolga · Marius Cobzarenco · Ahmed Shahin · David Barber
An All-MLP Sequence Modeling Architecture That Excels at Copying (Poster) | Chenwei Cui · Zehao Yan · Gedeon Muhawenayo · Hannah Kerner
Randomized Signatures for processing long-range Sequences on Graphs (Poster) | Lukas Gruber · Bernhard Schäfl · Johannes Brandstetter · Sepp Hochreiter
Selective Attention: Enhancing Transformer through Principled Context Control (Poster) | Xuechen Zhang · Xiangyu Chang · Mingchen Li · Amit Roy-Chowdhury · Jiasi Chen · Samet Oymak
FutureTST: When Transformers Meet Future Exogenous Drivers (Poster) | Kshitij Tayal · Arvind Renganathan · Vipin Kumar · Dan Lu
On Feature Learning in Structured State Space Models (Poster) | Leena Chennuru Vankadara · Jin Xu · Moritz Haas · Volkan Cevher
xLSTM: Extended Long Short-Term Memory (Poster) | Maximilian Beck · Korbinian Pöppel · Markus Spanring · Andreas Auer · Oleksandra Prudnikova · Michael Kopp · Günter Klambauer · Johannes Brandstetter · Sepp Hochreiter
Investigating Low-Rank Training in Transformer Language Models: Efficiency and Scaling Analysis (Poster) | Xiuying Wei · Skander Moalla · Razvan Pascanu · Caglar Gulcehre
MSAMamba: Adapting Subquadratic Models To Long-Context DNA MSA Analysis (Poster) | Vishrut Thoutam · Dina Ellsworth
BAM! Just Like That: Simple and Efficient Parameter Upcycling for Mixture of Experts (Poster) | Qizhen Zhang · Nikolas Gritsch · Dwaraknath Gnaneshwar · Simon Guo · David Cairuz · Bharat Venkitesh · Jakob Foerster · Phil Blunsom · Sebastian Ruder · Ahmet Üstün · Acyr Locatelli
Reservoir Memory Networks: Long-range temporal dependencies with untrained RNNs (Poster) | Claudio Gallicchio · Andrea Ceni
On the Bottleneck of State Space Models: Locality and Oversmoothing (Poster) | Pragya Srivastava · Peihao Wang · Ruisi Cai · Jiajun Zhu · Pan Li · Zhangyang “Atlas” Wang
LongSSM: On the Length Extension of State-space Models in Language Modelling (Poster) | Shida Wang
KalMamba: Towards Efficient Probabilistic State Space Models for RL under Uncertainty (Poster) | Philipp Becker · Niklas Freymuth · Gerhard Neumann
Associative Recurrent Memory Transformer (Poster) | Ivan Rodkin · Yuri Kuratov · Aidar Bulatov · Mikhail Burtsev
Orthogonal residual connections for long-term memory retention in recurrent neural networks (Poster) | Andrea Ceni · Claudio Gallicchio
Enhancing Sequence Modeling with Multi-Resolution State Space Models (Poster) | Mahdi Karami · Ali Behrouz
SeRpEnt: Selective Resampling for Expressive State Space Models (Poster) | Stefano Rando · Luca Romani · Matteo Migliarini · Denis Gudovskiy · Luca Franco · Luca Rigazio · Fabio Galasso
Chimera: Effectively Modeling Multivariate Time Series with 2-Dimensional State Space Models (Poster) | Ali Behrouz · Michele Santacatterina · Ramin Zabih
Q-S5: Towards Quantized State Space Models (Poster) | Steven Abreu · Jens Egholm Pedersen · Kade Heckel · Alessandro Pierro
Serpent: Scalable and Efficient Image Restoration via Multi-scale Structured State Space Models (Poster) | Mohammad Shahab Sepehri · Zalan Fabian · Mahdi Soltanolkotabi
Probing the Decision Boundaries of In-context Learning in Large Language Models (Poster) | Siyan Zhao · Tung Nguyen · Aditya Grover
Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations (Poster) | Alexander Hägele · Elie Bakouch · Atli Kosson · Loubna Ben allal · Leandro Von Werra · Martin Jaggi
Length independent generalization bounds for deep SSM architectures (Poster) | Dániel Rácz · Mihaly Petreczky · Balint Daroczy