Workshop
Next Generation of Sequence Modeling Architectures
Caglar Gulcehre · Razvan Pascanu · Antonio Orvieto · Carmen Amo Alonso · Maciej Wołczyk
Straus 3
Fri 26 Jul, midnight PDT
This workshop aims to bring together researchers to chart the course for the next generation of sequence models. The focus will be on better understanding the limitations of existing models, including transformer architectures, recurrent neural networks, and state space models (e.g., S4, Mamba), and on describing open problems. We will touch on topics such as memory, long-range context and in-context learning, the optimization stability of these architectures, and their ability to represent different classes of problems. We will also cover interpretability and the pragmatic aspects of making these models efficient and performant: how they should be scaled up, and the trade-offs and limitations imposed by current hardware. We will place additional emphasis on how sequence models should be evaluated and benchmarked at scale, for example in language or in other domains such as vision, audio, or biological signals.
Schedule
Fri 12:00 a.m. - 12:10 a.m. | Opening Remarks | Caglar Gulcehre · Razvan Pascanu · Antonio Orvieto · Carmen Amo Alonso · Maciej Wołczyk
Fri 12:10 a.m. - 12:40 a.m. | Sepp Hochreiter (Invited Talk)
Fri 12:40 a.m. - 1:00 a.m. | Poster Spotlight Talks
Fri 1:00 a.m. - 1:30 a.m. | Albert Gu (Invited Talk)
Fri 1:30 a.m. - 2:00 a.m. | Soham De (Invited Talk)
Fri 2:00 a.m. - 2:30 a.m. | Angela Fan (Invited Talk)
Fri 2:30 a.m. - 3:00 a.m. | Joao Sacramento (Invited Talk)
Fri 3:00 a.m. - 4:30 a.m. | Poster Session
Fri 4:30 a.m. - 5:00 a.m. | Lunch Break
Fri 5:00 a.m. - 5:30 a.m. | Stephanie Chan (Invited Talk)
Fri 5:30 a.m. - 6:00 a.m. | Hava Siegelmann (Invited Talk)
Fri 6:00 a.m. - 6:30 a.m. | Poster Spotlight Talks
Fri 6:30 a.m. - 7:00 a.m. | Coffee Break
Fri 7:00 a.m. - 8:00 a.m. | Panel Discussion
Hierarchical State Space Models for Continuous Sequence-to-Sequence Modeling (Poster) | Raunaq Bhirangi · Chenyu Wang · Venkatesh Pattabiraman · Carmel Majidi · Abhinav Gupta · Tess Hellebrekers · Lerrel Pinto
Fleet of Agents: Coordinated Problem Solving with Large Language Models using Genetic Particle Filtering (Poster) | Akhil Arora · Lars Klein · Nearchos Potamitis · Roland Aydin · Caglar Gulcehre · Robert West
Delay Embedding Theory of Neural Sequence Models (Poster) | Mitchell Ostrow · Adam Eisen · Ila R. Fiete
Pretrained Hybrids with MAD Skills (Poster) | Nicholas Roberts · Samuel Guo · Zhiqi Gao · Satya Sai Srinath Namburi GNVV · Sonia Cromp · Chengjun Wu · Chengyu Duan · Frederic Sala
When can transformers compositionally generalize in-context? (Poster) | Seijin Kobayashi · Simon Schug · Yassir Akram · Florian Redhardt · Johannes Von Oswald · Razvan Pascanu · Guillaume Lajoie · Joao Sacramento
Multi-Task Instruction Training of Text Diffusion Models (Poster) | Changyou Chen · Gargi Balasubramaniam · Rui Meng · Han Zhao · Bunyamin Sisman · Qingjun Cui
HiPPO-Prophecy: State-Space Models can Provably Learn Dynamical Systems in Context (Poster) | Federico Arangath Joseph · Noah Liniger · Kilian Haefeli
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts (Poster) | Maciej Pióro · Kamil Ciebiera · Krystian Król · Jan Ludziejewski · Michał Krutul · Jakub Krajewski · Szymon Antoniak · Piotr Milos · Marek Cygan · Sebastian Jaszczur
Needle in the Haystack for Memory Based Large Language Models (Poster) | Elliot Nelson · Soham Dan · Georgios Kollias · Payel Das · Subhajit Chaudhury
QSMixer: Connecting SSMs with Mixer Models via Quasi-Separable Matrices (Poster) | Ali Behrouz · Michele Santacatterina · Ramin Zabih
Parallelizing Autoregressive Generation with Variational State-Space Models (Poster) | Gaspard Lambrechts · Yann Claes · Pierre Geurts · Damien Ernst
ECG Signal Denoising Using Multi-scale Patch Embedding and Transformers (Poster) | Ding Zhu · Vishnu Chhabra · Mohammad Mahdi Khalili
The Role of State Matrix Initialization in SSMs: A Perspective on the Approximation-Estimation Tradeoff (Poster) | Fusheng Liu · Qianxiao Li
State soup: in-context skill learning, retrieval and mixing (Poster) | Maciej Pióro · Maciej Wołczyk · Razvan Pascanu · Johannes Von Oswald · Joao Sacramento
RotRNN: Modelling Long Sequences with Rotations (Poster) | Rares Dolga · Kai Biegun · Jake Cunningham · David Barber
Towards a theory of learning dynamics in deep state space models (Poster) | Jakub Smekal · Jimmy Smith · Michael Kleinman · Dan Biderman · Scott Linderman
Recurrent VAE with Gaussian Process Decoders for De novo Molecular Generation (Poster) | Vidhi Lalchand · David Lines · Neil Lawrence
DynaGraph: Dynamic Contrastive Graph for Interpretable Multi-label Prediction using Time-Series EHR Data (Poster) | Munib Mesinovic · Soheila Molaei · Peter Watkinson · Tingting Zhu
Repurposing Language Models into Embedding Models: Finding the Compute-Optimal Recipe (Poster) | Albert Jiang · Alicja Ziarko · Bartosz Piotrowski · Wenda Li · Mateja Jamnik · Piotr Milos
Enhancing Transformer RNNs with Multiple Temporal Perspectives (Poster) | Razvan Dumitru · Darius Peteleaza · Mihai Surdeanu
State Space Models for Brain Computer Interfaces? (Poster) | Pablo Soëtard · Miran Özdogan · Oiwi Parker Jones
Reparameterized Multi-Resolution Convolutions for Long Sequence Modelling (Poster) | Jake Cunningham · Giorgio Giannone · Mingtian Zhang · Marc Deisenroth
On the Power of Convolution-Augmented Transformer (Poster) | Mingchen Li · Xuechen Zhang · Yixiao HUANG · Samet Oymak
Vision-LSTM: xLSTM as Generic Vision Backbone (Poster) | Benedikt Alkin · Maximilian Beck · Korbinian Pöppel · Sepp Hochreiter · Johannes Brandstetter
EBBS: An Ensemble with Bi-Level Beam Search for Zero-Shot Machine Translation (Poster) | Yuqiao Wen · Behzad Shayegh · Chenyang Huang · Yanshuai Cao · Lili Mou
Towards Dynamic Feature Acquisition on Medical Time Series by Maximizing Conditional Mutual Information (Poster) | Fedor Sergeev · Paola Malsot · Gunnar Ratsch · Vincent Fortuin
Unlocking the Secrets of Linear Complexity Sequence Model from A Unified Perspective (Poster) | Zhen Qin · Xuyang Shen · Dong Li · Weigao Sun · Stan Birchfield · Richard I Hartley · Yiran Zhong
OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents (Poster) | Zihao Wang · Shaofei Cai · Zhancun Mu · Haowei Lin · Ceyao Zhang · Xuejie Liu · Qing Li · Anji Liu · Xiaojian Ma · Yitao Liang
Reservoir Structured State Space Models (Poster) | Giuseppe Lombardi · Claudio Gallicchio · Andrea Ceni
Recurrent Action Transformer with Memory (Poster) | Egor Cherepanov · Aleksei Staroverov · Dmitry Yudin · Alexey Kovalev · Aleksandr Panov
Rough Transformers: Lightweight Continuous-Time Sequence Modelling with Path Signatures (Poster) | Fernando Moreno-Pino · Alvaro Arroyo · Harrison Waldon · Xiaowen Dong · Alvaro Cartea
Viewing Attention as a Recurrent Neural Network (Poster) | Leo Feng · Frederick Tung · Hossein Hajimirsadeghi · Mohamed Osama Ahmed · Yoshua Bengio · Greg Mori
Latte: Latent Attention for Linear Time Transformers (Poster) | Rares Dolga · Marius Cobzarenco · Ahmed Shahin · David Barber
An All-MLP Sequence Modeling Architecture That Excels at Copying (Poster) | Chenwei Cui · Zehao Yan · Gedeon Muhawenayo · Hannah Kerner
Randomized Signatures for processing long-range Sequences on Graphs (Poster) | Lukas Gruber · Bernhard Schäfl · Johannes Brandstetter · Sepp Hochreiter
Selective Attention: Enhancing Transformer through Principled Context Control (Poster) | Xuechen Zhang · Xiangyu Chang · Mingchen Li · Amit Roy-Chowdhury · Jiasi Chen · Samet Oymak
FutureTST: When Transformers Meet Future Exogenous Drivers (Poster) | Kshitij Tayal · Arvind Renganathan · Vipin Kumar · Dan Lu
On Feature Learning in Structured State Space Models (Poster) | Leena Chennuru Vankadara · Jin Xu · Moritz Haas · Volkan Cevher
xLSTM: Extended Long Short-Term Memory (Poster) | Maximilian Beck · Korbinian Pöppel · Markus Spanring · Andreas Auer · Oleksandra Prudnikova · Michael Kopp · Günter Klambauer · Johannes Brandstetter · Sepp Hochreiter
Investigating Low-Rank Training in Transformer Language Models: Efficiency and Scaling Analysis (Poster) | Xiuying Wei · Skander Moalla · Razvan Pascanu · Caglar Gulcehre
MSAMamba: Adapting Subquadratic Models To Long-Context DNA MSA Analysis (Poster) | Vishrut Thoutam · Dina Ellsworth
BAM! Just Like That: Simple and Efficient Parameter Upcycling for Mixture of Experts (Poster) | Qizhen Zhang · Nikolas Gritsch · Dwaraknath Gnaneshwar · Simon Guo · David Cairuz · Bharat Venkitesh · Jakob Foerster · Phil Blunsom · Sebastian Ruder · Ahmet Üstün · Acyr Locatelli
Reservoir Memory Networks: Long-range temporal dependencies with untrained RNNs (Poster) | Claudio Gallicchio · Andrea Ceni
On the Bottleneck of State Space Models: Locality and Oversmoothing (Poster) | Pragya Srivastava · Peihao Wang · Ruisi Cai · Jiajun Zhu · Pan Li · Zhangyang “Atlas” Wang
LongSSM: On the Length Extension of State-space Models in Language Modelling (Poster) | Shida Wang
KalMamba: Towards Efficient Probabilistic State Space Models for RL under Uncertainty (Poster) | Philipp Becker · Niklas Freymuth · Gerhard Neumann
Associative Recurrent Memory Transformer (Poster) | Ivan Rodkin · Yuri Kuratov · Aidar Bulatov · Mikhail Burtsev
Orthogonal residual connections for long-term memory retention in recurrent neural networks (Poster) | Andrea Ceni · Claudio Gallicchio
Enhancing Sequence Modeling with Multi-Resolution State Space Models (Poster) | Mahdi Karami · Ali Behrouz
SeRpEnt: Selective Resampling for Expressive State Space Models (Poster) | Stefano Rando · Luca Romani · Matteo Migliarini · Denis Gudovskiy · Luca Franco · Luca Rigazio · Fabio Galasso
Chimera: Effectively Modeling Multivariate Time Series with 2-Dimensional State Space Models (Poster) | Ali Behrouz · Michele Santacatterina · Ramin Zabih
Q-S5: Towards Quantized State Space Models (Poster) | Steven Abreu · Jens Egholm Pedersen · Kade Heckel · Alessandro Pierro
Serpent: Scalable and Efficient Image Restoration via Multi-scale Structured State Space Models (Poster) | Mohammad Shahab Sepehri · Zalan Fabian · Mahdi Soltanolkotabi
Probing the Decision Boundaries of In-context Learning in Large Language Models (Poster) | Siyan Zhao · Tung Nguyen · Aditya Grover
Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations (Poster) | Alexander Hägele · Elie Bakouch · Atli Kosson · Loubna Ben allal · Leandro Von Werra · Martin Jaggi
Length independent generalization bounds for deep SSM architectures (Poster) | Dániel Rácz · Mihaly Petreczky · Balint Daroczy