Timezone: »
Recent advances in deep learning have relied heavily on the use of large Transformers due to their ability to learn at scale. However, the core building block of Transformers, the attention operator, exhibits quadratic cost in sequence length, limiting the amount of context accessible. Existing subquadratic methods based on low-rank and sparse approximations need to be combined with dense attention layers to match Transformers at scale, indicating a gap in capability. In this work, we propose Hyena, a subquadratic drop-in replacement for attention constructed by interleaving implicitly parametrized long convolutions and data-controlled gating. In challenging reasoning tasks on sequences of thousands to hundreds of thousands of tokens, Hyena improves accuracy by more than 50 points over operators relying on state-space models, transfer functions, and other implicit and explicit methods, matching attention-based models. We set a new state-of-the-art for dense-attention-free architectures on language modeling in standard datasets WikiText103 and The Pile, reaching Transformer quality with a 20% reduction in training compute required at sequence length 2k. Hyena operators are 2x faster than highly optimized attention at sequence length 8k, with speedups of 100x at 64k.
Author Information
Michael Poli (Stanford University)
Stefano Massaroli (Mila)
Eric Nguyen (Stanford)
PhD student at Stanford in bioengineering and machine learning.
Daniel Y Fu (Stanford University)
Tri Dao (Stanford)
Stephen Baccus (Stanford University)
Yoshua Bengio (Mila - Quebec AI Institute)
Stefano Ermon (Stanford University)
Christopher Re (Stanford University)
Related Events (a corresponding poster, oral, or spotlight)
-
2023 Poster: Hyena Hierarchy: Towards Larger Convolutional Language Models »
Thu. Jul 27th 12:00 -- 01:30 AM Room Exhibit Hall 1 #129
More from the Same Authors
-
2021 : Gradient Starvation: A Learning Proclivity in Neural Networks »
Mohammad Pezeshki · Sékou-Oumar Kaba · Yoshua Bengio · Aaron Courville · Doina Precup · Guillaume Lajoie -
2021 : Epoch-Wise Double Descent: A Theory of Multi-scale Feature Learning Dynamics »
Mohammad Pezeshki · Amartya Mitra · Yoshua Bengio · Guillaume Lajoie -
2021 : A Standardized Data Collection Toolkit for Model Benchmarking »
Avanika Narayan · Piero Molino · Karan Goel · Christopher Re -
2021 : Exploration-Driven Representation Learning in Reinforcement Learning »
Akram Erraqabi · Mingde Zhao · Marlos C. Machado · Yoshua Bengio · Sainbayar Sukhbaatar · Ludovic Denoyer · Alessandro Lazaric -
2021 : Variational Causal Networks: Approximate Bayesian Inference over Causal Structures »
Yashas Annadani · Jonas Rothfuss · Alexandre Lacoste · Nino Scherrer · Anirudh Goyal · Yoshua Bengio · Stefan Bauer -
2022 : BARACK: Partially Supervised Group Robustness With Guarantees »
Nimit Sohoni · Maziar Sanjabi · Nicolas Ballas · Aditya Grover · Shaoliang Nie · Hamed Firooz · Christopher Re -
2022 : Contrastive Adapters for Foundation Model Group Robustness »
Michael Zhang · Christopher Re -
2022 : The Importance of Background Information for Out of Distribution Generalization »
Jupinder Parmar · Khaled Saab · Brian Pogatchnik · Daniel Rubin · Christopher Ré -
2022 : On the Generalization and Adaption Performance of Causal Models »
Nino Scherrer · Anirudh Goyal · Stefan Bauer · Yoshua Bengio · Rosemary Nan Ke -
2022 : Efficient Continuous Spatio-Temporal Simulation with Graph Spline Networks »
Chuanbo HUA · Federico Berto · Michael Poli · Stefano Massaroli · Jinkyoo Park -
2022 : Efficient Continuous Spatio-Temporal Simulation with Graph Spline Networks »
Chuanbo HUA · Federico Berto · Michael Poli · Stefano Massaroli · Jinkyoo Park -
2022 : MAgNet: Mesh Agnostic Neural PDE Solver »
Oussama Boussif · Yoshua Bengio · Loubna Benabbou · Dan Assouline -
2022 : Transform Once: Efficient Operator Learning in Frequency Domain »
Michael Poli · Stefano Massaroli · Federico Berto · Jinkyoo Park · Tri Dao · Christopher Re · Stefano Ermon -
2023 : The Role of Linguistic Priors in Measuring Compositional Generalization of Vision-language Models »
Chenwei Wu · Li Li · Stefano Ermon · Patrick Haffner · Rong Ge · Zaiwei Zhang -
2023 : Improving and Generalizing Flow-Based Generative Models with Minibatch Optimal Transport »
Alexander Tong · Nikolay Malkin · Guillaume Huguet · Yanlei Zhang · Jarrid Rector-Brooks · Kilian Fatras · Guy Wolf · Yoshua Bengio -
2023 : Simulation-Free Schrödinger Bridges via Score and Flow Matching »
Alexander Tong · Nikolay Malkin · Kilian Fatras · Lazar Atanackovic · Yanlei Zhang · Guillaume Huguet · Guy Wolf · Yoshua Bengio -
2023 : Parallel Sampling of Diffusion Models »
Andy Shih · Suneel Belkhale · Stefano Ermon · Dorsa Sadigh · Nima Anari -
2023 : Skill-it! A Data-Driven Skills Framework for Understanding and Training Language Models »
Mayee Chen · Nicholas Roberts · Kush Bhatia · Jue Wang · Ce Zhang · Frederic Sala · Christopher Ré -
2023 : OC-NMN: Object-centric Compositional Neural Module Network for Generative Visual Analogical Reasoning »
Rim Assouel · Pau Rodriguez · Perouz Taslakian · David Vazquez · Yoshua Bengio -
2023 : Prospectors: Leveraging Short Contexts to Mine Salient Objects in High-dimensional Imagery »
Gautam Machiraju · Arjun Desai · James Zou · Christopher Re · Parag Mallick -
2023 : Benchmarking Bayesian Causal Discovery Methods for Downstream Treatment Effect Estimation »
Chris Emezue · Alexandre Drouin · Tristan Deleu · Stefan Bauer · Yoshua Bengio -
2023 : Joint Bayesian Inference of Graphical Structure and Parameters with a Single Generative Flow Network »
Tristan Deleu · Mizu Nishikawa-Toomey · Jithendaraa Subramanian · Nikolay Malkin · Laurent Charlin · Yoshua Bengio -
2023 : On the Equivalence of Consistency-Type Models: Consistency Models, Consistent Diffusion Models, and Fokker-Planck Regularization »
Chieh-Hsin Lai · Yuhta Takida · Toshimitsu Uesaka · Naoki Murata · Yuki Mitsufuji · Stefano Ermon -
2023 : BatchGFN: Generative Flow Networks for Batch Active Learning »
Shreshth Malik · Salem Lahlou · Andrew Jesson · Moksh Jain · Nikolay Malkin · Tristan Deleu · Yoshua Bengio · Yarin Gal -
2023 : Parallel Sampling of Diffusion Models »
Andy Shih · Suneel Belkhale · Stefano Ermon · Dorsa Sadigh · Nima Anari -
2023 : Thompson Sampling for Improved Exploration in GFlowNets »
Jarrid Rector-Brooks · Kanika Madan · Moksh Jain · Maksym Korablyov · Chenghao Liu · Sarath Chandar · Nikolay Malkin · Yoshua Bengio -
2023 : GFlowNets for Causal Discovery: an Overview »
Dragos Cristian Manta · Edward Hu · Yoshua Bengio -
2023 : Accelerating LLM Inference with Staged Speculative Decoding »
Benjamin F Spector · Christopher Re -
2023 : H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models »
Zhenyu Zhang · Ying Sheng · Tianyi Zhou · Tianlong Chen · Lianmin Zheng · Ruisi Cai · Zhao Song · Yuandong Tian · Christopher Re · Clark Barrett · Zhangyang “Atlas” Wang · Beidi Chen -
2023 : Constant Memory Attention Block »
Leo Feng · Frederick Tung · Hossein Hajimirsadeghi · Yoshua Bengio · Mohamed Osama Ahmed -
2023 : What if We Enrich day-ahead Solar Irradiance Time Series Forecasting with Spatio-Temporal Context? »
Oussama Boussif · Ghait Boukachab · Dan Assouline · Stefano Massaroli · Tianle Yuan · Loubna Benabbou · Yoshua Bengio -
2023 : Direct Preference Optimization: Your Language Model is Secretly a Reward Model »
Rafael Rafailov · Archit Sharma · Eric Mitchell · Stefano Ermon · Christopher Manning · Chelsea Finn -
2023 : GFlowNets for Causal Discovery: an Overview »
Dragos Cristian Manta · Edward Hu · Yoshua Bengio -
2023 Workshop: ES-FoMo: Efficient Systems for Foundation Models »
Julien Launay · Daniel Y Fu · Tri Dao · Daniel Hesslow · Beidi Chen · Azalia Mirhoseini · Percy Liang -
2023 : Invited Talk by Stefano Ermon »
Stefano Ermon -
2023 Workshop: Structured Probabilistic Inference and Generative Modeling »
Dinghuai Zhang · Yuanqi Du · Chenlin Meng · Shawn Tan · Yingzhen Li · Max Welling · Yoshua Bengio -
2023 Workshop: Differentiable Almost Everything: Differentiable Relaxations, Algorithms, Operators, and Simulators »
Felix Petersen · Marco Cuturi · Mathias Niepert · Hilde Kuehne · Michael Kagan · Willie Neiswanger · Stefano Ermon -
2023 : Opening Remark »
Dinghuai Zhang · Yuanqi Du · Chenlin Meng · Shawn Tan · Yingzhen Li · Max Welling · Yoshua Bengio -
2023 Oral: Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time »
Zichang Liu · Jue Wang · Tri Dao · Tianyi Zhou · Binhang Yuan · Zhao Song · Anshumali Shrivastava · Ce Zhang · Yuandong Tian · Christopher Re · Beidi Chen -
2023 Poster: Equivariance with Learned Canonicalization Functions »
Sékou-Oumar Kaba · Arnab Kumar Mondal · Yan Zhang · Yoshua Bengio · Siamak Ravanbakhsh -
2023 Poster: GFlowOut: Dropout with Generative Flow Networks »
Dianbo Liu · Moksh Jain · Bonaventure F. P. Dossou · Qianli Shen · Salem Lahlou · Anirudh Goyal · Nikolay Malkin · Chris Emezue · Dinghuai Zhang · Nadhir Hassen · Xu Ji · Kenji Kawaguchi · Yoshua Bengio -
2023 Poster: Geometric Latent Diffusion Models for 3D Molecule Generation »
Minkai Xu · Alexander Powers · Ron Dror · Stefano Ermon · Jure Leskovec -
2023 Poster: Simple Hardware-Efficient Long Convolutions for Sequence Modeling »
Daniel Y Fu · Elliot L Epstein · Eric Nguyen · Armin Thomas · Michael Zhang · Tri Dao · Atri Rudra · Christopher Re -
2023 Poster: Reflected Diffusion Models »
Aaron Lou · Stefano Ermon -
2023 Poster: Discrete Key-Value Bottleneck »
Frederik Träuble · Anirudh Goyal · Nasim Rahaman · Michael Mozer · Kenji Kawaguchi · Yoshua Bengio · Bernhard Schölkopf -
2023 Poster: FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU »
Ying Sheng · Lianmin Zheng · Binhang Yuan · Zhuohan Li · Max Ryabinin · Beidi Chen · Percy Liang · Christopher Re · Ion Stoica · Ce Zhang -
2023 Poster: Long Horizon Temperature Scaling »
Andy Shih · Dorsa Sadigh · Stefano Ermon -
2023 Oral: FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU »
Ying Sheng · Lianmin Zheng · Binhang Yuan · Zhuohan Li · Max Ryabinin · Beidi Chen · Percy Liang · Christopher Re · Ion Stoica · Ce Zhang -
2023 Poster: GibbsDDRM: A Partially Collapsed Gibbs Sampler for Solving Blind Inverse Problems with Denoising Diffusion Restoration »
Naoki Murata · Koichi Saito · Chieh-Hsin Lai · Yuhta Takida · Toshimitsu Uesaka · Yuki Mitsufuji · Stefano Ermon -
2023 Poster: Synergies between Disentanglement and Sparsity: Generalization and Identifiability in Multi-Task Learning »
Sébastien Lachapelle · Tristan Deleu · Divyat Mahajan · Ioannis Mitliagkas · Yoshua Bengio · Simon Lacoste-Julien · Quentin Bertrand -
2023 Poster: Better Training of GFlowNets with Local Credit and Incomplete Trajectories »
Ling Pan · Nikolay Malkin · Dinghuai Zhang · Yoshua Bengio -
2023 Poster: Learning GFlowNets From Partial Episodes For Improved Convergence And Stability »
Kanika Madan · Jarrid Rector-Brooks · Maksym Korablyov · Emmanuel Bengio · Moksh Jain · Andrei-Cristian Nica · Tom Bosc · Yoshua Bengio · Nikolay Malkin -
2023 Poster: CocktailSGD: Fine-tuning Foundation Models over 500Mbps Networks »
Jue Wang · Yucheng Lu · Binhang Yuan · Beidi Chen · Percy Liang · Chris De Sa · Christopher Re · Ce Zhang -
2023 Poster: FP-Diffusion: Improving Score-based Diffusion Models by Enforcing the Underlying Score Fokker-Planck Equation »
Chieh-Hsin Lai · Yuhta Takida · Naoki Murata · Toshimitsu Uesaka · Yuki Mitsufuji · Stefano Ermon -
2023 Poster: Deep Latent State Space Models for Time-Series Generation »
Linqi Zhou · Michael Poli · Winnie Xu · Stefano Massaroli · Stefano Ermon -
2023 Oral: Interventional Causal Representation Learning »
Kartik Ahuja · Divyat Mahajan · Yixin Wang · Yoshua Bengio -
2023 Oral: GibbsDDRM: A Partially Collapsed Gibbs Sampler for Solving Blind Inverse Problems with Denoising Diffusion Restoration »
Naoki Murata · Koichi Saito · Chieh-Hsin Lai · Yuhta Takida · Toshimitsu Uesaka · Yuki Mitsufuji · Stefano Ermon -
2023 Oral: Learning GFlowNets From Partial Episodes For Improved Convergence And Stability »
Kanika Madan · Jarrid Rector-Brooks · Maksym Korablyov · Emmanuel Bengio · Moksh Jain · Andrei-Cristian Nica · Tom Bosc · Yoshua Bengio · Nikolay Malkin -
2023 Poster: FAENet: Frame Averaging Equivariant GNN for Materials Modeling »
ALEXANDRE DUVAL · Victor Schmidt · Alex Hernandez-Garcia · Santiago Miret · Fragkiskos Malliaros · Yoshua Bengio · David Rolnick -
2023 Poster: Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time »
Zichang Liu · Jue Wang · Tri Dao · Tianyi Zhou · Binhang Yuan · Zhao Song · Anshumali Shrivastava · Ce Zhang · Yuandong Tian · Christopher Re · Beidi Chen -
2023 Poster: Multi-Objective GFlowNets »
Moksh Jain · Sharath Chandra Raparthy · Alex Hernandez-Garcia · Jarrid Rector-Brooks · Yoshua Bengio · Santiago Miret · Emmanuel Bengio -
2023 Poster: CSP: Self-Supervised Contrastive Spatial Pre-Training for Geospatial-Visual Representations »
Gengchen Mai · Ni Lao · Yutong He · Jiaming Song · Stefano Ermon -
2023 Poster: Interventional Causal Representation Learning »
Kartik Ahuja · Divyat Mahajan · Yixin Wang · Yoshua Bengio -
2023 Poster: A theory of continuous generative flow networks »
Salem Lahlou · Tristan Deleu · Pablo Lemos · Dinghuai Zhang · Alexandra Volokhova · Alex Hernandez-Garcia · Lena Nehale Ezzine · Yoshua Bengio · Nikolay Malkin -
2023 Poster: GFlowNet-EM for Learning Compositional Latent Variable Models »
Edward Hu · Nikolay Malkin · Moksh Jain · Katie Everett · Alexandros Graikos · Yoshua Bengio -
2022 : FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness »
Tri Dao · Daniel Y Fu · Stefano Ermon · Atri Rudra · Christopher Re -
2022 : Generative Modeling with Stochastic Differential Equations »
Stefano Ermon -
2022 Workshop: Hardware-aware efficient training (HAET) »
Gonçalo Mordido · Yoshua Bengio · Ghouthi BOUKLI HACENE · Vincent Gripon · François Leduc-Primeau · Vahid Partovi Nia · Julie Grollier -
2022 : Neural Geometric Embedding Flows »
Aaron Lou · Yang Song · Jiaming Song · Stefano Ermon -
2022 : Is a Modular Architecture Enough? »
Sarthak Mittal · Yoshua Bengio · Guillaume Lajoie -
2022 Workshop: Adaptive Experimental Design and Active Learning in the Real World »
Mojmir Mutny · Willie Neiswanger · Ilija Bogunovic · Stefano Ermon · Yisong Yue · Andreas Krause -
2022 Poster: Imitation Learning by Estimating Expertise of Demonstrators »
Mark Beliaev · Andy Shih · Stefano Ermon · Dorsa Sadigh · Ramtin Pedarsani -
2022 Poster: Building Robust Ensembles via Margin Boosting »
Dinghuai Zhang · Hongyang Zhang · Aaron Courville · Yoshua Bengio · Pradeep Ravikumar · Arun Sai Suggala -
2022 Poster: It’s Raw! Audio Generation with State-Space Models »
Karan Goel · Albert Gu · Chris Donahue · Christopher Re -
2022 Poster: Multi-scale Feature Learning Dynamics: Insights for Double Descent »
Mohammad Pezeshki · Amartya Mitra · Yoshua Bengio · Guillaume Lajoie -
2022 Spotlight: Imitation Learning by Estimating Expertise of Demonstrators »
Mark Beliaev · Andy Shih · Stefano Ermon · Dorsa Sadigh · Ramtin Pedarsani -
2022 Spotlight: Building Robust Ensembles via Margin Boosting »
Dinghuai Zhang · Hongyang Zhang · Aaron Courville · Yoshua Bengio · Pradeep Ravikumar · Arun Sai Suggala -
2022 Spotlight: Multi-scale Feature Learning Dynamics: Insights for Double Descent »
Mohammad Pezeshki · Amartya Mitra · Yoshua Bengio · Guillaume Lajoie -
2022 Oral: It’s Raw! Audio Generation with State-Space Models »
Karan Goel · Albert Gu · Chris Donahue · Christopher Re -
2022 Poster: A General Recipe for Likelihood-free Bayesian Optimization »
Jiaming Song · Lantao Yu · Willie Neiswanger · Stefano Ermon -
2022 Poster: Biological Sequence Design with GFlowNets »
Moksh Jain · Emmanuel Bengio · Alex Hernandez-Garcia · Jarrid Rector-Brooks · Bonaventure Dossou · Chanakya Ekbote · Jie Fu · Tianyu Zhang · Michael Kilgour · Dinghuai Zhang · Lena Simine · Payel Das · Yoshua Bengio -
2022 Poster: Perfectly Balanced: Improving Transfer and Robustness of Supervised Contrastive Learning »
Mayee Chen · Daniel Y Fu · Avanika Narayan · Michael Zhang · Zhao Song · Kayvon Fatahalian · Christopher Re -
2022 Oral: A General Recipe for Likelihood-free Bayesian Optimization »
Jiaming Song · Lantao Yu · Willie Neiswanger · Stefano Ermon -
2022 Spotlight: Biological Sequence Design with GFlowNets »
Moksh Jain · Emmanuel Bengio · Alex Hernandez-Garcia · Jarrid Rector-Brooks · Bonaventure Dossou · Chanakya Ekbote · Jie Fu · Tianyu Zhang · Michael Kilgour · Dinghuai Zhang · Lena Simine · Payel Das · Yoshua Bengio -
2022 Spotlight: Perfectly Balanced: Improving Transfer and Robustness of Supervised Contrastive Learning »
Mayee Chen · Daniel Y Fu · Avanika Narayan · Michael Zhang · Zhao Song · Kayvon Fatahalian · Christopher Re -
2022 Poster: ButterflyFlow: Building Invertible Layers with Butterfly Matrices »
Chenlin Meng · Linqi Zhou · Kristy Choi · Tri Dao · Stefano Ermon -
2022 Poster: Bit Prioritization in Variational Autoencoders via Progressive Coding »
Rui Shu · Stefano Ermon -
2022 Poster: Generative Flow Networks for Discrete Probabilistic Modeling »
Dinghuai Zhang · Nikolay Malkin · Zhen Liu · Alexandra Volokhova · Aaron Courville · Yoshua Bengio -
2022 Poster: Monarch: Expressive Structured Matrices for Efficient and Accurate Training »
Tri Dao · Beidi Chen · Nimit Sohoni · Arjun Desai · Michael Poli · Jessica Grogan · Alexander Liu · Aniruddh Rao · Atri Rudra · Christopher Re -
2022 Poster: Modular Conformal Calibration »
Charles Marx · Shengjia Zhao · Willie Neiswanger · Stefano Ermon -
2022 Poster: Correct-N-Contrast: a Contrastive Approach for Improving Robustness to Spurious Correlations »
Michael Zhang · Nimit Sohoni · Hongyang Zhang · Chelsea Finn · Christopher Re -
2022 Poster: Towards Scaling Difference Target Propagation by Learning Backprop Targets »
Maxence ERNOULT · Fabrice Normandin · Abhinav Moudgil · Sean Spinney · Eugene Belilovsky · Irina Rish · Blake Richards · Yoshua Bengio -
2022 Spotlight: Towards Scaling Difference Target Propagation by Learning Backprop Targets »
Maxence ERNOULT · Fabrice Normandin · Abhinav Moudgil · Sean Spinney · Eugene Belilovsky · Irina Rish · Blake Richards · Yoshua Bengio -
2022 Spotlight: Generative Flow Networks for Discrete Probabilistic Modeling »
Dinghuai Zhang · Nikolay Malkin · Zhen Liu · Alexandra Volokhova · Aaron Courville · Yoshua Bengio -
2022 Spotlight: Bit Prioritization in Variational Autoencoders via Progressive Coding »
Rui Shu · Stefano Ermon -
2022 Spotlight: Modular Conformal Calibration »
Charles Marx · Shengjia Zhao · Willie Neiswanger · Stefano Ermon -
2022 Oral: Correct-N-Contrast: a Contrastive Approach for Improving Robustness to Spurious Correlations »
Michael Zhang · Nimit Sohoni · Hongyang Zhang · Chelsea Finn · Christopher Re -
2022 Oral: Monarch: Expressive Structured Matrices for Efficient and Accurate Training »
Tri Dao · Beidi Chen · Nimit Sohoni · Arjun Desai · Michael Poli · Jessica Grogan · Alexander Liu · Aniruddh Rao · Atri Rudra · Christopher Re -
2022 Spotlight: ButterflyFlow: Building Invertible Layers with Butterfly Matrices »
Chenlin Meng · Linqi Zhou · Kristy Choi · Tri Dao · Stefano Ermon -
2021 : Invited Talk 5 (Stefano Ermon): Maximum Likelihood Training of Score-Based Diffusion Models »
Stefano Ermon -
2021 Workshop: Tackling Climate Change with Machine Learning »
Hari Prasanna Das · Katarzyna Tokarska · Maria João Sousa · Meareg Hailemariam · David Rolnick · Xiaoxiang Zhu · Yoshua Bengio -
2021 Poster: Temporal Predictive Coding For Model-Based Planning In Latent Space »
Tung Nguyen · Rui Shu · Tuan Pham · Hung Bui · Stefano Ermon -
2021 Poster: HoroPCA: Hyperbolic Dimensionality Reduction via Horospherical Projections »
Ines Chami · Albert Gu · Dat P Nguyen · Christopher Re -
2021 Spotlight: HoroPCA: Hyperbolic Dimensionality Reduction via Horospherical Projections »
Ines Chami · Albert Gu · Dat P Nguyen · Christopher Re -
2021 Spotlight: Temporal Predictive Coding For Model-Based Planning In Latent Space »
Tung Nguyen · Rui Shu · Tuan Pham · Hung Bui · Stefano Ermon -
2021 Poster: Bayesian Algorithm Execution: Estimating Computable Properties of Black-box Functions Using Mutual Information »
Willie Neiswanger · Ke Alexander Wang · Stefano Ermon -
2021 Spotlight: Bayesian Algorithm Execution: Estimating Computable Properties of Black-box Functions Using Mutual Information »
Willie Neiswanger · Ke Alexander Wang · Stefano Ermon -
2021 Poster: An End-to-End Framework for Molecular Conformation Generation via Bilevel Programming »
Minkai Xu · Wujie Wang · Shitong Luo · Chence Shi · Yoshua Bengio · Rafael Gomez-Bombarelli · Jian Tang -
2021 Poster: Mandoline: Model Evaluation under Distribution Shift »
Mayee Chen · Karan Goel · Nimit Sohoni · Fait Poms · Kayvon Fatahalian · Christopher Re -
2021 Poster: Accelerating Feedforward Computation via Parallel Nonlinear Equation Solving »
Yang Song · Chenlin Meng · Renjie Liao · Stefano Ermon -
2021 Spotlight: Mandoline: Model Evaluation under Distribution Shift »
Mayee Chen · Karan Goel · Nimit Sohoni · Fait Poms · Kayvon Fatahalian · Christopher Re -
2021 Spotlight: Accelerating Feedforward Computation via Parallel Nonlinear Equation Solving »
Yang Song · Chenlin Meng · Renjie Liao · Stefano Ermon -
2021 Spotlight: An End-to-End Framework for Molecular Conformation Generation via Bilevel Programming »
Minkai Xu · Wujie Wang · Shitong Luo · Chence Shi · Yoshua Bengio · Rafael Gomez-Bombarelli · Jian Tang -
2021 Poster: Reward Identification in Inverse Reinforcement Learning »
Kuno Kim · Shivam Garg · Kirankumar Shiragur · Stefano Ermon -
2021 Spotlight: Reward Identification in Inverse Reinforcement Learning »
Kuno Kim · Shivam Garg · Kirankumar Shiragur · Stefano Ermon -
2021 Poster: Catformer: Designing Stable Transformers via Sensitivity Analysis »
Jared Quincy Davis · Albert Gu · Krzysztof Choromanski · Tri Dao · Christopher Re · Chelsea Finn · Percy Liang -
2021 Spotlight: Catformer: Designing Stable Transformers via Sensitivity Analysis »
Jared Quincy Davis · Albert Gu · Krzysztof Choromanski · Tri Dao · Christopher Re · Chelsea Finn · Percy Liang -
2020 : QA for invited talk 4 Bengio »
Yoshua Bengio -
2020 : Invited talk 4 Bengio »
Yoshua Bengio -
2020 : Keynote: Yoshua Bengio (Q&A) »
Yoshua Bengio -
2020 : Keynote: Yoshua Bengio »
Yoshua Bengio -
2020 Workshop: Object-Oriented Learning: Perception, Representation, and Reasoning »
Sungjin Ahn · Adam Kosiorek · Jessica Hamrick · Sjoerd van Steenkiste · Yoshua Bengio -
2020 Workshop: MLRetrospectives: A Venue for Self-Reflection in ML Research »
Jessica Forde · Jesse Dodge · Mayoore Jaiswal · Rosanne Liu · Ryan Lowe · Rosanne Liu · Joelle Pineau · Yoshua Bengio -
2020 Poster: Learning to Combine Top-Down and Bottom-Up Signals in Recurrent Neural Networks with Attention over Modules »
Sarthak Mittal · Alex Lamb · Anirudh Goyal · Vikram Voleti · Murray Shanahan · Guillaume Lajoie · Michael Mozer · Yoshua Bengio -
2020 Poster: Predictive Coding for Locally-Linear Control »
Rui Shu · Tung Nguyen · Yinlam Chow · Tuan Pham · Khoat Than · Mohammad Ghavamzadeh · Stefano Ermon · Hung Bui -
2020 Poster: Learning to Navigate The Synthetically Accessible Chemical Space Using Reinforcement Learning »
Sai Krishna Gottipati · Boris Sattarov · Sufeng Niu · Yashaswi Pathak · Haoran Wei · Shengchao Liu · Shengchao Liu · Simon Blackburn · Karam Thomas · Connor Coley · Jian Tang · Sarath Chandar · Yoshua Bengio -
2020 Poster: Perceptual Generative Autoencoders »
Zijun Zhang · Ruixiang ZHANG · Zongpeng Li · Yoshua Bengio · Liam Paull -
2020 Poster: Bridging the Gap Between f-GANs and Wasserstein GANs »
Jiaming Song · Stefano Ermon -
2020 Poster: Revisiting Fundamentals of Experience Replay »
William Fedus · Prajit Ramachandran · Rishabh Agarwal · Yoshua Bengio · Hugo Larochelle · Mark Rowland · Will Dabney -
2020 Poster: Small-GAN: Speeding up GAN Training using Core-Sets »
Samrath Sinha · Han Zhang · Anirudh Goyal · Yoshua Bengio · Hugo Larochelle · Augustus Odena -
2020 Poster: Individual Calibration with Randomized Forecasting »
Shengjia Zhao · Tengyu Ma · Stefano Ermon -
2020 Poster: Fast and Three-rious: Speeding Up Weak Supervision with Triplet Methods »
Daniel Y Fu · Mayee Chen · Frederic Sala · Sarah Hooper · Kayvon Fatahalian · Christopher Re -
2020 Poster: Domain Adaptive Imitation Learning »
Kuno Kim · Yihong Gu · Jiaming Song · Shengjia Zhao · Stefano Ermon -
2020 Poster: Training Deep Energy-Based Models with f-Divergence Minimization »
Lantao Yu · Yang Song · Jiaming Song · Stefano Ermon -
2020 Poster: On the Generalization Effects of Linear Transformations in Data Augmentation »
Sen Wu · Hongyang Zhang · Gregory Valiant · Christopher Re -
2020 Poster: Fair Generative Modeling via Weak Supervision »
Kristy Choi · Aditya Grover · Trisha Singh · Rui Shu · Stefano Ermon -
2019 : AI Commons »
Yoshua Bengio -
2019 : Opening remarks »
Yoshua Bengio -
2019 Workshop: AI For Social Good (AISG) »
Margaux Luck · Kris Sankaran · Tristan Sylvain · Sean McGregor · Jonnie Penn · Girmaw Abebe Tadesse · Virgile Sylvain · Myriam Côté · Lester Mackey · Rayid Ghani · Yoshua Bengio -
2019 : Panel Discussion »
Yoshua Bengio · Andrew Ng · Raia Hadsell · John Platt · Claire Monteleoni · Jennifer Chayes -
2019 : Poster discussion »
Roman Novak · Maxime Gabella · Frederic Dreyer · Siavash Golkar · Anh Tong · Irina Higgins · Mirco Milletari · Joe Antognini · Sebastian Goldt · Adín Ramírez Rivera · Roberto Bondesan · Ryo Karakida · Remi Tachet des Combes · Michael Mahoney · Nicholas Walker · Stanislav Fort · Samuel Smith · Rohan Ghosh · Aristide Baratin · Diego Granziol · Stephen Roberts · Dmitry Vetrov · Andrew Wilson · César Laurent · Valentin Thomas · Simon Lacoste-Julien · Dar Gilboa · Daniel Soudry · Anupam Gupta · Anirudh Goyal · Yoshua Bengio · Erich Elsen · Soham De · Stanislaw Jastrzebski · Charles H Martin · Samira Shabanian · Aaron Courville · Shorato Akaho · Lenka Zdeborova · Ethan Dyer · Maurice Weiler · Pim de Haan · Taco Cohen · Max Welling · Ping Luo · zhanglin peng · Nasim Rahaman · Loic Matthey · Danilo J. Rezende · Jaesik Choi · Kyle Cranmer · Lechao Xiao · Jaehoon Lee · Yasaman Bahri · Jeffrey Pennington · Greg Yang · Jiri Hron · Jascha Sohl-Dickstein · Guy Gur-Ari -
2019 : Personalized Visualization of the Impact of Climate Change »
Yoshua Bengio -
2019 : Networking Lunch (provided) + Poster Session »
Abraham Stanway · Alex Robson · Aneesh Rangnekar · Ashesh Chattopadhyay · Ashley Pilipiszyn · Benjamin LeRoy · Bolong Cheng · Ce Zhang · Chaopeng Shen · Christian Schroeder · Christian Clough · Clement DUHART · Clement Fung · Cozmin Ududec · Dali Wang · David Dao · di wu · Dimitrios Giannakis · Dino Sejdinovic · Doina Precup · Duncan Watson-Parris · Gege Wen · George Chen · Gopal Erinjippurath · Haifeng Li · Han Zou · Herke van Hoof · Hillary A Scannell · Hiroshi Mamitsuka · Hongbao Zhang · Jaegul Choo · James Wang · James Requeima · Jessica Hwang · Jinfan Xu · Johan Mathe · Jonathan Binas · Joonseok Lee · Kalai Ramea · Kate Duffy · Kevin McCloskey · Kris Sankaran · Lester Mackey · Letif Mones · Loubna Benabbou · Lynn Kaack · Matthew Hoffman · Mayur Mudigonda · Mehrdad Mahdavi · Michael McCourt · Mingchao Jiang · Mohammad Mahdi Kamani · Neel Guha · Niccolo Dalmasso · Nick Pawlowski · Nikola Milojevic-Dupont · Paulo Orenstein · Pedram Hassanzadeh · Pekka Marttinen · Ramesh Nair · Sadegh Farhang · Samuel Kaski · Sandeep Manjanna · Sasha Luccioni · Shuby Deshpande · Soo Kim · Soukayna Mouatadid · Sunghyun Park · Tao Lin · Telmo Felgueira · Thomas Hornigold · Tianle Yuan · Tom Beucler · Tracy Cui · Volodymyr Kuleshov · Wei Yu · yang song · Ydo Wexler · Yoshua Bengio · Zhecheng Wang · Zhuangfang Yi · Zouheir Malki -
2019 Workshop: Climate Change: How Can AI Help? »
David Rolnick · Alexandre Lacoste · Tegan Maharaj · Jennifer Chayes · Yoshua Bengio -
2019 Poster: Learning Fast Algorithms for Linear Transforms Using Butterfly Factorizations »
Tri Dao · Albert Gu · Matthew Eichhorn · Atri Rudra · Christopher Re -
2019 Poster: Learning Dependency Structures for Weak Supervision Models »
Paroma Varma · Frederic Sala · Ann He · Alexander J Ratner · Christopher Re -
2019 Poster: State-Reification Networks: Improving Generalization by Modeling the Distribution of Hidden Representations »
Alex Lamb · Jonathan Binas · Anirudh Goyal · Sandeep Subramanian · Ioannis Mitliagkas · Yoshua Bengio · Michael Mozer -
2019 Poster: Calibrated Model-Based Deep Reinforcement Learning »
Ali Malik · Volodymyr Kuleshov · Jiaming Song · Danny Nemer · Harlan Seymour · Stefano Ermon -
2019 Poster: Graphite: Iterative Generative Modeling of Graphs »
Aditya Grover · Aaron Zweig · Stefano Ermon -
2019 Poster: Adaptive Antithetic Sampling for Variance Reduction »
Hongyu Ren · Shengjia Zhao · Stefano Ermon -
2019 Poster: On the Spectral Bias of Neural Networks »
Nasim Rahaman · Aristide Baratin · Devansh Arpit · Felix Draxler · Min Lin · Fred Hamprecht · Yoshua Bengio · Aaron Courville -
2019 Oral: Learning Dependency Structures for Weak Supervision Models »
Paroma Varma · Frederic Sala · Ann He · Alexander J Ratner · Christopher Re -
2019 Oral: Learning Fast Algorithms for Linear Transforms Using Butterfly Factorizations »
Tri Dao · Albert Gu · Matthew Eichhorn · Atri Rudra · Christopher Re -
2019 Oral: On the Spectral Bias of Neural Networks »
Nasim Rahaman · Aristide Baratin · Devansh Arpit · Felix Draxler · Min Lin · Fred Hamprecht · Yoshua Bengio · Aaron Courville -
2019 Oral: Adaptive Antithetic Sampling for Variance Reduction »
Hongyu Ren · Shengjia Zhao · Stefano Ermon -
2019 Oral: Graphite: Iterative Generative Modeling of Graphs »
Aditya Grover · Aaron Zweig · Stefano Ermon -
2019 Oral: Calibrated Model-Based Deep Reinforcement Learning »
Ali Malik · Volodymyr Kuleshov · Jiaming Song · Danny Nemer · Harlan Seymour · Stefano Ermon -
2019 Oral: State-Reification Networks: Improving Generalization by Modeling the Distribution of Hidden Representations »
Alex Lamb · Jonathan Binas · Anirudh Goyal · Sandeep Subramanian · Ioannis Mitliagkas · Yoshua Bengio · Michael Mozer -
2019 Poster: A Kernel Theory of Modern Data Augmentation »
Tri Dao · Albert Gu · Alexander J Ratner · Virginia Smith · Christopher De Sa · Christopher Re -
2019 Poster: Multi-Agent Adversarial Inverse Reinforcement Learning »
Lantao Yu · Jiaming Song · Stefano Ermon -
2019 Poster: Manifold Mixup: Better Representations by Interpolating Hidden States »
Vikas Verma · Alex Lamb · Christopher Beckham · Amir Najafi · Ioannis Mitliagkas · David Lopez-Paz · Yoshua Bengio -
2019 Poster: Neural Joint Source-Channel Coding »
Kristy Choi · Kedar Tatwawadi · Aditya Grover · Tsachy Weissman · Stefano Ermon -
2019 Poster: GMNN: Graph Markov Neural Networks »
Meng Qu · Yoshua Bengio · Jian Tang -
2019 Oral: A Kernel Theory of Modern Data Augmentation »
Tri Dao · Albert Gu · Alexander J Ratner · Virginia Smith · Christopher De Sa · Christopher Re -
2019 Oral: Neural Joint Source-Channel Coding »
Kristy Choi · Kedar Tatwawadi · Aditya Grover · Tsachy Weissman · Stefano Ermon -
2019 Oral: GMNN: Graph Markov Neural Networks »
Meng Qu · Yoshua Bengio · Jian Tang -
2019 Oral: Multi-Agent Adversarial Inverse Reinforcement Learning »
Lantao Yu · Jiaming Song · Stefano Ermon -
2019 Oral: Manifold Mixup: Better Representations by Interpolating Hidden States »
Vikas Verma · Alex Lamb · Christopher Beckham · Amir Najafi · Ioannis Mitliagkas · David Lopez-Paz · Yoshua Bengio -
2018 Poster: Modeling Sparse Deviations for Compressed Sensing using Generative Models »
Manik Dhar · Aditya Grover · Stefano Ermon -
2018 Poster: Mutual Information Neural Estimation »
Mohamed Belghazi · Aristide Baratin · Sai Rajeswar · Sherjil Ozair · Yoshua Bengio · R Devon Hjelm · Aaron Courville -
2018 Oral: Modeling Sparse Deviations for Compressed Sensing using Generative Models »
Manik Dhar · Aditya Grover · Stefano Ermon -
2018 Oral: Mutual Information Neural Estimation »
Mohamed Belghazi · Aristide Baratin · Sai Rajeswar · Sherjil Ozair · Yoshua Bengio · R Devon Hjelm · Aaron Courville -
2018 Poster: Representation Tradeoffs for Hyperbolic Embeddings »
Frederic Sala · Christopher De Sa · Albert Gu · Christopher Re -
2018 Poster: Focused Hierarchical RNNs for Conditional Sequence Processing »
Rosemary Nan Ke · Konrad Zolna · Alessandro Sordoni · Zhouhan Lin · Adam Trischler · Yoshua Bengio · Joelle Pineau · Laurent Charlin · Christopher Pal -
2018 Poster: Accelerating Natural Gradient with Higher-Order Invariance »
Yang Song · Jiaming Song · Stefano Ermon -
2018 Poster: Accurate Uncertainties for Deep Learning Using Calibrated Regression »
Volodymyr Kuleshov · Nathan Fenner · Stefano Ermon -
2018 Oral: Representation Tradeoffs for Hyperbolic Embeddings »
Frederic Sala · Christopher De Sa · Albert Gu · Christopher Re -
2018 Oral: Accelerating Natural Gradient with Higher-Order Invariance »
Yang Song · Jiaming Song · Stefano Ermon -
2018 Oral: Accurate Uncertainties for Deep Learning Using Calibrated Regression »
Volodymyr Kuleshov · Nathan Fenner · Stefano Ermon -
2018 Oral: Focused Hierarchical RNNs for Conditional Sequence Processing »
Rosemary Nan Ke · Konrad Zolna · Alessandro Sordoni · Zhouhan Lin · Adam Trischler · Yoshua Bengio · Joelle Pineau · Laurent Charlin · Christopher Pal -
2017 Workshop: Reproducibility in Machine Learning Research »
Rosemary Nan Ke · Anirudh Goyal · Alex Lamb · Joelle Pineau · Samy Bengio · Yoshua Bengio -
2017 Poster: Sharp Minima Can Generalize For Deep Nets »
Laurent Dinh · Razvan Pascanu · Samy Bengio · Yoshua Bengio -
2017 Poster: A Closer Look at Memorization in Deep Networks »
David Krueger · Yoshua Bengio · Stanislaw Jastrzebski · Maxinder S. Kanwal · Nicolas Ballas · Asja Fischer · Emmanuel Bengio · Devansh Arpit · Tegan Maharaj · Aaron Courville · Simon Lacoste-Julien -
2017 Talk: A Closer Look at Memorization in Deep Networks »
David Krueger · Yoshua Bengio · Stanislaw Jastrzebski · Maxinder S. Kanwal · Nicolas Ballas · Asja Fischer · Emmanuel Bengio · Devansh Arpit · Tegan Maharaj · Aaron Courville · Simon Lacoste-Julien -
2017 Poster: Learning the Structure of Generative Models without Labeled Data »
Stephen Bach · Bryan He · Alexander J Ratner · Christopher Re -
2017 Poster: Learning Hierarchical Features from Deep Generative Models »
Shengjia Zhao · Jiaming Song · Stefano Ermon -
2017 Talk: Learning Hierarchical Features from Deep Generative Models »
Shengjia Zhao · Jiaming Song · Stefano Ermon -
2017 Talk: Learning the Structure of Generative Models without Labeled Data »
Stephen Bach · Bryan He · Alexander J Ratner · Christopher Re -
2017 Talk: Sharp Minima Can Generalize For Deep Nets »
Laurent Dinh · Razvan Pascanu · Samy Bengio · Yoshua Bengio