This work examines the challenges of training neural networks that use vector quantization with straight-through estimation. We find that the main cause of training instability is the discrepancy between the model embedding and the code-vector distribution. We identify the factors that contribute to this issue, including codebook gradient sparsity and the asymmetric nature of the commitment loss, which leads to misaligned code-vector assignments. We propose to address this issue via an affine re-parameterization of the code vectors. Additionally, we introduce an alternating optimization scheme to reduce the gradient error introduced by the straight-through estimation. Moreover, we propose an improvement to the commitment loss that ensures better alignment between the codebook representation and the model embedding. These optimization methods improve the mathematical approximation of the straight-through estimation and, ultimately, model performance. We demonstrate the effectiveness of our methods on several common model architectures, such as AlexNet, ResNet, and ViT, across various tasks, including image classification and generative modeling.
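For context on the setup the abstract refers to, below is a minimal, illustrative PyTorch sketch (not the authors' released code) of a vector-quantization layer trained with a straight-through estimator and a commitment loss, in the spirit of the standard VQ-VAE formulation. The class and parameter names (`VectorQuantizer`, `codebook_size`, `beta`) are our own. The paper's contributions (affine re-parameterization of the code vectors, alternating optimization, and the improved commitment loss) are modifications on top of this baseline and are not shown here.

```python
# Minimal sketch of vector quantization with a straight-through estimator (STE).
# Assumed names: VectorQuantizer, codebook_size, beta (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F


class VectorQuantizer(nn.Module):
    def __init__(self, codebook_size: int, dim: int, beta: float = 0.25):
        super().__init__()
        self.codebook = nn.Embedding(codebook_size, dim)  # code vectors
        self.beta = beta  # weight on the commitment term

    def forward(self, z_e):
        # z_e: encoder embeddings of shape (batch, dim)
        # Hard nearest-neighbour assignment of each embedding to a code vector.
        dists = torch.cdist(z_e, self.codebook.weight)  # (batch, codebook_size)
        idx = dists.argmin(dim=-1)                       # code indices
        z_q = self.codebook(idx)                         # quantized embeddings

        # Codebook loss pulls code vectors toward the (detached) encoder outputs;
        # commitment loss pulls encoder outputs toward the (detached) code vectors.
        codebook_loss = F.mse_loss(z_q, z_e.detach())
        commit_loss = F.mse_loss(z_e, z_q.detach())
        vq_loss = codebook_loss + self.beta * commit_loss

        # Straight-through estimator: use z_q in the forward pass, but copy
        # gradients from z_q to z_e as if quantization were the identity map.
        z_q = z_e + (z_q - z_e).detach()
        return z_q, idx, vq_loss
```

Note that in this baseline only the code vectors actually selected by the nearest-neighbour step receive gradient updates, which is the codebook gradient sparsity the abstract identifies as a source of the embedding/code-vector mismatch.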
Author Information
Minyoung Huh (MIT)
Brian Cheung (MIT)
Pulkit Agrawal (MIT)
Phillip Isola (MIT)
More from the Same Authors
- 2021 : Topological Experience Replay for Fast Q-Learning
  Zhang-Wei Hong · Tao Chen · Yen-Chen Lin · Joni Pajarinen · Pulkit Agrawal
- 2021 : Understanding the Generalization Gap in Visual Reinforcement Learning
  Anurag Ajay · Ge Yang · Ofir Nachum · Pulkit Agrawal
- 2022 : Distributionally Adaptive Meta Reinforcement Learning
  Anurag Ajay · Dibya Ghosh · Sergey Levine · Pulkit Agrawal · Abhishek Gupta
- 2023 : Visual Dexterity: In-hand Dexterous Manipulation from Depth
  Tao Chen · Megha Tippur · Siyang Wu · Vikash Kumar · Edward Adelson · Pulkit Agrawal
- 2023 : Breadcrumbs to the Goal: Goal-Conditioned Exploration from Human-in-the-loop feedback
  Marcel Torne Villasevil · Max Balsells I Pamies · Zihan Wang · Samedh Desai · Tao Chen · Pulkit Agrawal · Abhishek Gupta
- 2023 Workshop: Challenges in Deployable Generative AI
  Swami Sankaranarayanan · Thomas Hartvigsen · Camille Bilodeau · Ryutaro Tanno · Cheng Zhang · Florian Tramer · Phillip Isola
- 2023 Poster: Parallel $Q$-Learning: Scaling Off-policy Reinforcement Learning under Massively Parallel Simulation
  Zechu Li · Tao Chen · Zhang-Wei Hong · Anurag Ajay · Pulkit Agrawal
- 2023 Poster: Optimal Goal-Reaching Reinforcement Learning via Quasimetric Learning
  Tongzhou Wang · Antonio Torralba · Phillip Isola · Amy Zhang
- 2023 Poster: Diagnosis, Feedback, Adaptation: A Human-in-the-Loop Framework for Test-Time Policy Adaptation
  Andi Peng · Aviv Netanyahu · Mark Ho · Tianmin Shu · Andreea Bobu · Julie Shah · Pulkit Agrawal
- 2023 Poster: System Identification of Neural Systems: If We Got It Right, Would We Know?
  Yena Han · Tomaso A Poggio · Brian Cheung
- 2023 Poster: Statistical Learning under Heterogenous Distribution Shift
  Max Simchowitz · Anurag Ajay · Pulkit Agrawal · Akshay Krishnamurthy
- 2023 Poster: TGRL: An Algorithm for Teacher Guided Reinforcement Learning
  Idan Shenfeld · Zhang-Wei Hong · Aviv Tamar · Pulkit Agrawal
- 2022 Poster: Discovering Generalizable Spatial Goal Representations via Graph-based Active Reward Learning
  Aviv Netanyahu · Tianmin Shu · Josh Tenenbaum · Pulkit Agrawal
- 2022 Spotlight: Discovering Generalizable Spatial Goal Representations via Graph-based Active Reward Learning
  Aviv Netanyahu · Tianmin Shu · Josh Tenenbaum · Pulkit Agrawal
- 2022 Poster: Denoised MDPs: Learning World Models Better Than the World Itself
  Tongzhou Wang · Simon Du · Antonio Torralba · Phillip Isola · Amy Zhang · Yuandong Tian
- 2022 Spotlight: Denoised MDPs: Learning World Models Better Than the World Itself
  Tongzhou Wang · Simon Du · Antonio Torralba · Phillip Isola · Amy Zhang · Yuandong Tian
- 2022 Poster: Offline RL Policies Should Be Trained to be Adaptive
  Dibya Ghosh · Anurag Ajay · Pulkit Agrawal · Sergey Levine
- 2022 Oral: Offline RL Policies Should Be Trained to be Adaptive
  Dibya Ghosh · Anurag Ajay · Pulkit Agrawal · Sergey Levine
- 2021 Workshop: Self-Supervised Learning for Reasoning and Perception
  Pengtao Xie · Shanghang Zhang · Ishan Misra · Pulkit Agrawal · Katerina Fragkiadaki · Ruisi Zhang · Tassilo Klein · Asli Celikyilmaz · Mihaela van der Schaar · Eric Xing
- 2021 Poster: Learning Task Informed Abstractions
  Xiang Fu · Ge Yang · Pulkit Agrawal · Tommi Jaakkola
- 2021 Spotlight: Learning Task Informed Abstractions
  Xiang Fu · Ge Yang · Pulkit Agrawal · Tommi Jaakkola
- 2020 Poster: Understanding Contrastive Representation Learning through Alignment and Uniformity on the Hypersphere
  Tongzhou Wang · Phillip Isola
- 2018 Poster: Investigating Human Priors for Playing Video Games
  Rachit Dubey · Pulkit Agrawal · Deepak Pathak · Tom Griffiths · Alexei Efros
- 2018 Oral: Investigating Human Priors for Playing Video Games
  Rachit Dubey · Pulkit Agrawal · Deepak Pathak · Tom Griffiths · Alexei Efros
- 2017 Poster: Curiosity-driven Exploration by Self-supervised Prediction
  Deepak Pathak · Pulkit Agrawal · Alexei Efros · Trevor Darrell
- 2017 Talk: Curiosity-driven Exploration by Self-supervised Prediction
  Deepak Pathak · Pulkit Agrawal · Alexei Efros · Trevor Darrell