Timezone: »

 
Poster
Characterizing and Overcoming the Greedy Nature of Learning in Multi-modal Deep Neural Networks
Nan Wu · Stanislaw Jastrzebski · Kyunghyun Cho · Krzysztof J Geras

Wed Jul 20 03:30 PM -- 05:30 PM (PDT) @ Hall E #332

We hypothesize that due to the greedy nature of learning in multi-modal deep neural networks, these models tend to rely on just one modality while under-fitting the other modalities. Such behavior is counter-intuitive and hurts the models' generalization, as we observe empirically. To estimate the model's dependence on each modality, we compute the gain on the accuracy when the model has access to it in addition to another modality. We refer to this gain as the conditional utilization rate. In the experiments, we consistently observe an imbalance in conditional utilization rates between modalities, across multiple tasks and architectures. Since conditional utilization rate cannot be computed efficiently during training, we introduce a proxy for it based on the pace at which the model learns from each modality, which we refer to as the conditional learning speed. We propose an algorithm to balance the conditional learning speeds between modalities during training and demonstrate that it indeed addresses the issue of greedy learning. The proposed algorithm improves the model's generalization on three datasets: Colored MNIST, ModelNet40, and NVIDIA Dynamic Hand Gesture.

Author Information

Nan Wu (New York University)
Stanislaw Jastrzebski (Molecule.one / Jagiellonian University)
Kyunghyun Cho (New York University, Genentech)
Kyunghyun Cho

Kyunghyun Cho is an associate professor of computer science and data science at New York University and CIFAR Fellow of Learning in Machines & Brains. He is also a senior director of frontier research at the Prescient Design team within Genentech Research & Early Development (gRED). He was a research scientist at Facebook AI Research from June 2017 to May 2020 and a postdoctoral fellow at University of Montreal until Summer 2015 under the supervision of Prof. Yoshua Bengio, after receiving MSc and PhD degrees from Aalto University April 2011 and April 2014, respectively, under the supervision of Prof. Juha Karhunen, Dr. Tapani Raiko and Dr. Alexander Ilin. He received the Samsung Ho-Am Prize in Engineering in 2021. He tries his best to find a balance among machine learning, natural language processing, and life, but almost always fails to do so.

Krzysztof J Geras (New York University)

Related Events (a corresponding poster, oral, or spotlight)

More from the Same Authors

  • 2021 : True Few-Shot Learning with Language Models »
    Ethan Perez · Douwe Kiela · Kyunghyun Cho
  • 2022 : Linear Connectivity Reveals Generalization Strategies »
    Jeevesh Juneja · Rachit Bansal · Kyunghyun Cho · João Sedoc · Naomi Saphra
  • 2023 Poster: Towards Understanding and Improving GFlowNet Training »
    Max Shen · Emmanuel Bengio · Ehsan Hajiramezanali · Andreas Loukas · Kyunghyun Cho · Tommaso Biancalani
  • 2021 Poster: Rissanen Data Analysis: Examining Dataset Characteristics via Description Length »
    Ethan Perez · Douwe Kiela · Kyunghyun Cho
  • 2021 Spotlight: Rissanen Data Analysis: Examining Dataset Characteristics via Description Length »
    Ethan Perez · Douwe Kiela · Kyunghyun Cho
  • 2021 Poster: Catastrophic Fisher Explosion: Early Phase Fisher Matrix Impacts Generalization »
    Stanislaw Jastrzebski · Devansh Arpit · Oliver Astrand · Giancarlo Kerg · Huan Wang · Caiming Xiong · Richard Socher · Kyunghyun Cho · Krzysztof J Geras
  • 2021 Spotlight: Catastrophic Fisher Explosion: Early Phase Fisher Matrix Impacts Generalization »
    Stanislaw Jastrzebski · Devansh Arpit · Oliver Astrand · Giancarlo Kerg · Huan Wang · Caiming Xiong · Richard Socher · Kyunghyun Cho · Krzysztof J Geras
  • 2019 : Poster Session »
    Boli Fang · Ananth Balashankar · Sonam Damani · Emma Beauxis-Aussalet · Nan Wu · Elizabeth Bondi · Marc Rußwurm · David Ruhe · Nripsuta Saxena · Katie Spoon
  • 2019 : Deep Neural Networks Improve Radiologists' Performance in Breast Cancer Screening »
    Krzysztof J Geras · Nan Wu
  • 2019 Workshop: Workshop on Multi-Task and Lifelong Reinforcement Learning »
    Sarath Chandar · Shagun Sodhani · Khimya Khetarpal · Tom Zahavy · Daniel J. Mankowitz · Shie Mannor · Balaraman Ravindran · Doina Precup · Chelsea Finn · Abhishek Gupta · Amy Zhang · Kyunghyun Cho · Andrei A Rusu · Facebook Rob Fergus
  • 2019 : Poster discussion »
    Roman Novak · Maxime Gabella · Frederic Dreyer · Siavash Golkar · Anh Tong · Irina Higgins · Mirco Milletari · Joe Antognini · Sebastian Goldt · Adín Ramírez Rivera · Roberto Bondesan · Ryo Karakida · Remi Tachet des Combes · Michael Mahoney · Nicholas Walker · Stanislav Fort · Samuel Smith · Rohan Ghosh · Aristide Baratin · Diego Granziol · Stephen Roberts · Dmitry Vetrov · Andrew Wilson · César Laurent · Valentin Thomas · Simon Lacoste-Julien · Dar Gilboa · Daniel Soudry · Anupam Gupta · Anirudh Goyal · Yoshua Bengio · Erich Elsen · Soham De · Stanislaw Jastrzebski · Charles H Martin · Samira Shabanian · Aaron Courville · Shorato Akaho · Lenka Zdeborova · Ethan Dyer · Maurice Weiler · Pim de Haan · Taco Cohen · Max Welling · Ping Luo · zhanglin peng · Nasim Rahaman · Loic Matthey · Danilo J. Rezende · Jaesik Choi · Kyle Cranmer · Lechao Xiao · Jaehoon Lee · Yasaman Bahri · Jeffrey Pennington · Greg Yang · Jiri Hron · Jascha Sohl-Dickstein · Guy Gur-Ari
  • 2019 : Poster Session 1 (all papers) »
    Matilde Gargiani · Yochai Zur · Chaim Baskin · Evgenii Zheltonozhskii · Liam Li · Ameet Talwalkar · Xuedong Shang · Harkirat Singh Behl · Atilim Gunes Baydin · Ivo Couckuyt · Tom Dhaene · Chieh Lin · Wei Wei · Min Sun · Orchid Majumder · Michele Donini · Yoshihiko Ozaki · Ryan P. Adams · Christian Geißler · Ping Luo · zhanglin peng · · Ruimao Zhang · John Langford · Rich Caruana · Debadeepta Dey · Charles Weill · Xavi Gonzalvo · Scott Yang · Scott Yak · Eugen Hotaj · Vladimir Macko · Mehryar Mohri · Corinna Cortes · Stefan Webb · Jonathan Chen · Martin Jankowiak · Noah Goodman · Aaron Klein · Frank Hutter · Mojan Javaheripi · Mohammad Samragh · Sungbin Lim · Taesup Kim · SUNGWOONG KIM · Michael Volpp · Iddo Drori · Yamuna Krishnamurthy · Kyunghyun Cho · Stanislaw Jastrzebski · Quentin de Laroussilhe · Mingxing Tan · Xiao Ma · Neil Houlsby · Andrea Gesmundo · Zalán Borsos · Krzysztof Maziarz · Felipe Petroski Such · Joel Lehman · Kenneth Stanley · Jeff Clune · Pieter Gijsbers · Joaquin Vanschoren · Felix Mohr · Eyke Hüllermeier · Zheng Xiong · Wenpeng Zhang · Wenwu Zhu · Weijia Shao · Aleksandra Faust · Michal Valko · Michael Y Li · Hugo Jair Escalante · Marcel Wever · Andrey Khorlin · Tara Javidi · Anthony Francis · Saurajit Mukherjee · Jungtaek Kim · Michael McCourt · Saehoon Kim · Tackgeun You · Seungjin Choi · Nicolas Knudde · Alexander Tornede · Ghassen Jerfel
  • 2019 Poster: Non-Monotonic Sequential Text Generation »
    Sean Welleck · Kiante Brantley · Hal Daumé III · Kyunghyun Cho
  • 2019 Poster: Parameter-Efficient Transfer Learning for NLP »
    Neil Houlsby · Andrei Giurgiu · Stanislaw Jastrzebski · Bruna Morrone · Quentin de Laroussilhe · Andrea Gesmundo · Mona Attariyan · Sylvain Gelly
  • 2019 Oral: Non-Monotonic Sequential Text Generation »
    Sean Welleck · Kiante Brantley · Hal Daumé III · Kyunghyun Cho
  • 2019 Oral: Parameter-Efficient Transfer Learning for NLP »
    Neil Houlsby · Andrei Giurgiu · Stanislaw Jastrzebski · Bruna Morrone · Quentin de Laroussilhe · Andrea Gesmundo · Mona Attariyan · Sylvain Gelly
  • 2017 Poster: A Closer Look at Memorization in Deep Networks »
    David Krueger · Yoshua Bengio · Stanislaw Jastrzebski · Maxinder S. Kanwal · Nicolas Ballas · Asja Fischer · Emmanuel Bengio · Devansh Arpit · Tegan Maharaj · Aaron Courville · Simon Lacoste-Julien
  • 2017 Talk: A Closer Look at Memorization in Deep Networks »
    David Krueger · Yoshua Bengio · Stanislaw Jastrzebski · Maxinder S. Kanwal · Nicolas Ballas · Asja Fischer · Emmanuel Bengio · Devansh Arpit · Tegan Maharaj · Aaron Courville · Simon Lacoste-Julien