Sat Jun 15 08:30 AM -- 06:00 PM (PDT) @ 203
The How2 Challenge: New Tasks for Vision & Language
Florian Metze · Lucia Specia · Desmond Elliott · Loic Barrault · Ramon Sanabria · Shruti Palaskar

Research at the intersection of vision and language has been attracting a lot of attention in recent years. Topics include the study of multi-modal representations, translation between modalities, bootstrapping of labels from one modality into another, visually grounded question answering, segmentation and storytelling, and grounding the meaning of language in visual data. An ever-increasing number of tasks and datasets are appearing around this recently established field.

At NeurIPS 2018, we released the How2 dataset, containing more than 85,000 videos (2,000 hours) with audio, transcriptions, translations, and textual summaries. We believe it presents an ideal resource to bring together researchers working on the previously mentioned separate tasks around a single, large dataset. This rich dataset will facilitate the comparison of tools and algorithms, and hopefully foster the creation of additional annotations and tasks. We want to encourage discussion about useful tasks, metrics, and labeling techniques, in order to develop a better understanding of the role and value of multi-modality in vision and language. We seek to create a venue that promotes collaboration between different sub-fields and helps establish new research directions that we believe will sustain machine learning research for years to come.

Workshop Homepage

Welcome (Break)
The How2 Database and Challenge (Presentation)
Lucia Specia, Ramon Sanabria
Coffee Break (Break)
Forcing Vision + Language Models To Actually See, Not Just Talk (Invited Talk 1)
Devi Parikh
Topics in Vision and Language: Grounding, Segmentation and Author Anonymity (Invited Talk 2)
Bernt Schiele
Learning to Reason: Modular and Relational Representations for Visual Questions and Referring Expressions (Invited Talk 3)
Kate Saenko
Multi-agent communication from raw perceptual input: what works, what doesn't and what's next (Invited Talk 4)
Angeliki Lazaridou
Overcoming Bias in Captioning Models (Invited Talk 5)
Lisa Anne Hendricks
Embodied language grounding (Invited Talk 6)
Katerina Fragkiadaki
Poster Session and Coffee (Poster Session and Break)
Ramon Sanabria, Tejas Srinivasan, Vikas Raunak, Luowei Zhou, Aradhyo Kundu, Roma Patel, Lucia Specia, Sang Keun Choe, Anna Belova
Unsupervised Bilingual Lexicon Induction from mono-lingual multimodal data (Invited Talk 7 (remote))
Qin Jin
New Directions for Vision & Language (Discussion Panel)
Florian Metze, Shruti Palaskar