Efficient Multimodal Question Answering
Abstract
Efficient multimodal question answering is increasingly important as large language models move into real-world settings constrained by latency, cost, connectivity, and device resources. This workshop brings together researchers from machine learning, NLP, and information retrieval to explore methods for answering questions over text, images, tables, and audio while balancing accuracy with computational efficiency. Building on the success of the NeurIPS 2020 EfficientQA competition, we highlight retrieval-augmented and hybrid generative–extractive approaches, multimodal reasoning under resource limits, and evaluation frameworks that incorporate human oversight. The workshop will feature invited talks, a shared task on efficient multimodal QA, poster sessions, and a live human–computer question answering event designed to engage both participants and spectators. Our goal is to advance practical, trustworthy QA systems that remain deployable across diverse domains and global contexts.