Workshop
Text, camera, action! Frontiers in controllable video generation
Michal Geyer · Joanna Materzynska · Jack Parker-Holder · Yuge Shi · Trevor Darrell · Nando de Freitas · Antonio Torralba
Hall A8
Sat 27 Jul, midnight PDT
The past few years have seen the rapid development of Generative AI, with powerful foundation models demonstrating the ability to generate new, creative content in multiple modalities. Following breakthroughs in text and image generation, it is clear that the next frontier lies in video. One challenging yet compelling aspect unique to video generation is the variety of ways in which such generation can be controlled: from specifying the content of a video with text, to viewing a scene from different camera angles, to directing the actions of characters within the video. We have also seen the use cases of these models diversify, with works that extend generation to 3D scenes, use such models to learn policies for robotics tasks, or create interactive environments for gameplay. Given the great variety of algorithmic approaches, the rapid progress, and the tremendous potential for applications, we believe now is the perfect time to engage the broader machine learning community in this exciting new research area. We thus propose the first workshop on Controllable Video Generation (CVG), focused on algorithms that can control video generation across multiple modalities and frequencies, and on the wide range of potential applications. We anticipate that CVG will be uniquely relevant to ICML, as it brings together a variety of different communities: from traditional computer vision, to safety and alignment, to those working on world models in reinforcement learning or robotics settings. This makes ICML the perfect venue, where seemingly unrelated communities can join together and share ideas in this emerging area of AI research.