Invited Talk
in
Workshop: Text, camera, action! Frontiers in controllable video generation
Boyi Li - Leveraging LLMs to Imagine Like Humans by Aligning Representations from Vision and Language
The machine learning community has embraced specialized models tailored to specific data domains. However, relying on a single data type can constrain a model's flexibility and generality, requiring additional labeled data and limiting user interaction. Moreover, existing content creation techniques often exhibit poor reasoning ability, even when trained on large datasets. To address these challenges, this talk focuses on building efficient intelligent systems that leverage language models to generate and edit images and videos, specifically in text-to-image and text-to-video generation. These findings mitigate the limitations of current model setups and pave the way toward multimodal representations that unify diverse signals within a single, comprehensive model.