

Invited Talk in Workshop: Text, camera, action! Frontiers in controllable video generation

Boyi Li - Leveraging LLMs to Imagine Like Humans by Aligning Representations from Vision and Language


Sat 27 Jul 7:30 a.m. PDT — 8 a.m. PDT

Abstract:

The machine learning community has embraced specialized models tailored to specific data domains. However, relying solely on a single data type can constrain flexibility and generality, requiring additional labeled data and limiting user interaction. Furthermore, existing content creation techniques often exhibit poor reasoning ability, even when trained on large datasets. To address these challenges, this talk focuses on building efficient intelligent systems that leverage language models to generate and edit images and videos, specifically in the areas of text-to-image and text-to-video generation. These findings mitigate the limitations of current model setups and pave the way for multimodal representations that unify diverse signals within a single, comprehensive model.
