54 Results
Poster
|
Tue 4:30 |
Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models Siddharth Karamcheti · Suraj Nair · Ashwin Balakrishna · Percy Liang · Thomas Kollar · Dorsa Sadigh |
|
Oral
|
Tue 1:45 |
Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization Yang Jin · Zhicheng Sun · Kun Xu · Kun Xu · Liwei Chen · Hao Jiang · Quzhe Huang · Chengru Song · Yuliang Liu · Di Zhang · Yang Song · Kun Gai · Yadong Mu |
|
Poster
|
Wed 4:30 |
FuRL: Visual-Language Models as Fuzzy Rewards for Reinforcement Learning Yuwei Fu · Haichao Zhang · Di Wu · Wei Xu · Benoit Boulet |
|
Poster
|
Wed 4:30 |
Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models Jinhao Li · Haopeng Li · Sarah Erfani · Lei Feng · James Bailey · Feng Liu |
|
Poster
|
Tue 2:30 |
Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization Yang Jin · Zhicheng Sun · Kun Xu · Kun Xu · Liwei Chen · Hao Jiang · Quzhe Huang · Chengru Song · Yuliang Liu · Di Zhang · Yang Song · Kun Gai · Yadong Mu |
|
Poster
|
Thu 2:30 |
Memory-Space Visual Prompting for Efficient Vision-Language Fine-Tuning Shibo Jie · Yehui Tang · Ning Ding · Zhi-Hong Deng · Kai Han · Yunhe Wang |
|
Poster
|
Tue 2:30 |
Machine Vision Therapy: Multimodal Large Language Models Can Enhance Visual Robustness via Denoising In-Context Learning Zhuo Huang · Chang Liu · Yinpeng Dong · Hang Su · Shibao Zheng · Tongliang Liu |
|
Oral
|
Tue 8:15 |
Rejuvenating image-GPT as Strong Visual Representation Learners Sucheng Ren · Zeyu Wang · Hongru Zhu · Junfei Xiao · Alan Yuille · Cihang Xie |
|
Poster
|
Tue 2:30 |
IIANet: An Intra- and Inter-Modality Attention Network for Audio-Visual Speech Separation Kai Li · Runxuan Yang · Fuchun Sun · Xiaolin Hu |
|
Poster
|
Wed 4:30 |
VisionGraph: Leveraging Large Multimodal Models for Graph Theory Problems in Visual Context Yunxin Li · Baotian Hu · Haoyuan Shi · Wei Wang · Longyue Wang · Min Zhang |
|
Poster
|
Thu 2:30 |
Unifying Image Processing as Visual Prompting Question Answering Yihao Liu · Xiangyu Chen · Xianzheng Ma · Xintao Wang · Jiantao Zhou · Yu Qiao · Chao Dong |
|
Poster
|
Thu 2:30 |
Don't trust your eyes: on the (un)reliability of feature visualizations Robert Geirhos · Roland S. Zimmermann · Blair Bilodeau · Wieland Brendel · Been Kim |