

Do Vision Language Models infer human intention without visual perspective-taking? Towards a scalable "One-Image-Probe-All" dataset

Bingyang Wang · Yijiang Li · Qingyang Zhou · Hui Yi Leong · Tianwei Zhao · Letian Ye · Hokin Deng · Dezhi Luo · Nuno Vasconcelos

Abstract
