Do Vision Language Models infer human intention without visual perspective-taking? Towards a scalable "One-Image-Probe-All" dataset

Bingyang Wang ⋅ Yijiang Li ⋅ Qingyang Zhou ⋅ Hui Yi Leong ⋅ Tianwei Zhao ⋅ Letian Ye ⋅ Hokin Deng ⋅ Dezhi Luo ⋅ Nuno Vasconcelos

Abstract
