Learning Attribute–Affordance Hierarchies in Hyperbolic Space for Open-Vocabulary 3D Object Affordance Grounding
Abstract
This paper focuses on open-vocabulary 3D object affordance grounding (OVAG), which aims to localize affordance regions on 3D objects by leveraging interaction images or textual instructions. Most existing methods treat interaction images as sources of external affordance knowledge and align them with 3D visual representations, while overlooking the intrinsic relationship between local object attributes and affordances, which limits localization accuracy and generalization. For instance, a cup handle affords grasping because of its curved shape and appropriate thickness, indicating that affordances emerge from specific attribute compositions rather than global object appearance. Motivated by this, we propose an Attribute–Affordance Hierarchies (AAH) learning framework that explicitly models the hierarchical relationships between object-region attributes and affordances. Our approach first captures local region relationships with a hypergraph, and then projects the resulting region-level concepts into hyperbolic space to encode their hierarchical organization. Furthermore, we introduce counterfactual attribute samples to encourage robust learning of attribute–affordance dependencies under varying conditions. By jointly modeling visual structure and hierarchical concept information, our method achieves more accurate affordance localization. Extensive experiments and qualitative analyses demonstrate the effectiveness of our approach.
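To make the hyperbolic-space step concrete, the minimal sketch below shows one standard way to embed region-level concept features in a Poincaré ball and compare them by geodesic distance. The exponential map at the origin, the unit curvature, and the toy tensors are illustrative assumptions for exposition, not the paper's exact formulation.

```python
import torch

def expmap0(x: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Exponential map at the origin of the unit Poincare ball (curvature -1).
    Sends Euclidean region/attribute features into hyperbolic space, where
    distance from the origin can encode depth in a concept hierarchy
    (general concepts near the centre, specific ones near the boundary)."""
    norm = x.norm(dim=-1, keepdim=True).clamp_min(eps)
    return torch.tanh(norm) * x / norm

def poincare_dist(u: torch.Tensor, v: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Geodesic distance on the Poincare ball, usable as a (negated) score
    between region-attribute embeddings and affordance prototypes."""
    sq = (u - v).pow(2).sum(dim=-1)
    du = (1.0 - u.pow(2).sum(dim=-1)).clamp_min(eps)
    dv = (1.0 - v.pow(2).sum(dim=-1)).clamp_min(eps)
    return torch.acosh(1.0 + 2.0 * sq / (du * dv))

# Toy usage: 8 region-level features and 4 affordance prototypes in 64-D.
regions = expmap0(torch.randn(8, 64) * 0.1)
affordances = expmap0(torch.randn(4, 64) * 0.1)
scores = -poincare_dist(regions[:, None, :], affordances[None, :, :])  # shape (8, 4)
```

Ranking regions by such hyperbolic distances is one common way hierarchy-aware affordance scores are produced; the hypergraph and counterfactual components described in the abstract would operate on the features before and around this projection.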