

Poster

VinT-6D: A Large-Scale Object-in-hand Dataset from Vision, Touch and Proprioception

Zhaoliang Wan · Yonggen Ling · Senlin Yi · Lu Qi · Wang Lee · Minglei Lu · Sicheng Yang · Xiao Teng · Peng Lu · Xu Yang · Ming-Hsuan Yang · Hui Cheng

Hall C 4-9 #204
[ Project Page ] [ Paper PDF ] [ Poster ]
Tue 23 Jul 4:30 a.m. PDT — 6 a.m. PDT

Abstract:

This paper addresses the scarcity of large-scale datasets for accurate object-in-hand pose estimation, which is crucial for robotic in-hand manipulation within the "Perception-Planning-Control" paradigm. Specifically, we introduce VinT-6D, the first extensive multi-modal dataset integrating vision, touch, and proprioception to enhance robotic manipulation. VinT-6D comprises 2 million VinT-Sim and 0.1 million VinT-Real entries, collected via simulations in MuJoCo and Blender and on a custom-designed real-world platform. The dataset is tailored for robotic hands, providing whole-hand tactile perception and high-quality, well-aligned data. To the best of our knowledge, VinT-Real is the largest such real-world collection, given the difficulty of gathering data in real-world environments, and it therefore narrows the simulation-to-real gap more effectively than previous works. Built upon VinT-6D, we present a benchmark method that achieves significant performance improvements by fusing multi-modal information. The project is available at https://VinT-6D.github.io/.
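To make the multi-modal setup concrete, below is a minimal sketch of what one object-in-hand entry and a naive fusion step could look like. The field names, array shapes (taxel and joint counts), and the concatenation-based fusion are assumptions for illustration only; they are not the dataset's actual schema or the paper's benchmark method.

```python
import numpy as np
from dataclasses import dataclass


@dataclass
class VinTSample:
    """Hypothetical layout of one entry (field names and shapes are assumptions).

    rgb:           camera image, H x W x 3, uint8
    tactile:       whole-hand taxel pressure readings, shape (num_taxels,)
    joint_angles:  proprioception, one angle per hand joint, shape (num_joints,)
    object_pose:   ground-truth 6D pose as a 4 x 4 homogeneous transform
    """
    rgb: np.ndarray
    tactile: np.ndarray
    joint_angles: np.ndarray
    object_pose: np.ndarray


def naive_fusion_features(sample: VinTSample) -> np.ndarray:
    """Toy late fusion: concatenate per-modality features into one vector
    that a downstream pose regressor could consume."""
    vision_feat = sample.rgb.astype(np.float32).mean(axis=(0, 1)) / 255.0  # 3-dim color summary
    touch_feat = sample.tactile.astype(np.float32)                         # raw taxel pressures
    proprio_feat = sample.joint_angles.astype(np.float32)                  # joint configuration
    return np.concatenate([vision_feat, touch_feat, proprio_feat])


if __name__ == "__main__":
    sample = VinTSample(
        rgb=np.zeros((240, 320, 3), dtype=np.uint8),
        tactile=np.random.rand(96).astype(np.float32),   # 96 taxels: arbitrary placeholder
        joint_angles=np.zeros(16, dtype=np.float32),     # 16 joints: arbitrary placeholder
        object_pose=np.eye(4, dtype=np.float32),
    )
    print(naive_fusion_features(sample).shape)           # (3 + 96 + 16,) = (115,)
```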
