Poster in Workshop: ICML 2024 Workshop on Foundation Models in the Wild
Quantum 3D Visual Grounding: A Step Towards Quantum-inspired AI-Visualization
Adib Bazgir · Rama Madugula · Yuwen Zhang
Keywords: [ Visual and geometric information ] [ Quantum circuits ] [ Depth perception ] [ Object Detection ] [ Quantum Foundation Model ] [ Quantum 3D Visual Grounding ]
We introduce an advanced task of quantum 3D visual grounding in RGB images using language descriptions enriched with appearance and geometric information through quantum computing paradigms. In this work, we propose a framework that enhances existing classical 3D visual grounding techniques by leveraging the inherent parallelism and high-dimensional processing capabilities of quantum computing. This framework, Quantum3DVG, integrates quantum neural networks, including a Quantum CNN (QCNN), a Quantum Visual/Depth Encoder (QVDE), a Quantum Text-Guided Visual/Depth Adapter (QTGVDA), and a Quantum MLP (QMLP), to process both visual features and geometric data. At the heart of the proposed model, the QVDE and QCNN encode image patches and depth information as quantum states, enabling high-level abstraction and quantum feature extraction. The QTGVDA is then re-envisioned as a quantum circuit that refines these quantum states, employing quantum gates to align multi-scale visual and geometric features with textual descriptions. Finally, the QMLP performs the final object localization and classification.
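The core idea of encoding image patches as quantum states can be illustrated with a minimal sketch. The following toy example (an illustration only, not the authors' implementation) amplitude-encodes a 2×2 patch into a 2-qubit state vector and passes it through a small parameterized circuit simulated with NumPy; the `ry` rotation angles stand in for trainable quantum-feature-extraction parameters, and all function names here are hypothetical.

```python
import numpy as np

def amplitude_encode(patch):
    """Amplitude-encode a flattened image patch into a quantum state vector.
    A 2x2 patch (4 pixel values) maps onto 2 qubits (4 amplitudes)."""
    v = np.asarray(patch, dtype=float).ravel()
    norm = np.linalg.norm(v)
    if norm == 0:
        raise ValueError("patch must contain non-zero values")
    return v / norm  # normalized: amplitudes squared sum to 1

def ry(theta):
    """Single-qubit RY rotation gate (real-valued unitary)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

# Entangling CNOT gate on two qubits (control = qubit 0)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)

def feature_circuit(state, theta0, theta1):
    """Toy 'quantum feature extraction': parameterized RY rotations on each
    qubit followed by an entangling CNOT, applied to the encoded state."""
    U = np.kron(ry(theta0), ry(theta1))
    return CNOT @ (U @ state)

patch = [[0.2, 0.8],
         [0.5, 0.1]]
state = amplitude_encode(patch)
features = feature_circuit(state, theta0=0.3, theta1=1.1)
probs = features ** 2  # measurement probabilities over the 4 basis states
print(np.round(probs, 3))
```

Because every gate is unitary, the output probabilities still sum to one; in a full framework these measurement statistics (or the amplitudes themselves) would feed downstream components such as a text-guided adapter.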