Cross-Tactile Sensor Representation Learning
Abstract
Visuo-tactile sensors have been widely adopted in robotic manipulation. However, inherent heterogeneity in sensor designs hinders the learning of unified tactile representations in cross-sensor scenarios. Existing methods that rely on reconstruction or task-specific supervision often fail to capture information common to different tactile sensors, particularly under substantial sensor variations, resulting in limited generalization to unseen sensors. To address this, we propose Cross-Tactile Sensor Representation Learning (CTSRL), a unified framework for sensor-agnostic tactile representation learning. CTSRL introduces a Cross-Sensor Modulator (CSM) to eliminate sensor-specific biases and adopts a two-stage learning paradigm: (1) cross-sensor self-supervised learning on aligned synthetic data to extract latent representations shared across sensor domains; and (2) cross-modal alignment with real-world multimodal tactile data to bridge the sim-to-real semantic gap and enrich the representations with fine-grained semantic attributes. Experiments show that CTSRL achieves strong multi-sensor generalization and significantly improves sensor-agnostic representation learning.
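To make the two-stage paradigm concrete, the following is a minimal sketch, not the authors' implementation: it assumes a FiLM-style modulator as the CSM, an InfoNCE contrastive objective, paired synthetic renders of the same contact from two sensors in stage 1, and a placeholder semantic embedding (e.g., from a frozen text encoder) in stage 2. All names and architectural choices here are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossSensorModulator(nn.Module):
    """Hypothetical CSM: FiLM-style modulation conditioned on a sensor ID,
    intended to strip sensor-specific biases from intermediate features."""
    def __init__(self, num_sensors: int, dim: int):
        super().__init__()
        self.scale = nn.Embedding(num_sensors, dim)
        self.shift = nn.Embedding(num_sensors, dim)
        nn.init.ones_(self.scale.weight)
        nn.init.zeros_(self.shift.weight)

    def forward(self, feats: torch.Tensor, sensor_id: torch.Tensor) -> torch.Tensor:
        # feats: (B, dim); sensor_id: (B,)
        return feats * self.scale(sensor_id) + self.shift(sensor_id)

class TactileEncoder(nn.Module):
    """Toy shared encoder producing sensor-agnostic embeddings."""
    def __init__(self, num_sensors: int, dim: int = 128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, dim),
        )
        self.csm = CrossSensorModulator(num_sensors, dim)

    def forward(self, x: torch.Tensor, sensor_id: torch.Tensor) -> torch.Tensor:
        return self.csm(self.backbone(x), sensor_id)

def info_nce(z_a: torch.Tensor, z_b: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    """Contrastive loss treating matched pairs (same contact seen by two
    sensors, or a tactile/semantic pair in stage 2) as positives."""
    z_a, z_b = F.normalize(z_a, dim=-1), F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / tau
    targets = torch.arange(z_a.size(0), device=z_a.device)
    return F.cross_entropy(logits, targets)

# Stage 1 (sketch): aligned synthetic data -- the same simulated contact
# rendered by two sensor models -- pulls cross-sensor embeddings together.
enc = TactileEncoder(num_sensors=2)
opt = torch.optim.Adam(enc.parameters(), lr=1e-4)
img_s0, img_s1 = torch.rand(8, 3, 64, 64), torch.rand(8, 3, 64, 64)  # paired renders
loss1 = info_nce(enc(img_s0, torch.zeros(8, dtype=torch.long)),
                 enc(img_s1, torch.ones(8, dtype=torch.long)))
loss1.backward(); opt.step(); opt.zero_grad()

# Stage 2 (sketch): real multimodal data -- a placeholder semantic embedding
# stands in for the other modality -- aligns real tactile features with
# fine-grained semantic attributes.
real_img = torch.rand(8, 3, 64, 64)
semantic_emb = torch.rand(8, 128)  # e.g., frozen text-encoder outputs (assumed)
loss2 = info_nce(enc(real_img, torch.zeros(8, dtype=torch.long)), semantic_emb)
loss2.backward(); opt.step()
```

The sketch only illustrates how a sensor-conditioned modulator and a shared encoder could be trained in two stages; the actual CTSRL objectives, data pipeline, and CSM design are specified in the paper itself.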