Semi-Supervised Gaze Estimation via Disentangled Subspace Contrastive Learning
Abstract
Current appearance-based gaze estimation suffers from poor generalization due to the scarcity of annotated samples and insufficient dataset diversity. Leading methods have explored weakly supervised learning, generating large-scale pseudo-labeled data collected in unconstrained scenarios to mitigate domain shift in the wild. In this work, we devise a simple yet effective semi-supervised contrastive learning framework that exploits unlabeled data for generalized gaze estimation, thereby reducing reliance on manual annotation. Our key insight is to leverage a Jacobian regularization constraint to disentangle the representation into identifiable subspaces dedicated to specific gaze components, e.g., the pitch and yaw angles. We then exploit the ordinal ranking relationships within each subspace for contrastive learning, learning a robust gaze representation from both labeled and unlabeled samples; this yields our Disentangled Subspace Contrastive Learning (DSCL) framework. Extensive experiments across multiple benchmarks demonstrate that the proposed method is plug-and-play and achieves competitive performance with only 20%, 10%, or even 5% of the annotated data in both in-domain and cross-domain evaluations.
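To make the ordinal ranking idea concrete, the following is a minimal sketch (not the authors' actual implementation) of a margin ranking loss inside one gaze-specific subspace: for an anchor sample, a neighbor whose gaze component (e.g., pitch) is closer in label space should also lie closer in that subspace's embedding coordinates. The function name `subspace_ranking_loss` and the margin value are illustrative assumptions; in the paper's semi-supervised setting, labels for unlabeled samples would presumably come from pseudo-labels.

```python
import math

def subspace_ranking_loss(emb, labels, margin=0.1):
    """Ordinal margin ranking loss within one gaze subspace (illustrative sketch).

    emb:    list of embedding vectors restricted to one subspace
            (e.g., the pitch-specific coordinates of the representation).
    labels: list of scalar gaze components (e.g., pitch angles) per sample.
    """
    n = len(labels)
    total, count = 0.0, 0
    for i in range(n):            # anchor
        for j in range(n):        # candidate "closer" sample
            for k in range(n):    # candidate "farther" sample
                if i in (j, k) or j == k:
                    continue
                # Ordinal relation in label space: j is closer to anchor i than k is.
                if abs(labels[i] - labels[j]) < abs(labels[i] - labels[k]):
                    d_ij = math.dist(emb[i], emb[j])
                    d_ik = math.dist(emb[i], emb[k])
                    # Embedding distances should respect the label ordering,
                    # separated by a margin; violations are penalized.
                    total += max(0.0, margin + d_ij - d_ik)
                    count += 1
    return total / max(count, 1)
```

An embedding that preserves the label ordering with gaps larger than the margin incurs zero loss, while one that scrambles the ordering is penalized; applying this per subspace (pitch, yaw) is what ties the contrastive objective to the disentangled components.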