RELO: Reinforcement Learning to Localize for Visual Object Tracking
Xin Chen ⋅ Chuanyu Sun ⋅ Jiao Xu ⋅ Houwen Peng ⋅ Dong Wang ⋅ Huchuan Lu ⋅ Kede Ma
Abstract
Existing one-stream Transformer-based visual trackers localize targets by training a classification head with a handcrafted spatial prior encoded as a heatmap. However, this heuristic supervision merely serves as a surrogate objective, which is misaligned with evaluation metrics such as IoU and AUC. To address this limitation, we propose RELO, a reinforcement-learning tracking framework that formulates target localization as a decision-making problem within the Transformer-based tracking paradigm. Unlike prior-driven localization learning, RELO performs sequence-level reinforcement learning to optimize localization behavior using both instantaneous IoU and sequence-level AUC rewards, better aligning the training objective with the actual evaluation criteria. As a result, RELO not only eliminates the need for handcrafted heatmaps but also achieves superior performance. For instance, RELO attains 57.5\% AUC on LaSOT$_\mathrm{ext}$ without template updates, establishing a new state of the art. Code and models will be made available.
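To make the reward terms concrete, the sketch below (not the authors' code; function names and the threshold grid are illustrative) computes the two standard quantities the abstract mentions: per-frame IoU between a predicted and a ground-truth box, and a sequence-level success AUC obtained by averaging the success rate over a grid of IoU thresholds.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes in (x1, y1, x2, y2) format."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def auc_reward(ious, num_thresholds=21):
    """Sequence-level reward: mean success rate over IoU thresholds in [0, 1].

    For each threshold t, the success rate is the fraction of frames whose
    IoU exceeds t; averaging over the threshold grid approximates the AUC
    of the success plot used in benchmarks such as LaSOT.
    """
    thresholds = [k / (num_thresholds - 1) for k in range(num_thresholds)]
    per_threshold = [sum(i > t for i in ious) / len(ious) for t in thresholds]
    return sum(per_threshold) / len(per_thresholds := thresholds)
```

In a sequence-level RL setup along these lines, `iou` would supply the instantaneous per-frame reward, while `auc_reward` over the whole trajectory would supply the delayed, benchmark-aligned signal.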