Current-Inference Transformer Policies for Robust AUV Navigation from World Feedback
Abstract
Reinforcement learning (RL) from feedback is not limited to human preferences. In physical systems, the environment itself provides feedback through progress signals, safety events, and action-outcome mismatches. We study this idea in autonomous underwater vehicle (AUV) navigation, where strong ocean currents act as hidden disturbances that the policy does not observe directly. We propose a current-inference transformer RL framework that infers latent current effects from observation-action histories and uses this inferred world feedback for residual control. A causal transformer encodes the recent interaction history, an auxiliary head predicts the current vector using privileged simulator labels during training, and the actor conditions its residual actions on the inferred disturbance representation. During training, the current-estimation loss and mean absolute error both decrease, suggesting that the transformer learns current-related representations from observation-action histories. In a stress test with currents intensified by approximately 1.5x, the proposed policy achieves 65% success compared with 37% for a guidance-only controller, suggesting improved robustness when current disturbances exceed the nominal regime.
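The data flow described above (history encoding, auxiliary current prediction, residual action) can be illustrated with a minimal NumPy sketch. It is not the paper's implementation. The single-layer, single-head attention, the dimensions, the `residual_scale` bound, the mean-squared-error auxiliary loss, and the choice to feed both the latent state and the current estimate to the actor are all assumptions made for illustration, and every function and parameter name is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_params(obs_dim, act_dim, d_model=16, cur_dim=3, scale=0.1):
    """Random weights for an illustrative single-layer model (all sizes assumed)."""
    in_dim = obs_dim + act_dim
    return {
        "W_in": rng.normal(0, scale, (in_dim, d_model)),   # token embedding
        "W_q": rng.normal(0, scale, (d_model, d_model)),
        "W_k": rng.normal(0, scale, (d_model, d_model)),
        "W_v": rng.normal(0, scale, (d_model, d_model)),
        "W_cur": rng.normal(0, scale, (d_model, cur_dim)),  # auxiliary current head
        "W_act": rng.normal(0, scale, (d_model + cur_dim, act_dim)),  # residual actor
    }

def causal_attention(x, p):
    """Single-head self-attention where step t attends only to steps <= t."""
    T, d = x.shape
    q, k, v = x @ p["W_q"], x @ p["W_k"], x @ p["W_v"]
    scores = q @ k.T / np.sqrt(d)
    future = np.triu(np.ones((T, T), dtype=bool), k=1)  # True above the diagonal
    scores = np.where(future, -np.inf, scores)
    scores -= scores.max(axis=-1, keepdims=True)        # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return x + weights @ v                              # residual connection

def policy_step(history, guidance_action, p, residual_scale=0.3):
    """Return (final action, estimated current) for one control step.

    history: array of shape (T, obs_dim + act_dim), one row per past step.
    guidance_action: action proposed by the nominal guidance controller.
    """
    h = causal_attention(history @ p["W_in"], p)[-1]    # latent at latest step
    current_hat = h @ p["W_cur"]                        # inferred current vector
    actor_in = np.concatenate([h, current_hat])
    residual = residual_scale * np.tanh(actor_in @ p["W_act"])  # bounded correction
    return guidance_action + residual, current_hat

def aux_loss(current_hat, current_true):
    """Assumed MSE loss against the privileged simulator label (training only)."""
    return float(np.mean((current_hat - current_true) ** 2))

# Usage with made-up dimensions: 6-D observation, 3-D action, 8-step history.
params = init_params(obs_dim=6, act_dim=3)
history = rng.normal(size=(8, 9))
action, current_hat = policy_step(history, np.zeros(3), params)
```

Because the residual is bounded by `residual_scale`, the final action never departs far from the guidance controller's proposal; this is one common way to set up residual control, and the paper's actual bound may differ. The true current vector is needed only by `aux_loss` during training, so `policy_step` can run at deployment without privileged labels.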