Poster
in
Workshop: Foundations of Reinforcement Learning and Control: Connections and Perspectives
Pink Noise LQR: How does Colored Noise affect the Optimal Policy in RL?
Jakob Hollenstein · Marko Zaric · Samuele Tosatto · Justus Piater
Colored noise, a class of temporally correlated noise processes, has shown promising results for improving exploration in deep reinforcement learning for both off-policy and on-policy algorithms. However, it is unclear how temporally correlated colored noise affects policy learning beyond changing exploration properties. In this paper, we investigate the influence of colored noise on the optimal policy in a simplified linear quadratic regulator (LQR) setting. We show that the expected trajectory remains independent of the noise color for a given linear policy. We derive a closed-form solution for the expected cost and find that the noise affects both the expected cost and the optimal policy. The cost splits into two parts: a state-cost term equaling the cost of the unperturbed system and a noise-cost term independent of the initial state. Far from the goal state, the state cost dominates and the effect of the noise is negligible: the policy approaches the optimal policy of the unperturbed system. Near the goal state, the noise cost dominates, changing the optimal policy.
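To make the setting concrete, below is a minimal sketch (not taken from the paper) of a scalar LQR rollout whose actions are perturbed by colored noise; the FFT-based noise generator, the system parameters A, B, the cost weights q, r, and the helper names `colored_noise` and `lqr_rollout` are illustrative assumptions, not the authors' exact formulation.

```python
# Illustrative sketch: a 1-D LQR rollout with temporally correlated (colored)
# action noise. All constants and helpers here are assumptions for exposition.
import numpy as np

def colored_noise(beta, n_steps, rng):
    """Sample noise with power spectral density ~ 1/f^beta
    (beta=0: white, beta=1: pink, beta=2: red/Brownian)."""
    freqs = np.fft.rfftfreq(n_steps)
    freqs[0] = freqs[1]                       # avoid division by zero at f = 0
    amplitude = freqs ** (-beta / 2.0)
    phases = rng.standard_normal(len(freqs)) + 1j * rng.standard_normal(len(freqs))
    noise = np.fft.irfft(amplitude * phases, n=n_steps)
    return noise / noise.std()                # normalize to unit variance

def lqr_rollout(K, x0, beta, n_steps=200, seed=0):
    """Roll out x_{t+1} = A x_t + B (u_t + eps_t) with linear policy u_t = -K x_t
    and colored action noise eps_t; return the quadratic cost sum q x^2 + r u^2."""
    A, B, q, r = 1.0, 1.0, 1.0, 0.1           # assumed scalar dynamics and cost weights
    rng = np.random.default_rng(seed)
    eps = colored_noise(beta, n_steps, rng)
    x, cost = x0, 0.0
    for t in range(n_steps):
        u = -K * x
        cost += q * x**2 + r * u**2
        x = A * x + B * (u + eps[t])
    return cost

# Compare the empirical cost under white (beta=0) vs. pink (beta=1) action noise:
for beta in (0.0, 1.0):
    costs = [lqr_rollout(K=0.8, x0=5.0, beta=beta, seed=s) for s in range(50)]
    print(f"beta={beta}: mean cost ~ {np.mean(costs):.2f}")
```

Averaging such rollouts over many seeds is one way to observe the split described in the abstract: the part of the cost driven by the initial state x0 is insensitive to the noise color, while the residual cost accumulated near the goal depends on the temporal correlation of the noise.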