Spotlight in Workshop: Continuous Time Perspectives in Machine Learning
Epsilon-Greedy Reinforcement Learning Policy in Continuous-Time Systems
Mohamad Kazem Shirani Faradonbeh
Abstract:
This work studies theoretical performance guarantees of a ubiquitous reinforcement learning policy for a canonical continuous-time model. We show that epsilon-Greedy addresses the exploration-exploitation dilemma for minimizing quadratic costs in linear dynamical systems that evolve according to stochastic differential equations. More precisely, we establish square-root-of-time regret bounds, indicating that epsilon-Greedy quickly learns optimal control actions from a single state trajectory. Further, linear scaling of the regret with the number of parameters is shown. The presented analysis introduces novel and useful technical approaches, and sheds light on fundamental challenges of continuous-time reinforcement learning.
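For readers who want a concrete picture of the setting, the following is a minimal, hypothetical Python sketch (not the paper's code) of an epsilon-greedy policy applied to a continuous-time linear-quadratic system, simulated with an Euler-Maruyama discretization. The dynamics matrices, cost weights, exploration rate, step size, update schedule, and least-squares estimator are all illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch: epsilon-greedy control of a linear SDE
# dx = (A x + B u) dt + dW with quadratic cost x'Qx + u'Ru.
import numpy as np
from scipy.linalg import solve_continuous_are

rng = np.random.default_rng(0)

# True (unknown to the learner) dynamics and cost weights -- assumed values.
A_true = np.array([[0.0, 1.0], [-1.0, -0.5]])
B_true = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)

dt, T, eps = 0.01, 50.0, 0.1            # step size, horizon, exploration rate
n_steps = int(T / dt)
n, m = A_true.shape[0], B_true.shape[1]

def lqr_gain(A, B):
    """Certainty-equivalent LQR gain from the continuous-time Riccati equation."""
    P = solve_continuous_are(A, B, Q, R)
    return np.linalg.solve(R, B.T @ P)  # K such that u = -K x

x = np.zeros(n)
gram = 1e-3 * np.eye(n + m)             # regularized Gram matrix for least squares
moment = np.zeros((n + m, n))
K = np.zeros((m, n))                    # initial gain before any data is seen

for k in range(n_steps):
    # Epsilon-greedy: explore with a random input with probability eps,
    # otherwise exploit the certainty-equivalent feedback law.
    if rng.random() < eps:
        u = rng.normal(size=m)
    else:
        u = -K @ x

    # Euler-Maruyama step of the stochastic differential equation.
    dw = rng.normal(scale=np.sqrt(dt), size=n)
    x_next = x + (A_true @ x + B_true @ u) * dt + dw

    # Least squares on the discretized regression x_next - x ~ [A B][x; u] dt.
    z = np.concatenate([x, u])
    gram += np.outer(z, z) * dt
    moment += np.outer(z, x_next - x)
    theta_hat = np.linalg.solve(gram, moment).T   # estimate of [A B]
    A_hat, B_hat = theta_hat[:, :n], theta_hat[:, n:]

    # Occasionally refresh the policy from the latest estimates
    # (the update schedule here is an assumption).
    if k % 500 == 0 and k > 0:
        try:
            K = lqr_gain(A_hat, B_hat)
        except Exception:
            pass  # keep the previous gain if the Riccati solve fails

    x = x_next
```

Under such a scheme, regret is driven by the exploratory inputs and by the mismatch between the estimated and true parameters; the paper's contribution is to show that epsilon-Greedy balances these so that the regret grows only as the square root of time and linearly in the number of parameters.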