

Spotlight in Workshop: Continuous Time Perspectives in Machine Learning

Epsilon-Greedy Reinforcement Learning Policy in Continuous-Time Systems

Mohamad Kazem Shirani Faradonbeh


Abstract:

This work studies theoretical performance guarantees of a ubiquitous reinforcement learning policy for a canonical continuous-time model. We show that epsilon-Greedy addresses the exploration-exploitation dilemma for minimizing quadratic costs in linear dynamical systems that evolve according to stochastic differential equations. More precisely, we establish square-root-of-time regret bounds, indicating that epsilon-Greedy learns optimal control actions quickly from a single state trajectory. Further, linear scaling of the regret with the number of parameters is shown. The presented analysis introduces novel and useful technical approaches, and sheds light on fundamental challenges of continuous-time reinforcement learning.
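To make the setting concrete, below is a minimal illustrative sketch (not the paper's algorithm or analysis) of an epsilon-greedy controller on an Euler-discretized scalar linear SDE dx = (a x + b u) dt + sigma dW with a quadratic cost in mind: with probability eps the controller injects a random exploratory input, otherwise it applies a certainty-equivalent linear feedback computed from least-squares estimates of (a, b). All names (a_true, b_true, eps, the feedback rule) are hypothetical choices for this demo.

```python
import numpy as np

rng = np.random.default_rng(0)
a_true, b_true, sigma, dt = -0.5, 1.0, 0.1, 0.01
eps = 0.1                  # exploration probability (epsilon-greedy)
x = 1.0
X, U, dX = [], [], []      # trajectory data for least-squares estimation

a_hat, b_hat = 0.0, 1.0    # initial parameter guesses
for t in range(5000):
    if rng.random() < eps:
        u = rng.normal()   # explore: random action
    else:
        # greedy (certainty-equivalent): feedback u = -K x, with K chosen so the
        # estimated closed loop a_hat - b_hat*K = -1 is stable; a real LQ
        # controller would instead solve a Riccati equation here
        K = (a_hat + 1.0) / b_hat if abs(b_hat) > 1e-6 else 0.0
        u = -K * x
    # Euler-Maruyama step of the SDE dx = (a x + b u) dt + sigma dW
    dx = (a_true * x + b_true * u) * dt + sigma * np.sqrt(dt) * rng.normal()
    X.append(x); U.append(u); dX.append(dx)
    x += dx
    if t % 100 == 99:
        # periodically refit (a, b) by least squares on dX ≈ (a X + b U) dt
        Z = np.column_stack([X, U]) * dt
        theta, *_ = np.linalg.lstsq(Z, np.array(dX), rcond=None)
        a_hat, b_hat = theta

print(a_hat, b_hat)
```

The random exploratory inputs are what make the regression identifiable (they break the collinearity between the state and the feedback action), which is the exploration side of the dilemma the abstract refers to; the greedy feedback steps exploit the current estimates.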
