Spotlight in Workshop: Continuous Time Perspectives in Machine Learning
Epsilon-Greedy Reinforcement Learning Policy in Continuous-Time Systems
Mohamad Kazem Shirani Faradonbeh
Abstract:
This work studies theoretical performance guarantees of a ubiquitous reinforcement learning policy for a canonical continuous-time model. We show that epsilon-Greedy addresses the exploration-exploitation dilemma for minimizing quadratic costs in linear dynamical systems that evolve according to stochastic differential equations. More precisely, we establish square-root-of-time regret bounds, indicating that epsilon-Greedy quickly learns optimal control actions from a single state trajectory. Further, linear scaling of the regret with the number of parameters is shown. The presented analysis introduces novel and useful technical approaches, and sheds light on fundamental challenges of continuous-time reinforcement learning.
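For readers who want a concrete picture of the setting, the following is a minimal, hypothetical Python sketch (not the paper's code) of an epsilon-greedy policy applied to a continuous-time linear-quadratic system, simulated with an Euler-Maruyama discretization. The dynamics matrices, cost weights, exploration rate, step size, update schedule, and least-squares estimator are all illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch: epsilon-greedy control of a linear SDE
# dx = (A x + B u) dt + dW with quadratic cost x'Qx + u'Ru.
import numpy as np
from scipy.linalg import solve_continuous_are

rng = np.random.default_rng(0)

# True (unknown to the learner) dynamics and cost weights -- assumed values.
A_true = np.array([[0.0, 1.0], [-1.0, -0.5]])
B_true = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)

dt, T, eps = 0.01, 50.0, 0.1            # step size, horizon, exploration rate
n_steps = int(T / dt)
n, m = A_true.shape[0], B_true.shape[1]

def lqr_gain(A, B):
    """Certainty-equivalent LQR gain from the continuous-time Riccati equation."""
    P = solve_continuous_are(A, B, Q, R)
    return np.linalg.solve(R, B.T @ P)  # K such that u = -K x

x = np.zeros(n)
gram = 1e-3 * np.eye(n + m)             # regularized Gram matrix for least squares
moment = np.zeros((n + m, n))
K = np.zeros((m, n))                    # initial gain before any data is seen

for k in range(n_steps):
    # Epsilon-greedy: explore with a random input with probability eps,
    # otherwise exploit the certainty-equivalent feedback law.
    if rng.random() < eps:
        u = rng.normal(size=m)
    else:
        u = -K @ x

    # Euler-Maruyama step of the stochastic differential equation.
    dw = rng.normal(scale=np.sqrt(dt), size=n)
    x_next = x + (A_true @ x + B_true @ u) * dt + dw

    # Least squares on the discretized regression x_next - x ~ [A B][x; u] dt.
    z = np.concatenate([x, u])
    gram += np.outer(z, z) * dt
    moment += np.outer(z, x_next - x)
    theta_hat = np.linalg.solve(gram, moment).T   # estimate of [A B]
    A_hat, B_hat = theta_hat[:, :n], theta_hat[:, n:]

    # Occasionally refresh the policy from the latest estimates
    # (the update schedule here is an assumption).
    if k % 500 == 0 and k > 0:
        try:
            K = lqr_gain(A_hat, B_hat)
        except Exception:
            pass  # keep the previous gain if the Riccati solve fails

    x = x_next
```

Under such a scheme, regret is driven by the exploratory inputs and by the mismatch between the estimated and true parameters; the paper's contribution is to show that epsilon-Greedy balances these so that the regret grows only as the square root of time and linearly in the number of parameters.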