Oral
Exploration Conscious Reinforcement Learning Revisited
Lior Shani · Yonathan Efroni · Shie Mannor

Wed Jun 12th 05:10 -- 05:15 PM @ Room 104

The exploration-exploitation tradeoff is one of the main problems of Reinforcement Learning. In practice, this tradeoff is handled by a built-in exploration mechanism, such as $\epsilon$-greedy exploration or additive Gaussian action noise, while the agent still tries to learn an optimal policy. We take a different approach and define a surrogate optimality objective: a policy that is optimal with respect to the exploration scheme. As we show throughout the paper, although solving this criterion does not necessarily lead to an optimal policy, the problem becomes easier to solve. We analyze this notion of optimality, derive algorithms from it that reveal connections to existing work, and test them empirically on tabular and deep Reinforcement Learning domains.
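
For concreteness, the two exploration mechanisms named in the abstract can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: the function names and the parameters `sigma`, `low`, and `high` are hypothetical, and the abstract does not specify how the noise is scaled or clipped.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a uniformly random action with probability epsilon, else the greedy one.

    q_values: list of estimated action values for the current state (assumed input).
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Greedy action: index of the largest estimated value (first one on ties).
    return max(range(len(q_values)), key=lambda a: q_values[a])


def gaussian_noise_action(mean_action, sigma, low, high, rng):
    """Add zero-mean Gaussian noise to a scalar continuous action, then clip.

    Clipping to [low, high] is an assumption made here for illustration.
    """
    noisy = mean_action + rng.gauss(0.0, sigma)
    return min(high, max(low, noisy))


rng = random.Random(0)
discrete_action = epsilon_greedy([0.1, 0.7, 0.3], epsilon=0.1, rng=rng)
continuous_action = gaussian_noise_action(0.2, sigma=0.1, low=-1.0, high=1.0, rng=rng)
```

In both cases the behavior that is actually executed differs from the greedy policy. The surrogate objective described in the abstract asks for a policy that is optimal given that this exploration will be applied.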

Author Information

Lior Shani (Technion)
Yonathan Efroni (Technion)
Shie Mannor (Technion)
