Oral
Exploration Conscious Reinforcement Learning Revisited
Lior Shani · Yonathan Efroni · Shie Mannor
Abstract:
The Exploration-Exploitation tradeoff is one of the main problems of Reinforcement Learning. In practice, this tradeoff is resolved by using some inherent exploration mechanism, such as $\epsilon$-greedy exploration or adding Gaussian action noise, while still trying to learn an optimal policy. We take a different approach and define a surrogate optimality objective: an optimal policy with respect to the exploration scheme. As we show throughout the paper, although solving for this criterion does not necessarily yield an optimal policy, the resulting problem is easier to solve. We continue by analyzing this notion of optimality, deriving algorithms from this approach that reveal connections to existing work, and testing them empirically on tabular and deep Reinforcement Learning domains.
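To make the surrogate objective concrete, below is a minimal sketch of exploration-conscious value iteration for the $\epsilon$-greedy case: the agent chooses greedily now but assumes all future behavior is $\epsilon$-greedy, so the Bellman backup mixes a greedy value with a uniform one. The function name, toy MDP, and parameter choices are illustrative assumptions, not the paper's code.

```python
import numpy as np

def exploration_conscious_vi(P, R, gamma=0.9, eps=0.1, iters=1000, tol=1e-8):
    """Value iteration w.r.t. a fixed eps-greedy exploration scheme.
    P: (S, A, S) transition tensor, R: (S, A) reward matrix.
    The backup mixes the greedy value with the uniform-action value,
    so the fixed point is optimal among eps-greedy policies."""
    S, A, _ = P.shape
    Q = np.zeros((S, A))
    for _ in range(iters):
        # eps-greedy state value: (1 - eps) * greedy + eps * uniform average
        V = (1 - eps) * Q.max(axis=1) + eps * Q.mean(axis=1)
        Q_new = R + gamma * P @ V  # Bellman backup under the mixed policy
        if np.max(np.abs(Q_new - Q)) < tol:
            Q = Q_new
            break
        Q = Q_new
    # Greedy component of the exploration-conscious optimal policy
    return Q, Q.argmax(axis=1)

# Tiny random MDP to exercise the solver (hypothetical example)
rng = np.random.default_rng(0)
S, A = 5, 3
P = rng.dirichlet(np.ones(S), size=(S, A))  # valid transition kernel
R = rng.random((S, A))
Q, pi = exploration_conscious_vi(P, R)
print("greedy actions of the exploration-conscious policy:", pi)
```

Note the design choice this illustrates: the resulting policy is not optimal for the underlying MDP in general, but it is optimal given that the agent will keep exploring with rate $\epsilon$, which is the easier surrogate problem the abstract describes.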