Automatic Shaping and Decomposition of Reward Functions
Bhaskara Marthi - Massachusetts Institute of Technology, USA
This paper investigates the problem of automatically learning how to restructure the reward function of a Markov decision process so as to speed up reinforcement learning. We begin by describing a method that learns a shaped reward function given a set of state and temporal abstractions. Next, we consider decomposition of the per-timestep reward in multieffector problems, in which the overall agent can be decomposed into multiple units that concurrently carry out various tasks. We show by example that finding a good reward decomposition often requires first shaping the rewards appropriately. We then give a function approximation algorithm for solving both problems together. Standard reinforcement learning algorithms can be augmented with our methods, and we show experimentally that, in each case, learning is significantly faster.
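As background for the shaping step, the standard potential-based formulation (Ng et al., 1999), on which work of this kind typically builds, augments the reward with the discounted difference of a state potential; a minimal sketch of that transformation, where the potential $\Phi$ and discount $\gamma$ are the assumed ingredients rather than this paper's learned quantities:

$$R'(s, a, s') \;=\; R(s, a, s') \;+\; \gamma\,\Phi(s') \;-\; \Phi(s)$$

Shaping of this form preserves the optimal policy while making intermediate rewards more informative. A reward decomposition of the kind described in the abstract can likewise be sketched as $R(s, a) = \sum_j R_j(s, a)$, with one component $R_j$ per effector; the particular shaped rewards and decompositions this paper learns are given by its algorithms, not by this sketch.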