Poster in Workshop: Decision Awareness in Reinforcement Learning
Deep Policy Generators
Francesco Faccio · Vincent Herrmann · Aditya Ramesh · Louis Kirsch · Jürgen Schmidhuber
Traditional Reinforcement Learning (RL) learns policies that maximize expected return. Here we study neural networks (NNs) that learn to generate policies in the form of context-specific weight matrices, similar to Fast Weight Programmers and other methods from the 1990s. Using context commands of the form "generate a policy that achieves a desired expected return," our NN generators combine powerful exploration of parameter space with greedy command choices to iteratively find better and better policies. A form of weight-sharing HyperNetwork, together with policy embeddings, scales our method to the generation of deep NNs. Experiments show how a single learned policy generator can produce policies that achieve any return seen during training. Finally, we evaluate our algorithm on a set of continuous control tasks, where it exhibits competitive performance.
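To make the core idea concrete, below is a minimal sketch (not the authors' code) of a command-conditioned policy generator: a small hypernetwork-style module maps a desired-return command to the flat weight vector of an MLP policy, which can then be unpacked and executed. All network sizes, the single-scalar command, and the toy usage at the end are illustrative assumptions, not details taken from the paper.

```python
"""Sketch of a command-conditioned policy generator (assumed architecture)."""
import numpy as np

rng = np.random.default_rng(0)

OBS_DIM, ACT_DIM, HID = 4, 2, 16                # assumed policy architecture
POLICY_PARAMS = OBS_DIM * HID + HID * ACT_DIM   # flat weight count (no biases)


def generator_init(cmd_dim=1, gen_hid=32):
    """Random initial weights of the generator (command -> policy weights)."""
    return {
        "w1": rng.normal(0, 0.1, (cmd_dim, gen_hid)),
        "w2": rng.normal(0, 0.1, (gen_hid, POLICY_PARAMS)),
    }


def generate_policy(gen, command):
    """Map a scalar desired-return command to a flat policy weight vector."""
    c = np.atleast_2d(command)
    h = np.tanh(c @ gen["w1"])
    return (h @ gen["w2"]).ravel()


def policy_act(flat_weights, obs):
    """Unpack the generated weights into a two-layer MLP policy and act."""
    w1 = flat_weights[: OBS_DIM * HID].reshape(OBS_DIM, HID)
    w2 = flat_weights[OBS_DIM * HID:].reshape(HID, ACT_DIM)
    return np.tanh(np.tanh(obs @ w1) @ w2)


# Usage: ask the generator for a policy intended to achieve return 100,
# then run that policy on a random observation.
gen = generator_init()
weights = generate_policy(gen, command=100.0)
action = policy_act(weights, rng.normal(size=OBS_DIM))
print(action.shape)  # (ACT_DIM,)
```

In the full method described by the abstract, the generator would additionally be trained so that the policies it emits actually attain the commanded returns, and commands would be chosen greedily (e.g. slightly above the best return observed so far) to drive iterative improvement; those training and command-selection loops are omitted here.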