DF-ExpEnse: Diffusion Filtered Exploration for Sample-Efficient Finetuning
Abstract
A promising recipe for intelligent robotic decision-making is to finetune pretrained generative control policies, which summarize offline experience effectively through behavior cloning, using reinforcement learning techniques that adapt them to online experience. In this work we present Diffusion Filtered Exploration via Ensembles (DF-ExpEnse), an exploration technique that meaningfully improves the quality of online experience collection and thereby increases the sample efficiency of the finetuning procedure. DF-ExpEnse first leverages the multimodal modeling capability of the generative control policy to construct an expressive candidate action set that can be evaluated tractably. It then uses an ensemble of critics to select the action of highest exploration interest, i.e., the one that best balances quality against uncertainty. When instantiated across a parallelized fleet, DF-ExpEnse further uses cross-agent communication to enable collaborative exploration as a group. Because it is used only for online experience collection, DF-ExpEnse integrates seamlessly on top of existing techniques that finetune pretrained generative control policies via reinforcement learning. We experimentally validate consistent sample-efficiency gains from using DF-ExpEnse for exploration on both manipulation and locomotion tasks, compared to default finetuning and alternative action-selection schemes.
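The candidate-filtering step described above can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the toy candidates stand in for samples drawn from the diffusion policy, the linear "critics" stand in for a trained critic ensemble, and the mean-plus-standard-deviation score (with an assumed trade-off coefficient `beta`) is one plausible way to balance predicted quality against ensemble uncertainty.

```python
import numpy as np

def select_exploration_action(candidates, critic_ensemble, beta=1.0):
    """Pick the candidate whose ensemble score best balances
    predicted quality (mean Q) against uncertainty (std of Q).

    candidates: array of shape (num_candidates, action_dim)
    critic_ensemble: list of callables mapping candidates -> Q-value per candidate
    beta: assumed exploration coefficient weighting the uncertainty bonus
    """
    # q_values[i, j] = critic i's value estimate for candidate j
    q_values = np.stack([critic(candidates) for critic in critic_ensemble])
    # Optimism-style score: favor high expected value AND high disagreement
    scores = q_values.mean(axis=0) + beta * q_values.std(axis=0)
    return candidates[np.argmax(scores)]

# Toy example: 4 candidate actions (stand-ins for diffusion-policy samples)
# scored by an ensemble of 3 random linear critics.
rng = np.random.default_rng(0)
candidates = rng.normal(size=(4, 2))      # 4 actions in a 2-D action space
weights = rng.normal(size=(3, 2))         # one weight vector per critic
critics = [lambda a, w=w: a @ w for w in weights]
action = select_exploration_action(candidates, critics, beta=0.5)
```

Because selection happens over a finite candidate set, the scoring step is a single batched critic evaluation, which is what makes the filtered exploration tractable at collection time.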