Poster in Workshop: Sampling and Optimization in Discrete Space
Discrete Diffusion Reward Guidance Methods for Offline Reinforcement Learning
Matthew Coleman · Olga Russakovsky · Christine Allen-Blanchette · Ye Zhu
As reinforcement learning problems come to involve larger and more varied data, new techniques will be needed to generate high-quality plans from only a compact representation of the original information. While recent diffusion-based generative policies can model complex action distributions directly in the original high-dimensional feature space, they suffer from slow inference and have not yet been applied to reduced-dimension representations or to discrete tasks. In this work, we propose three diffusion-guidance techniques that operate on a reduced representation of the state obtained by quantile discretization: a gradient-based approach, a stochastic beam search approach, and a Q-learning approach. Our findings indicate that the gradient-based and beam search approaches improve scores on an offline reinforcement learning task by a significant margin.
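The abstract does not specify how the quantile discretization is implemented. As a minimal sketch, assuming per-feature quantile binning of continuous state features fit on the offline dataset (the function names `fit_quantile_bins`, `discretize`, and the choice of `n_bins` are illustrative assumptions, not the authors' code), the reduced state representation might be produced like this:

```python
import numpy as np

def fit_quantile_bins(states: np.ndarray, n_bins: int = 16) -> np.ndarray:
    """Compute per-feature quantile bin edges from an offline dataset.

    states: (num_transitions, state_dim) array of continuous states.
    Returns (state_dim, n_bins - 1) interior bin edges, so each feature
    is split into n_bins roughly equally populated bins.
    """
    interior = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]   # interior quantile levels
    return np.quantile(states, interior, axis=0).T        # (state_dim, n_bins - 1)

def discretize(state: np.ndarray, edges: np.ndarray) -> np.ndarray:
    """Map one continuous state to integer bin indices, one token per feature."""
    return np.array([np.digitize(x, e) for x, e in zip(state, edges)])

# Usage (hypothetical): tokenize offline states for a discrete diffusion model.
# dataset_states = ...  # (num_transitions, state_dim)
# edges = fit_quantile_bins(dataset_states, n_bins=16)
# tokens = discretize(dataset_states[0], edges)  # integers in [0, n_bins - 1]
```

Quantile (rather than uniform-width) binning keeps each discrete token roughly equally likely under the data distribution, which is one plausible motivation for this choice of reduction.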
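Likewise, one plausible reading of the stochastic beam search guidance, not confirmed by the abstract, is to sample several candidate denoisings per beam at each reverse-diffusion step and resample beams in proportion to a learned score (e.g., a Q or reward estimate). The callables `propose` and `score` and the softmax resampling below are all assumptions for illustration:

```python
import numpy as np

def stochastic_beam_step(beams, propose, score, beam_width=8, n_candidates=4, rng=None):
    """One guided denoising step as a stochastic beam search (sketch).

    beams: list of partially denoised discrete sequences.
    propose(seq): samples one candidate next denoising from the diffusion model.
    score(seq): scalar quality estimate for a candidate (e.g., learned Q-value).
    """
    rng = rng or np.random.default_rng()
    # Expand each beam into several stochastic candidates.
    candidates = [propose(seq) for seq in beams for _ in range(n_candidates)]
    scores = np.array([score(seq) for seq in candidates], dtype=np.float64)
    # Resample beams in proportion to softmax of the scores.
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    idx = rng.choice(len(candidates), size=beam_width, p=probs, replace=True)
    return [candidates[i] for i in idx]
```

Sampling with replacement keeps the search stochastic, trading off exploitation of high-scoring candidates against diversity across beams; the actual paper may use a different expansion or resampling rule.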