BiTrajDiff: Bidirectional Trajectory Generation with Diffusion Models for Offline Reinforcement Learning
Abstract
Offline Reinforcement Learning (RL) relies on static datasets and typically enforces conservative constraints to mitigate out-of-distribution errors, but these constraints inevitably bias learning toward the dataset and limit behavioral generalization. Recent Data Augmentation (DA) methods leverage generative models to enrich offline data, yet they mainly operate within a unidirectional rollout paradigm and tend to preserve the original trajectory-level connectivity of the dataset. As a result, such methods introduce only local variations and fail to recover connections between distinct behavior patterns. In this paper, we propose Bidirectional Trajectory Diffusion (BiTrajDiff), a novel DA framework that explicitly addresses this limitation. BiTrajDiff decomposes trajectory synthesis into two independent diffusion processes that generate forward-future and backward-history segments, each conditioned on shared intermediate anchor states. By stitching the generated segments at these anchors, BiTrajDiff can synthesize trajectories that bridge disconnected behavior patterns and recover global trajectory-level connectivity absent from the original data. Extensive experiments on the D4RL benchmark demonstrate that BiTrajDiff consistently outperforms advanced DA methods across a range of offline RL backbones.
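The anchor-conditioned stitching described above can be illustrated with a minimal conceptual sketch. Note that the two "diffusion models" here are hypothetical stand-in samplers (simple random walks) introduced purely to show the bidirectional-generation-and-stitch structure; the function names and shapes are our assumptions, not the paper's actual architecture.

```python
import numpy as np

# Conceptual sketch of BiTrajDiff-style stitching. The forward/backward
# "models" below are placeholder random-walk samplers, NOT the paper's
# trained diffusion networks.

def sample_forward_segment(anchor, horizon, rng):
    """Stand-in for the forward diffusion model: generate `horizon`
    future states rolled out from the anchor state."""
    states = [anchor]
    for _ in range(horizon):
        states.append(states[-1] + rng.normal(scale=0.1, size=anchor.shape))
    return np.stack(states[1:])  # states after the anchor

def sample_backward_segment(anchor, horizon, rng):
    """Stand-in for the backward diffusion model: generate `horizon`
    history states that lead into the anchor state."""
    states = [anchor]
    for _ in range(horizon):
        states.append(states[-1] - rng.normal(scale=0.1, size=anchor.shape))
    return np.stack(states[1:][::-1])  # states before the anchor, in time order

def stitch_at_anchor(anchor, h_back, h_fwd, rng):
    """Concatenate history segment, anchor, and future segment into
    one synthetic trajectory sharing the anchor state."""
    history = sample_backward_segment(anchor, h_back, rng)
    future = sample_forward_segment(anchor, h_fwd, rng)
    return np.concatenate([history, anchor[None], future], axis=0)

rng = np.random.default_rng(0)
anchor = rng.normal(size=4)  # an intermediate anchor state from the dataset
traj = stitch_at_anchor(anchor, h_back=5, h_fwd=5, rng=rng)
print(traj.shape)  # (11, 4): 5 history states + anchor + 5 future states
```

Because the two segments are generated independently but share the same anchor, stitching them can connect behavior patterns that never co-occur in any single dataset trajectory, which is the connectivity argument the abstract makes.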