Poster in DMLR Workshop: Data-centric Machine Learning Research
Can Expert Demonstration Guarantee Offline Performance in Sparse Reward Environment?
Jeyeon Eo · Dongsu Lee · Minhae Kwon
The reinforcement learning (RL) paradigm has shifted from online to offline learning, drawing on insights from supervised learning. Interestingly, we empirically find that expert demonstration datasets underperform in sparse reward environments. We conjecture that this result originates from two properties of the given dataset: reward ratio and trajectory diversity. These properties relate to reward experience and trajectory stitching ability, both significant factors in the sparse reward problem. This study investigates these properties to better understand the dataset's influence on offline performance in sparse reward environments. Experimental results demonstrate that offline RL performance is proportional to the product of reward ratio and trajectory diversity. Moreover, we identify that these two properties are in a trade-off.
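The abstract does not define the two dataset properties precisely; below is a minimal sketch, under stated assumptions, of how one might quantify a reward ratio (fraction of trajectories that ever receive a sparse reward) and a crude trajectory-diversity proxy (coverage of discretized state bins) for an offline dataset. The function names, the dataset layout (a list of dicts with `rewards` and `observations` arrays), and the binning scheme are all illustrative assumptions, not the authors' metrics.

```python
import numpy as np

def reward_ratio(trajectories):
    """Hypothetical reward ratio: fraction of trajectories containing
    at least one nonzero (sparse) reward. Assumes each trajectory is a
    dict with a 'rewards' array."""
    rewarded = sum(
        1 for traj in trajectories if np.any(np.asarray(traj["rewards"]) != 0)
    )
    return rewarded / max(len(trajectories), 1)

def trajectory_diversity(trajectories, num_bins=20):
    """Hypothetical diversity proxy: fraction of discretized state bins
    visited by the dataset. Assumes each trajectory has an 'observations'
    array of shape (T, obs_dim)."""
    states = np.concatenate(
        [np.asarray(t["observations"]) for t in trajectories], axis=0
    )
    lo, hi = states.min(axis=0), states.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    bins = np.floor((states - lo) / span * (num_bins - 1)).astype(int)
    visited = {tuple(row) for row in bins}
    return len(visited) / float(num_bins ** states.shape[1])

# Under these assumed definitions, the abstract's empirical observation
# would correspond to offline performance scaling roughly with
# reward_ratio(D) * trajectory_diversity(D) for a dataset D.
```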