Poster
Visual-Language Models as Fuzzy Rewards for Reinforcement Learning
Yuwei Fu · Haichao Zhang · Di Wu · Wei Xu · Benoit Boulet
Abstract:
In this work, we investigate how to leverage pretrained visual-language models (VLMs) for online Reinforcement Learning (RL). In particular, we focus on sparse-reward tasks with a predefined textual task description. We first point out the problem of reward misalignment that arises when VLMs are applied as rewards in RL tasks. As a remedy, we introduce a lightweight fine-tuning method, named Fuzzy VLM reward-aided RL (FuRL), based on reward alignment and relay RL. Experiments on benchmark tasks demonstrate the efficacy of the proposed method. Code will be released at: https://github.com/Anonymous/FuRL.
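To make the VLM-as-reward idea concrete, the sketch below scores each observation by the cosine similarity between an image embedding and the embedding of the task description, and uses that score as a dense "fuzzy" reward. The encoders here are hypothetical stubs (random projections) standing in for a real VLM such as CLIP, so the sketch is self-contained; FuRL's actual reward-alignment and relay-RL components are not shown.

```python
import numpy as np

# Hypothetical stub encoders standing in for a pretrained VLM's image and
# text towers (e.g. CLIP). Fixed random projections keep the sketch runnable
# without downloading model weights.
rng = np.random.default_rng(0)
PROJ = rng.normal(size=(64, 16))

def encode_image(pixels: np.ndarray) -> np.ndarray:
    """Map a flattened observation to a unit-norm embedding."""
    z = pixels.reshape(-1)[:64] @ PROJ
    return z / (np.linalg.norm(z) + 1e-8)

def encode_text(task_description: str) -> np.ndarray:
    """Map a task description to a unit-norm embedding (hash-seeded stub)."""
    seed = abs(hash(task_description)) % (2**32)
    z = np.random.default_rng(seed).normal(size=16)
    return z / (np.linalg.norm(z) + 1e-8)

def vlm_reward(pixels: np.ndarray, task_description: str) -> float:
    """Fuzzy reward: cosine similarity between observation and task text.

    Because the VLM was not trained on the RL task, this signal is noisy
    and can be misaligned with true task progress -- the reward
    misalignment problem that motivates FuRL.
    """
    return float(encode_image(pixels) @ encode_text(task_description))
```

Since both embeddings are unit-norm, the reward is bounded in [-1, 1]; in a sparse-reward setting it would be added to (not replace) the environment's task-success signal.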