

Poster in Workshop: RLxF: RL from World Feedback

Coherent Off-Policy Improvement of Large Behaviour Models with Learned Rewards

Christian Scherer ⋅ Joe Watson ⋅ Daniel Palenicek ⋅ Theo Gruner ⋅ Ingmar Posner ⋅ Jan Peters

Abstract
