Reinforcement Learning with Human Feedback: Learning Dynamic Choices via Pessimism

Zihao Li · Zhuoran Yang · Mengdi Wang

