Reinforcement Learning with Human Feedback: Learning Dynamic Choices via Pessimism

Zihao Li ⋅ Zhuoran Yang ⋅ Mengdi Wang
