Invited Talk

Proxy objectives in reinforcement learning from human feedback

John Schulman

2023 Invited Talk

Abstract

Proxy objectives are a fundamental concept in machine learning. That is, there's a true objective that we care about, but it's hard to compute or estimate, so instead we construct a locally valid approximation and optimize that. I will examine reinforcement learning from human feedback through this lens, as a chain of approximations, each of which can widen the gap between the desired and achieved result.

Speaker

John Schulman

John now leads a team working on ChatGPT and RL from Human Feedback at OpenAI, where he was a cofounder. His recent published work includes combining language models with retrieval (WebGPT) and scaling laws of RL and alignment. Earlier he developed some of the foundational methods of deep RL (TRPO, PPO). Before OpenAI, John got a PhD from UC Berkeley, advised by Pieter Abbeel. In his free time, he enjoys running, jazz piano, and raising chickens.
