

Reward Inside the Model: A Lightweight Hidden‑State Reward Model for LLM's Best-of-N sampling

Jizhou Guo · Zhaomin Wu · Philip Yu

Abstract
