Poster

Plan Then Action: High-Level Planning Guidance Reinforcement Learning for LLM Reasoning

Zhihao Dou ⋅ Qinjian Zhao ⋅ Zhongwei Wan ⋅ Zhang Dinggen ⋅ Weida Wang ⋅ Benteng Chen ⋅ Towsif Raiyan ⋅ Qingtao Pan ⋅ Yang Ouyang ⋅ Chaoda Song ⋅ Zhiqiang Gao ⋅ Shufei Zhang ⋅ Sumon Biswas

Abstract
