21 Results
Workshop
|
Fri 2:00 |
MaxMin-RLHF: Towards Equitable Alignment of Large Language Models with Diverse Human Preferences Souradip Chakraborty · Jiahao Qiu · Hui Yuan · Alec Koppel · Furong Huang · Dinesh Manocha · Amrit Singh Bedi · Mengdi Wang |
|
Poster
|
Wed 4:30 |
MaxMin-RLHF: Alignment with Diverse Human Preferences Souradip Chakraborty · Jiahao Qiu · Hui Yuan · Alec Koppel · Dinesh Manocha · Furong Huang · Amrit Singh Bedi · Mengdi Wang |
|
Workshop
|
Fri 8:00 |
MaxMin-RLHF: Towards Equitable Alignment of Large Language Models with Diverse Human Preferences Souradip Chakraborty · Jiahao Qiu · Hui Yuan · Alec Koppel · Furong Huang · Dinesh Manocha · Amrit Singh Bedi · Mengdi Wang |
|
Poster
|
Tue 2:30 |
Exploration-Driven Policy Optimization in RLHF: Theoretical Insights on Efficient Data Utilization Yihan Du · Anna Winnicki · Gal Dalal · Shie Mannor · R Srikant |
|
Poster
|
Thu 2:30 |
Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-constraint Wei Xiong · Hanze Dong · Chenlu Ye · Ziqi Wang · Han Zhong · Heng Ji · Nan Jiang · Tong Zhang |
|
Poster
|
Tue 4:30 |
ODIN: Disentangled Reward Mitigates Hacking in RLHF Lichang Chen · Chen Zhu · Jiuhai Chen · Davit Soselia · Tianyi Zhou · Tom Goldstein · Heng Huang · Mohammad Shoeybi · Bryan Catanzaro |
|
Workshop
|
Fri 8:00 |
RLHF and IIA: Perverse Incentives Wanqiao Xu · Shi Dong · Xiuyuan Lu · Grace Lam · Zheng Wen · Benjamin Van Roy |
|
Workshop
|
Fri 1:00 |
RLHF and IIA: Perverse Incentives Wanqiao Xu · Shi Dong · Xiuyuan Lu · Grace Lam · Zheng Wen · Benjamin Van Roy |
|
Workshop
|
It Takes Two: On the Seamlessness between Reward and Policy Model in RLHF TaiMing Lu · Lingfeng Shen · Xinyu Yang · Weiting Tan · Beidi Chen · Huaxiu Yao |
|
Workshop
|
A Critical Look At Tokenwise Reward-Guided Text Generation Ahmad Rashid · Ruotian Wu · Julia Grosse · Agustinus Kristiadi · Pascal Poupart |
|
Workshop
|
Active Preference Optimization for Sample Efficient RLHF Nirjhar Das · Souradip Chakraborty · Aldo Pacchiano · Sayak Ray Chowdhury |
|
Workshop
|
RLHF from Heterogeneous Feedback via Personalization and Preference Aggregation Chanwoo Park · Mingyang Liu · Dingwen Kong · Kaiqing Zhang · Asuman Ozdaglar |