Workshop
The Many Facets of Preference-Based Learning
Aadirupa Saha · Mohammad Ghavamzadeh · Robert Busa-Fekete · Branislav Kveton · Viktor Bengs
Meeting Room 316 AB
Fri 28 Jul, noon PDT
Learning from human preferences, or preference-based learning, has been critical to major advancements in AI and machine learning. Since humans are naturally more reliable at providing feedback on a relative scale than at assigning numerical values, collecting preference feedback is more budget-friendly and introduces less bias. The broad objective of this workshop is twofold:

1) Bring together the different communities in which preference-based learning plays a major role, including dueling bandits, multi-agent games, econometrics, social choice theory, reinforcement learning, optimization, robotics, and many more, and provide a forum to exchange techniques and ideas, learn from each other, and potentially pose new and innovative research questions.

2) Connect theory to practice by identifying real-world systems that can benefit from incorporating preference feedback, such as marketing, revenue management, search engine optimization, recommender systems, healthcare, language modeling, interactive chatbots, text summarization, and robotics.

We will consider the workshop a success if it inspires novel research in the general area of preference-based learning: drawing attention from different communities to foster dissemination, cross-fertilization, and discussion at scale; building bridges between experimental researchers and theorists toward better models and practical algorithms; and encouraging participants to propose, sketch, and discuss new starting points, questions, and applications.
Schedule
Fri 12:00 p.m. - 12:05 p.m. | Opening Remarks
Fri 12:05 p.m. - 12:35 p.m. | MNL-Bandit: Sequential Learning Approach to Assortment Selection (Invited Talk) | Vineet Goyal
Fri 12:35 p.m. - 1:05 p.m. | Aligning Robots with Human Preferences (Invited Talk) | Dorsa Sadigh
Fri 1:05 p.m. - 1:50 p.m. | 1st Poster Session (Poster Session)
Fri 1:50 p.m. - 2:20 p.m. | Learning from Pairwise Preferences: From Search Rankings to ChatBots (Invited Talk) | Thorsten Joachims
Fri 2:20 p.m. - 2:50 p.m. | Eliciting Human Judgments for Moral Artificial Intelligence (Invited Talk) | Vincent Conitzer
Fri 2:50 p.m. - 3:05 p.m. | Pretrained deep models outperform GBDTs in Learning-To-Rank under label scarcity (Oral) | Charlie Hou · Kiran Thekumparampil · Michael Shavlovsky · Giulia Fanti · Yesh Dattatreya · Sujay Sanghavi
Fri 3:05 p.m. - 3:20 p.m. | Zeroth-Order Optimization Meets Human Feedback: Provable Learning via Ranking Oracles (Oral) | Zhiwei Tang · Dmitry Rybin · Tsung-Hui Chang
Fri 3:20 p.m. - 4:15 p.m. | 2nd Poster Session (Poster Session)
Fri 4:15 p.m. - 4:45 p.m. | Vignettes on Pairwise-Feedback Mechanisms for Learning with Uncertain Preferences (Invited Talk) | Sanmi Koyejo
Fri 4:45 p.m. - 5:15 p.m. | Efficient Optimization with Many Objectives (Invited Talk) | Eytan Bakshy
Fri 5:15 p.m. - 5:30 p.m. | Preference Proxies: Evaluating Large Language Models in Capturing Human Preferences in Human-AI Tasks (Oral) | Mudit Verma · Siddhant Bhambri · Subbarao Kambhampati
Fri 5:30 p.m. - 5:45 p.m. | Learning Optimal Advantage from Preferences and Mistaking it for Reward (Oral) | William Knox · Stephane Hatgis-Kessell · Sigurdur Adalgeirsson · Serena Booth · Anca Dragan · Peter Stone · Scott Niekum
Fri 5:45 p.m. - 6:30 p.m. | 3rd Poster Session (Poster Session)
Fri 6:30 p.m. - 7:00 p.m. | Dueling Bandits for Online Preference Learning (Invited Talk) | Yisong Yue
Fri 7:00 p.m. - 7:30 p.m. | Is RLHF More Difficult than Standard RL? (Invited Talk) | Chi Jin
Fri 7:30 p.m. - 7:45 p.m. | Principled Reinforcement Learning with Human Feedback from Pairwise or $K$-wise Comparisons (Oral) | Banghua Zhu · Michael Jordan · Jiantao Jiao
Fri 7:45 p.m. - 8:00 p.m. | How to Query Human Feedback Efficiently in RL? (Oral) | Wenhao Zhan · Masatoshi Uehara · Wen Sun · Jason Lee
- Multi-Objective Agency Requires Non-Markovian Rewards (Poster) | Silviu Pitis
- Failure Modes of Learning Reward Models for LLMs and other Sequence Models (Poster) | Silviu Pitis
- Video-Guided Skill Discovery (Poster) | Manan Tomar · Dibya Ghosh · Vivek Myers · Anca Dragan · Matthew Taylor · Philip Bachman · Sergey Levine
- Learning from Pairwise Comparisons Under Preference Reversals (Poster) | Abdul Bakey Mir · Arun Rajkumar
- Randomized Smoothing (almost) in Real Time? (Poster) | Emmanouil Seferis
- Exploiting Action Distances for Reward Learning from Human Preferences (Poster) | Mudit Verma · Siddhant Bhambri · Subbarao Kambhampati
- Reward Collapse in Aligning Large Language Models: A Prompt-Aware Approach to Preference Rankings (Poster) | Ziang Song · Tianle Cai · Jason Lee · Weijie Su
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model (Poster) | Rafael Rafailov · Archit Sharma · Eric Mitchell · Stefano Ermon · Christopher Manning · Chelsea Finn
- Ranking with Abstention (Poster) | Anqi Mao · Mehryar Mohri · Yutao Zhong
- Learning Higher Order Skills that Efficiently Compose (Poster) | Anthony Liu · Dong Ki Kim · Sungryull Sohn · Honglak Lee
- DIP-RL: Demonstration-Inferred Preference Learning in Minecraft (Poster) | Ellen Novoseller · Vinicius G. Goecks · David Watkins · Josh Miller · Nicholas Waytowich
- Differentially Private Reward Estimation from Preference Based Feedback (Poster) | Sayak Ray Chowdhury · Xingyu Zhou
- Intention is what you need to estimate: Attention-driven prediction of goal pose in a human-centric telemanipulation of a robotic hand (Poster) | Muneeb Ahmed · Rajesh Kumar · Arzad Kherani · Brejesh Lall
- Representation Learning in Low-rank Slate-based Recommender Systems (Poster) | Yijia Dai · Wen Sun
- Borda Regret Minimization for Generalized Linear Dueling Bandits (Poster) | Yue Wu · Tao Jin · Qiwei Di · Hao Lou · Farzad Farnoud · Quanquan Gu
- Learning Populations of Preferences via Pairwise Comparison Queries (Poster) | Gokcan Tatli · Yi Chen · Ramya Vinayak
- A Ranking Game for Imitation Learning (Poster) | Harshit Sikchi · Akanksha Saran · Wonjoon Goo · Scott Niekum
- AdaptiveRec: Adaptively Construct Pairs for Contrastive Learning in Sequential Recommendation (Poster) | Jae Heyoung Jeon · Jung Hyun Ryu · Jewoong Cho · Myungjoo Kang
- Perceptual adjustment queries: An inverted measurement paradigm for low-rank metric learning (Poster) | Austin Xu · Andrew McRae · Jingyan Wang · Mark Davenport · Ashwin Pananjady
- Rating-based Reinforcement Learning (Poster) | Devin White · Mingkang Wu · Ellen Novoseller · Vernon Lawhern · Nicholas Waytowich · Yongcan Cao
- HIP-RL: Hallucinated Inputs for Preference-based Reinforcement Learning in Continuous Domains (Poster) | Chen Bo Calvin Zhang · Giorgia Ramponi
- Fairness in Preference-based Reinforcement Learning (Poster) | Umer Siddique · Abhinav Sinha · Yongcan Cao
- Optimal Scalarizations for Sublinear Hypervolume Regret (Poster) | Richard Zhang
- Fisher-Weighted Merge of Contrastive Learning Models in Sequential Recommendation (Poster) | Jung Hyun Ryu · Jae Heyoung Jeon · Jewoong Cho · Myungjoo Kang
- Thomas: Learning to Explore Human Preference via Probabilistic Reward Model (Poster) | Sang Truong · Duc Nguyen · Tho Quan · Sanmi Koyejo
- Two-Sided Bandit Learning in Fully-Decentralized Matching Markets (Poster) | Tejas Pagare · Avishek Ghosh
- Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning (Poster) | Mitsuhiko Nakamoto · Yuexiang Zhai · Anikait Singh · Max Sobol Mark · Yi Ma · Chelsea Finn · Aviral Kumar · Sergey Levine
- Preferential Multi-Attribute Bayesian Optimization with Application to Exoskeleton Personalization (Poster) | Raul Astudillo · Amy Li · Maegan Tucker · Chu Xin Cheng · Aaron Ames · Yisong Yue
- Uncoupled and Convergent Learning in Two-Player Zero-Sum Markov Games (Poster) | Yang Cai · Haipeng Luo · Chen-Yu Wei · Weiqiang Zheng
- Predict-then-Optimize v/s Probabilistic Approximations: Tackling Uncertainties and Error Propagation (Poster) | Priya Shanmugasundaram · Saurabh Jha · Kumar Muthuraman
- Sample-Efficient Learning of POMDPs with Multiple Observations In Hindsight (Poster) | Jiacheng Guo · Minshuo Chen · Huan Wang · Caiming Xiong · Mengdi Wang · Yu Bai
- Rethinking Incentives in Recommender Systems: Are Monotone Rewards Always Beneficial? (Poster) | Fan Yao · Chuanhao Li · Karthik Abinav Sankararaman · Yiming Liao · Yan Zhu · Qifan Wang · Hongning Wang · Haifeng Xu
- Learning Formal Specifications from Membership and Preference Queries (Poster) | Ameesh Shah · Marcell Vazquez-Chanlatte · Sebastian Junges · Sanjit Seshia
- Kernelized Offline Contextual Dueling Bandits (Poster) | Viraj Mehta · Ojash Neopane · Vikramjeet Das · Sen Lin · Jeff Schneider · Willie Neiswanger
- Preference Elicitation for Music Recommendations (Poster) | Ofer Meshi · Jon Feldman · Li Yang · Ben Scheetz · Yanli Cai · Mohammad Hossein Bateni · Corbyn Salisbury · Vikram Aggarwal · Craig Boutilier
- SPEED: Experimental Design for Policy Evaluation in Linear Heteroscedastic Bandits (Poster) | Subhojyoti Mukherjee · Qiaomin Xie · Josiah Hanna · Robert Nowak
- Augmenting Bayesian Optimization with Preference-based Expert Feedback (Poster) | Daolang Huang · Louis Filstroff · Petrus Mikkola · Runkai Zheng · Milica Todorovic · Samuel Kaski
- A Head Start Matters: Dynamic-Calibrated Representation Alignment and Uniformity for Recommendations (Poster) | Zhongyu Ouyang · Shifu Hou · Chunhui Zhang · Chuxu Zhang · Yanfang Ye
- Robustness of Inverse Reinforcement Learning (Poster) | Ezgi Korkmaz
- Training Diffusion Models with Reinforcement Learning (Poster) | Kevin Black · Michael Janner · Yilun Du · Ilya Kostrikov · Sergey Levine
- Optimistic Thompson Sampling for No-Regret Learning in Unknown Games (Poster) | Yingru Li · Liangqi Liu · Wenqiang Pu · Zhi-Quan Luo
- Extracting Reward Functions from Diffusion Models (Poster) | Felipe Nuti · Tim Franzmeyer · Joao Henriques
- Optimizing Chatbot Fallback Intent Selections with Reinforcement Learning (Poster) | Jeremy Curuksu
- Query-Policy Misalignment in Preference-Based Reinforcement Learning (Poster) | Xiao Hu · Jianxiong Li · Xianyuan Zhan · Qing-Shan Jia · Ya-Qin Zhang
- Distinguishing Feature Model for Learning From Pairwise Comparisons (Poster) | Elisha Parhi · Arun Rajkumar
- Specifying Behavior Preference with Tiered Reward Functions (Poster) | Zhiyuan Zhou · Henry Sowerby · Michael L. Littman
- Who to imitate: Imitating desired behavior from diverse multi-agent datasets (Poster) | Tim Franzmeyer · Jakob Foerster · Edith Elkind · Phil Torr · Joao Henriques
- Competing Bandits in Non-Stationary Matching Markets (Poster) | Avishek Ghosh · Abishek Sankararaman · Kannan Ramchandran · Tara Javidi · Arya Mazumdar
- Strategic Apple Tasting (Poster) | Keegan Harris · Chara Podimata · Steven Wu
- Strategyproof Decision-Making in Panel Data Settings and Beyond (Poster) | Keegan Harris · Anish Agarwal · Chara Podimata · Steven Wu
- Provable Offline Reinforcement Learning with Human Feedback (Poster) | Wenhao Zhan · Masatoshi Uehara · Nathan Kallus · Jason Lee · Wen Sun
- Contextual Bandits and Imitation Learning with Preference-Based Active Queries (Poster) | Ayush Sekhari · Karthik Sridharan · Wen Sun · Runzhe Wu
- Inverse Game Theory for Stackelberg Games: the Blessing of Bounded Rationality (Poster) | Jibang Wu · Weiran Shen · Fei Fang · Haifeng Xu
- Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards (Poster) | Alexandre Rame · Guillaume Couairon · Corentin Dancette · Jean-Baptiste Gaya · Mustafa Shukor · Laure Soulier · Matthieu Cord
- Reinforcement Learning with Human Feedback: Learning Dynamic Choices via Pessimism (Poster) | Zihao Li