Workshop
The Many Facets of Preference-Based Learning
Aadirupa Saha · Mohammad Ghavamzadeh · Robert Busa-Fekete · Branislav Kveton · Viktor Bengs
Meeting Room 316 AB
Fri 28 Jul, noon PDT
Learning from human preferences, or preference-based learning, has been critical to major advancements in AI and machine learning. Since humans are naturally more reliable at giving feedback on a relative scale than at assigning numerical values, collecting preference feedback is more budget-friendly and introduces less bias. The broad objective of this workshop is twofold:

1) Bring together the different communities in which preference-based learning plays a major role, including dueling bandits, multi-agent games, econometrics, social choice theory, reinforcement learning, optimization, robotics, and many more, and create a forum to exchange techniques and ideas, learn from each other, and potentially pose new and innovative research questions.

2) Connect theory to practice by identifying real-world systems that can benefit from incorporating preference feedback, such as marketing, revenue management, search engine optimization, recommender systems, healthcare, language modeling, interactive chatbots, text summarization, and robotics.

We will consider the workshop a success if it inspires researchers to pursue novel insights in the general area of preference-based learning: drawing attention from different communities to foster dissemination, cross-fertilization, and discussion at scale; building bridges between experimental researchers and theorists toward better models and practical algorithms; and encouraging participants to propose, sketch, and discuss new starting points, questions, or applications.
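To make the idea of learning from relative feedback concrete, here is a minimal sketch (not taken from any workshop talk) of fitting a Bradley-Terry model, the classic probabilistic model behind much of dueling bandits and RLHF reward modeling, to pairwise comparisons via stochastic gradient ascent. The function name and the toy data are illustrative assumptions, not part of the workshop materials.

```python
import math

def fit_bradley_terry(comparisons, n_items, lr=0.1, epochs=500):
    """Fit latent scores s_i so that P(i beats j) = sigmoid(s_i - s_j).

    comparisons: list of (winner, loser) index pairs.
    Returns one score per item; higher means more preferred.
    """
    s = [0.0] * n_items
    for _ in range(epochs):
        for w, l in comparisons:
            # Probability the observed winner beats the loser under current scores.
            p = 1.0 / (1.0 + math.exp(-(s[w] - s[l])))
            # Gradient ascent on the log-likelihood log(p) of this comparison.
            s[w] += lr * (1.0 - p)
            s[l] -= lr * (1.0 - p)
    return s

# Hypothetical preference data over 3 items: item 0 wins most often,
# item 2 least often.
data = [(0, 1)] * 8 + [(1, 0)] * 2 + [(0, 2)] * 9 + [(1, 2)] * 7 + [(2, 1)] * 3
scores = fit_bradley_terry(data, n_items=3)
```

The recovered ranking (scores[0] > scores[1] > scores[2]) matches the win counts; only score differences matter, which is exactly why humans need not supply absolute numerical values.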
Schedule
Fri 12:00 p.m. - 12:05 p.m. | Opening Remarks
Fri 12:05 p.m. - 12:35 p.m. | MNL-Bandit: Sequential Learning Approach to Assortment Selection (Invited Talk) | Vineet Goyal
Fri 12:35 p.m. - 1:05 p.m. | Aligning Robots with Human Preferences (Invited Talk) | Dorsa Sadigh
Fri 1:05 p.m. - 1:50 p.m. | 1st Poster Session
Fri 1:50 p.m. - 2:20 p.m. | Learning from Pairwise Preferences: From Search Rankings to ChatBots (Invited Talk) | Thorsten Joachims
Fri 2:20 p.m. - 2:50 p.m. | Eliciting Human Judgments for Moral Artificial Intelligence (Invited Talk) | Vincent Conitzer
Fri 2:50 p.m. - 3:05 p.m. | Pretrained deep models outperform GBDTs in Learning-To-Rank under label scarcity (Oral) | Charlie Hou · Kiran Thekumparampil · Michael Shavlovsky · Giulia Fanti · Yesh Dattatreya · Sujay Sanghavi
Fri 3:05 p.m. - 3:20 p.m. | Zeroth-Order Optimization Meets Human Feedback: Provable Learning via Ranking Oracles (Oral) | Zhiwei Tang · Dmitry Rybin · Tsung-Hui Chang
Fri 3:20 p.m. - 4:15 p.m. | 2nd Poster Session
Fri 4:15 p.m. - 4:45 p.m. | Vignettes on Pairwise-Feedback Mechanisms for Learning with Uncertain Preferences (Invited Talk) | Sanmi Koyejo
Fri 4:45 p.m. - 5:15 p.m. | Efficient Optimization with Many Objectives (Invited Talk) | Eytan Bakshy
Fri 5:15 p.m. - 5:30 p.m. | Preference Proxies: Evaluating Large Language Models in capturing Human Preferences in Human-AI Tasks (Oral) | Mudit Verma · Siddhant Bhambri · Subbarao Kambhampati
Fri 5:30 p.m. - 5:45 p.m. | Learning Optimal Advantage from Preferences and Mistaking it for Reward (Oral) | William Knox · Stephane Hatgis-Kessell · Sigurdur Adalgeirsson · Serena Booth · Anca Dragan · Peter Stone · Scott Niekum
Fri 5:45 p.m. - 6:30 p.m. | 3rd Poster Session
Fri 6:30 p.m. - 7:00 p.m. | Dueling Bandits for Online Preference Learning (Invited Talk) | Yisong Yue
Fri 7:00 p.m. - 7:30 p.m. | Is RLHF More Difficult than Standard RL? (Invited Talk) | Chi Jin
Fri 7:30 p.m. - 7:45 p.m. | Principled Reinforcement Learning with Human Feedback from Pairwise or K-wise Comparisons (Oral) | Banghua Zhu · Michael Jordan · Jiantao Jiao
Fri 7:45 p.m. - 8:00 p.m. | How to Query Human Feedback Efficiently in RL? (Oral) | Wenhao Zhan · Masatoshi Uehara · Wen Sun · Jason Lee
Posters

- Multi-Objective Agency Requires Non-Markovian Rewards | Silviu Pitis
- Failure Modes of Learning Reward Models for LLMs and other Sequence Models | Silviu Pitis
- Video-Guided Skill Discovery | Manan Tomar · Dibya Ghosh · Vivek Myers · Anca Dragan · Matthew Taylor · Philip Bachman · Sergey Levine
- Learning from Pairwise Comparisons Under Preference Reversals | Abdul Bakey Mir · Arun Rajkumar
- Randomized Smoothing (almost) in Real Time? | Emmanouil Seferis
- Exploiting Action Distances for Reward Learning from Human Preferences | Mudit Verma · Siddhant Bhambri · Subbarao Kambhampati
- Reward Collapse in Aligning Large Language Models: A Prompt-Aware Approach to Preference Rankings | Ziang Song · Tianle Cai · Jason Lee · Weijie Su
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model | Rafael Rafailov · Archit Sharma · Eric Mitchell · Stefano Ermon · Christopher Manning · Chelsea Finn
- Ranking with Abstention | Anqi Mao · Mehryar Mohri · Yutao Zhong
- Learning Higher Order Skills that Efficiently Compose | Anthony Liu · Dong Ki Kim · Sungryull Sohn · Honglak Lee
- DIP-RL: Demonstration-Inferred Preference Learning in Minecraft | Ellen Novoseller · Vinicius G. Goecks · David Watkins · Josh Miller · Nicholas Waytowich
- Differentially Private Reward Estimation from Preference Based Feedback | Sayak Ray Chowdhury · Xingyu Zhou
- Intention is what you need to estimate: Attention-driven prediction of goal pose in a human-centric telemanipulation of a robotic hand | Muneeb Ahmed · Rajesh Kumar · Arzad Kherani · Brejesh Lall
- Representation Learning in Low-rank Slate-based Recommender Systems | Yijia Dai · Wen Sun
- Borda Regret Minimization for Generalized Linear Dueling Bandits | Yue Wu · Tao Jin · Qiwei Di · Hao Lou · Farzad Farnoud · Quanquan Gu
- Learning Populations of Preferences via Pairwise Comparison Queries | Gokcan Tatli · Yi Chen · Ramya Vinayak
- A Ranking Game for Imitation Learning | Harshit Sikchi · Akanksha Saran · Wonjoon Goo · Scott Niekum
- AdaptiveRec: Adaptively Construct Pairs for Contrastive Learning in Sequential Recommendation | Jae Heyoung Jeon · Jung Hyun Ryu · Jewoong Cho · Myungjoo Kang
- Perceptual adjustment queries: An inverted measurement paradigm for low-rank metric learning | Austin Xu · Andrew McRae · Jingyan Wang · Mark Davenport · Ashwin Pananjady
- Rating-based Reinforcement Learning | Devin White · Mingkang Wu · Ellen Novoseller · Vernon Lawhern · Nicholas Waytowich · Yongcan Cao
- HIP-RL: Hallucinated Inputs for Preference-based Reinforcement Learning in Continuous Domains | Chen Bo Calvin Zhang · Giorgia Ramponi
- Fairness in Preference-based Reinforcement Learning | Umer Siddique · Abhinav Sinha · Yongcan Cao
- Optimal Scalarizations for Sublinear Hypervolume Regret | Richard Zhang
- Fisher-Weighted Merge of Contrastive Learning Models in Sequential Recommendation | Jung Hyun Ryu · Jae Heyoung Jeon · Jewoong Cho · Myungjoo Kang
- Thomas: Learning to Explore Human Preference via Probabilistic Reward Model | Sang Truong · Duc Nguyen · Tho Quan · Sanmi Koyejo
- Two-Sided Bandit Learning in Fully-Decentralized Matching Markets | Tejas Pagare · Avishek Ghosh
- Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning | Mitsuhiko Nakamoto · Yuexiang Zhai · Anikait Singh · Max Sobol Mark · Yi Ma · Chelsea Finn · Aviral Kumar · Sergey Levine
- Preferential Multi-Attribute Bayesian Optimization with Application to Exoskeleton Personalization | Raul Astudillo · Amy Li · Maegan Tucker · Chu Xin Cheng · Aaron Ames · Yisong Yue
- Uncoupled and Convergent Learning in Two-Player Zero-Sum Markov Games | Yang Cai · Haipeng Luo · Chen-Yu Wei · Weiqiang Zheng
- Predict-then-Optimize v/s Probabilistic Approximations: Tackling Uncertainties and Error Propagation | Priya Shanmugasundaram · Saurabh Jha · Kumar Muthuraman
- Sample-Efficient Learning of POMDPs with Multiple Observations In Hindsight | Jiacheng Guo · Minshuo Chen · Huan Wang · Caiming Xiong · Mengdi Wang · Yu Bai
- Rethinking Incentives in Recommender Systems: Are Monotone Rewards Always Beneficial? | Fan Yao · Chuanhao Li · Karthik Abinav Sankararaman · Yiming Liao · Yan Zhu · Qifan Wang · Hongning Wang · Haifeng Xu
- Learning Formal Specifications from Membership and Preference Queries | Ameesh Shah · Marcell Vazquez-Chanlatte · Sebastian Junges · Sanjit Seshia
- Kernelized Offline Contextual Dueling Bandits | Viraj Mehta · Ojash Neopane · Vikramjeet Das · Sen Lin · Jeff Schneider · Willie Neiswanger
- Preference Elicitation for Music Recommendations | Ofer Meshi · Jon Feldman · Li Yang · Ben Scheetz · Yanli Cai · Mohammad Hossein Bateni · Corbyn Salisbury · Vikram Aggarwal · Craig Boutilier
- SPEED: Experimental Design for Policy Evaluation in Linear Heteroscedastic Bandits | Subhojyoti Mukherjee · Qiaomin Xie · Josiah Hanna · Robert Nowak
- Augmenting Bayesian Optimization with Preference-based Expert Feedback | Daolang Huang · Louis Filstroff · Petrus Mikkola · Runkai Zheng · Milica Todorovic · Samuel Kaski
- A Head Start Matters: Dynamic-Calibrated Representation Alignment and Uniformity for Recommendations | Zhongyu Ouyang · Shifu Hou · Chunhui Zhang · Chuxu Zhang · Yanfang Ye
- Robustness of Inverse Reinforcement Learning | Ezgi Korkmaz
- Training Diffusion Models with Reinforcement Learning | Kevin Black · Michael Janner · Yilun Du · Ilya Kostrikov · Sergey Levine
- Optimistic Thompson Sampling for No-Regret Learning in Unknown Games | Yingru Li · Liangqi Liu · Wenqiang Pu · Zhi-Quan Luo
- Extracting Reward Functions from Diffusion Models | Felipe Nuti · Tim Franzmeyer · Joao Henriques
- Optimizing Chatbot Fallback Intent Selections with Reinforcement Learning | Jeremy Curuksu
- Query-Policy Misalignment in Preference-Based Reinforcement Learning | Xiao Hu · Jianxiong Li · Xianyuan Zhan · Qing-Shan Jia · Ya-Qin Zhang
- Distinguishing Feature Model for Learning From Pairwise Comparisons | Elisha Parhi · Arun Rajkumar
- Specifying Behavior Preference with Tiered Reward Functions | Zhiyuan Zhou · Henry Sowerby · Michael L. Littman
- Who to imitate: Imitating desired behavior from diverse multi-agent datasets | Tim Franzmeyer · Jakob Foerster · Edith Elkind · Phil Torr · Joao Henriques
- Competing Bandits in Non-Stationary Matching Markets | Avishek Ghosh · Abishek Sankararaman · Kannan Ramchandran · Tara Javidi · Arya Mazumdar
- Strategic Apple Tasting | Keegan Harris · Chara Podimata · Steven Wu
- Strategyproof Decision-Making in Panel Data Settings and Beyond | Keegan Harris · Anish Agarwal · Chara Podimata · Steven Wu
- Provable Offline Reinforcement Learning with Human Feedback | Wenhao Zhan · Masatoshi Uehara · Nathan Kallus · Jason Lee · Wen Sun
- Contextual Bandits and Imitation Learning with Preference-Based Active Queries | Ayush Sekhari · Karthik Sridharan · Wen Sun · Runzhe Wu
- Inverse Game Theory for Stackelberg Games: the Blessing of Bounded Rationality | Jibang Wu · Weiran Shen · Fei Fang · Haifeng Xu
- Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards | Alexandre Rame · Guillaume Couairon · Corentin Dancette · Jean-Baptiste Gaya · Mustafa Shukor · Laure Soulier · Matthieu Cord
- Reinforcement Learning with Human Feedback: Learning Dynamic Choices via Pessimism | Zihao Li