Learning from human preferences, or preference-based learning, has been central to major advances in AI and machine learning. Because humans are more reliable at giving feedback on a relative scale than at assigning numerical values, collecting preference feedback is typically cheaper and less prone to bias. The broad objective of this workshop is twofold: 1) bring together the different communities in which preference-based learning plays a major role, including dueling bandits, multi-agent games, econometrics, social choice theory, reinforcement learning, optimization, robotics, and many more, and provide a forum to exchange techniques and ideas, learn from each other, and spark new and innovative research questions; 2) connect theory to practice by identifying real-world systems that can benefit from incorporating preference feedback, such as marketing, revenue management, search engine optimization, recommender systems, healthcare, language modeling, interactive chatbots, text summarization, and robotics. We will consider the workshop a success if it inspires new work across the general area of preference-based learning: drawing attention from different communities to foster dissemination, cross-fertilization, and discussion at scale; building bridges between experimental researchers and theorists toward better models and practical algorithms; and encouraging participants to propose, sketch, and discuss new starting points, questions, and applications.
Opening Remarks
MNL-Bandit: Sequential Learning Approach to Assortment Selection (Invited Talk)
Aligning Robots with Human Preferences (Invited Talk)
1st Poster Session (Poster Session)
Learning from Pairwise Preferences: From Search Rankings to ChatBots (Invited Talk)
Eliciting Human Judgments for Moral Artificial Intelligence (Invited Talk)
Pretrained deep models outperform GBDTs in Learning-To-Rank under label scarcity (Oral)
Zeroth-Order Optimization Meets Human Feedback: Provable Learning via Ranking Oracles (Oral)
2nd Poster Session (Poster Session)
Vignettes on Pairwise-Feedback Mechanisms for Learning with Uncertain Preferences (Invited Talk)
Efficient Optimization with Many Objectives (Invited Talk)
Preference Proxies: Evaluating Large Language Models in capturing Human Preferences in Human-AI Tasks (Oral)
Learning Optimal Advantage from Preferences and Mistaking it for Reward (Oral)
3rd Poster Session (Poster Session)
Dueling Bandits for Online Preference Learning (Invited Talk)
Is RLHF More Difficult than Standard RL? (Invited Talk)
Principled Reinforcement Learning with Human Feedback from Pairwise or $K$-wise Comparisons (Oral)
How to Query Human Feedback Efficiently in RL? (Oral)
AdaptiveRec: Adaptively Construct Pairs for Contrastive Learning in Sequential Recommendation (Poster)
Perceptual adjustment queries: An inverted measurement paradigm for low-rank metric learning (Poster)
Rating-based Reinforcement Learning (Poster)
HIP-RL: Hallucinated Inputs for Preference-based Reinforcement Learning in Continuous Domains (Poster)
Fairness in Preference-based Reinforcement Learning (Poster)
Optimal Scalarizations for Sublinear Hypervolume Regret (Poster)
Fisher-Weighted Merge of Contrastive Learning Models in Sequential Recommendation (Poster)
Thomas: Learning to Explore Human Preference via Probabilistic Reward Model (Poster)
Two-Sided Bandit Learning in Fully-Decentralized Matching Markets (Poster)
Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning (Poster)
Preferential Multi-Attribute Bayesian Optimization with Application to Exoskeleton Personalization (Poster)
Uncoupled and Convergent Learning in Two-Player Zero-Sum Markov Games (Poster)
Predict-then-Optimize v/s Probabilistic Approximations: Tackling Uncertainties and Error Propagation (Poster)
Sample-Efficient Learning of POMDPs with Multiple Observations In Hindsight (Poster)
Rethinking Incentives in Recommender Systems: Are Monotone Rewards Always Beneficial? (Poster)
Learning Formal Specifications from Membership and Preference Queries (Poster)
Kernelized Offline Contextual Dueling Bandits (Poster)
Preference Elicitation for Music Recommendations (Poster)
SPEED: Experimental Design for Policy Evaluation in Linear Heteroscedastic Bandits (Poster)
Augmenting Bayesian Optimization with Preference-based Expert Feedback (Poster)
A Head Start Matters: Dynamic-Calibrated Representation Alignment and Uniformity for Recommendations (Poster)
Robustness of Inverse Reinforcement Learning (Poster)
Training Diffusion Models with Reinforcement Learning (Poster)
Extracting Reward Functions from Diffusion Models (Poster)
Optimizing Chatbot Fallback Intent Selections with Reinforcement Learning (Poster)
Query-Policy Misalignment in Preference-Based Reinforcement Learning (Poster)
Distinguishing Feature Model for Learning From Pairwise Comparisons (Poster)
Specifying Behavior Preference with Tiered Reward Functions (Poster)
Who to imitate: Imitating desired behavior from diverse multi-agent datasets (Poster)
Competing Bandits in Non-Stationary Matching Markets (Poster)
Strategic Apple Tasting (Poster)
Strategyproof Decision-Making in Panel Data Settings and Beyond (Poster)
Provable Offline Reinforcement Learning with Human Feedback (Poster)
Contextual Bandits and Imitation Learning with Preference-Based Active Queries (Poster)
Inverse Game Theory for Stackelberg Games: the Blessing of Bounded Rationality (Poster)
Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards (Poster)
Reinforcement Learning with Human Feedback: Learning Dynamic Choices via Pessimism (Poster)
Optimistic Thompson Sampling for No-Regret Learning in Unknown Games (Poster)
Multi-Objective Agency Requires Non-Markovian Rewards (Poster)
Failure Modes of Learning Reward Models for LLMs and other Sequence Models (Poster)
Video-Guided Skill Discovery (Poster)
Learning from Pairwise Comparisons Under Preference Reversals (Poster)
Randomized Smoothing (almost) in Real Time? (Poster)
Exploiting Action Distances for Reward Learning from Human Preferences (Poster)
Reward Collapse in Aligning Large Language Models: A Prompt-Aware Approach to Preference Rankings (Poster)
Direct Preference Optimization: Your Language Model is Secretly a Reward Model (Poster)
Ranking with Abstention (Poster)
Learning Higher Order Skills that Efficiently Compose (Poster)
DIP-RL: Demonstration-Inferred Preference Learning in Minecraft (Poster)
Differentially Private Reward Estimation from Preference Based Feedback (Poster)
Intention is what you need to estimate: Attention-driven prediction of goal pose in a human-centric telemanipulation of a robotic hand (Poster)
Representation Learning in Low-rank Slate-based Recommender Systems (Poster)
Borda Regret Minimization for Generalized Linear Dueling Bandits (Poster)
Learning Populations of Preferences via Pairwise Comparison Queries (Poster)
A Ranking Game for Imitation Learning (Poster)