Learning from human preferences, or preference-based learning, has been critical to major advances in AI and machine learning. Because humans are naturally more reliable at providing feedback on a relative scale than at assigning numerical values, collecting preference feedback is both cheaper and less prone to bias.

The broad objective of this workshop is twofold: 1) Bring together the different communities in which preference-based learning has played a major role, including dueling bandits, multi-agent games, econometrics, social choice theory, reinforcement learning, optimization, robotics, and many more, and create a forum to exchange techniques and ideas, learn from one another, and potentially pose new and innovative research questions. 2) Connect theory to practice by identifying real-world systems that can benefit from incorporating preference feedback, such as marketing, revenue management, search engine optimization, recommender systems, healthcare, language modeling, interactive chatbots, text summarization, and robotics.

We will consider the workshop a success if it inspires novel research in the general area of preference-based learning: drawing attention from different communities to foster dissemination, cross-fertilization, and discussion at scale; building bridges between experimental researchers and theorists toward better models and practical algorithms; and encouraging participants to propose, sketch, and discuss new starting points, questions, or applications.
| Opening Remarks | |
| MNL-Bandit: Sequential Learning Approach to Assortment Selection (Invited Talk) | |
| Aligning Robots with Human Preferences (Invited Talk) | |
| 1st Poster Session (Poster Session) | |
| Learning from Pairwise Preferences: From Search Rankings to ChatBots (Invited Talk) | |
| Eliciting Human Judgments for Moral Artificial Intelligence (Invited Talk) | |
| Pretrained deep models outperform GBDTs in Learning-To-Rank under label scarcity (Oral) | |
| Zeroth-Order Optimization Meets Human Feedback: Provable Learning via Ranking Oracles (Oral) | |
| 2nd Poster Session (Poster Session) | |
| Vignettes on Pairwise-Feedback Mechanisms for Learning with Uncertain Preferences (Invited Talk) | |
| Efficient Optimization with Many Objectives (Invited Talk) | |
| Preference Proxies: Evaluating Large Language Models in capturing Human Preferences in Human-AI Tasks (Oral) | |
| Learning Optimal Advantage from Preferences and Mistaking it for Reward (Oral) | |
| 3rd Poster Session (Poster Session) | |
| Dueling Bandits for Online Preference Learning (Invited Talk) | |
| Is RLHF More Difficult than Standard RL? (Invited Talk) | |
| Principled Reinforcement Learning with Human Feedback from Pairwise or $K$-wise Comparisons (Oral) | |
| How to Query Human Feedback Efficiently in RL? (Oral) | |
| Differentially Private Reward Estimation from Preference Based Feedback (Poster) | |
| Intention is what you need to estimate: Attention-driven prediction of goal pose in a human-centric telemanipulation of a robotic hand (Poster) | |
| Representation Learning in Low-rank Slate-based Recommender Systems (Poster) | |
| Borda Regret Minimization for Generalized Linear Dueling Bandits (Poster) | |
| Fairness in Preference-based Reinforcement Learning (Poster) | |
| A Ranking Game for Imitation Learning (Poster) | |
| AdaptiveRec: Adaptively Construct Pairs for Contrastive Learning in Sequential Recommendation (Poster) | |
| Perceptual adjustment queries: An inverted measurement paradigm for low-rank metric learning (Poster) | |
| Rating-based Reinforcement Learning (Poster) | |
| HIP-RL: Hallucinated Inputs for Preference-based Reinforcement Learning in Continuous Domains (Poster) | |
| Fisher-Weighted Merge of Contrastive Learning Models in Sequential Recommendation (Poster) | |
| Thomas: Learning to Explore Human Preference via Probabilistic Reward Model (Poster) | |
| Two-Sided Bandit Learning in Fully-Decentralized Matching Markets (Poster) | |
| Strategic Apple Tasting (Poster) | |
| Preferential Multi-Attribute Bayesian Optimization with Application to Exoskeleton Personalization (Poster) | |
| Uncoupled and Convergent Learning in Two-Player Zero-Sum Markov Games (Poster) | |
| Predict-then-Optimize v/s Probabilistic Approximations: Tackling Uncertainties and Error Propagation (Poster) | |
| Sample-Efficient Learning of POMDPs with Multiple Observations In Hindsight (Poster) | |
| Rethinking Incentives in Recommender Systems: Are Monotone Rewards Always Beneficial? (Poster) | |
| Learning Formal Specifications from Membership and Preference Queries (Poster) | |
| Kernelized Offline Contextual Dueling Bandits (Poster) | |
| Preference Elicitation for Music Recommendations (Poster) | |
| SPEED: Experimental Design for Policy Evaluation in Linear Heteroscedastic Bandits (Poster) | |
| Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards (Poster) | |
| Augmenting Bayesian Optimization with Preference-based Expert Feedback (Poster) | |
| A Head Start Matters: Dynamic-Calibrated Representation Alignment and Uniformity for Recommendations (Poster) | |
| Training Diffusion Models with Reinforcement Learning (Poster) | |
| Optimistic Thompson Sampling for No-Regret Learning in Unknown Games (Poster) | |
| Extracting Reward Functions from Diffusion Models (Poster) | |
| Query-Policy Misalignment in Preference-Based Reinforcement Learning (Poster) | |
| Distinguishing Feature Model for Learning From Pairwise Comparisons (Poster) | |
| Specifying Behavior Preference with Tiered Reward Functions (Poster) | |
| Who to imitate: Imitating desired behavior from diverse multi-agent datasets (Poster) | |
| Strategyproof Decision-Making in Panel Data Settings and Beyond (Poster) | |
| Provable Offline Reinforcement Learning with Human Feedback (Poster) | |
| Contextual Bandits and Imitation Learning with Preference-Based Active Queries (Poster) | |
| Optimal Scalarizations for Sublinear Hypervolume Regret (Poster) | |
| Inverse Game Theory for Stackelberg Games: the Blessing of Bounded Rationality (Poster) | |
| Multi-Objective Agency Requires Non-Markovian Rewards (Poster) | |
| Failure Modes of Learning Reward Models for LLMs and other Sequence Models (Poster) | |
| Randomized Smoothing (almost) in Real Time? (Poster) | |
| Robustness of Inverse Reinforcement Learning (Poster) | |
| Optimizing Chatbot Fallback Intent Selections with Reinforcement Learning (Poster) | |
| Reinforcement Learning with Human Feedback: Learning Dynamic Choices via Pessimism (Poster) | |
| Exploiting Action Distances for Reward Learning from Human Preferences (Poster) | |
| Learning Populations of Preferences via Pairwise Comparison Queries (Poster) | |
| Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning (Poster) | |
| Competing Bandits in Non-Stationary Matching Markets (Poster) | |
| Learning from Pairwise Comparisons Under Preference Reversals (Poster) | |
| Video-Guided Skill Discovery (Poster) | |
| Reward Collapse in Aligning Large Language Models: A Prompt-Aware Approach to Preference Rankings (Poster) | |
| Direct Preference Optimization: Your Language Model is Secretly a Reward Model (Poster) | |
| Ranking with Abstention (Poster) | |
| Learning Higher Order Skills that Efficiently Compose (Poster) | |
| DIP-RL: Demonstration-Inferred Preference Learning in Minecraft (Poster) | |