The content that a recommender system (RS) shows to users influences them. Therefore, when choosing a recommender to deploy, one is implicitly also choosing to induce specific internal states in users. Moreover, systems trained via long-horizon optimization will have direct incentives to manipulate users, e.g. to shift their preferences so they are easier to satisfy. We focus on induced preference shifts in users. We argue that, before deployment, system designers should: estimate the shifts a recommender would induce; evaluate whether such shifts would be undesirable; and perhaps even actively optimize to avoid problematic shifts. These steps involve two challenging ingredients: estimation requires anticipating how hypothetical policies would influence user preferences if deployed, which we do by using historical user interaction data to train a predictive user model that implicitly captures their preference dynamics; evaluation and optimization additionally require metrics to assess whether such influences are manipulative or otherwise unwanted, for which we use the notion of "safe shifts", which define a trust region within which behavior is safe (for instance, the natural way in which user preferences would shift without interference from the system could be deemed "safe"). In simulated experiments, we show that our learned preference dynamics model is effective in estimating user preferences and how they would respond to new recommenders. Additionally, we show that recommenders which optimize for staying in the trust region can avoid manipulative behaviors while still generating engagement.
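The estimate-then-penalize idea in the abstract can be sketched as a rollout under a learned user model, with the recommender's return penalized by how far the induced preferences drift from the "safe" no-interference trajectory. All names and dynamics below (linear drift, dot-product engagement, an L2 trust-region penalty) are illustrative assumptions, not the paper's actual model.

```python
import numpy as np

def user_dynamics(pref, item, alpha=0.1):
    """Stand-in for the learned preference dynamics: preferences
    drift toward shown items with (assumed) susceptibility alpha."""
    return (1 - alpha) * pref + alpha * item

def natural_shift(pref):
    """'Safe' baseline trajectory: how preferences would evolve
    without recommender interference (here, assumed: no drift)."""
    return pref

def engagement(pref, item):
    """Toy engagement model: higher when an item matches preferences."""
    return float(pref @ item)

def penalized_return(policy_items, pref0, lam=1.0):
    """Roll out a candidate recommender under the user model and
    penalize induced shifts that leave the trust region around the
    natural (safe) trajectory."""
    pref, safe_pref, total = pref0.copy(), pref0.copy(), 0.0
    for item in policy_items:
        total += engagement(pref, item)
        pref = user_dynamics(pref, item)
        safe_pref = natural_shift(safe_pref)
        # trust-region penalty on the induced preference shift
        total -= lam * np.linalg.norm(pref - safe_pref)
    return total
```

With lam=0 this reduces to pure engagement optimization; increasing lam trades engagement against preference manipulation, so a policy that repeatedly pushes items the user does not yet like scores strictly worse once the penalty is active.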
Author Information
Micah Carroll (UC Berkeley)
Anca Dragan (University of California, Berkeley)
Stuart Russell (UC Berkeley)
Dylan Hadfield-Menell (Massachusetts Institute of Technology)
Related Events (a corresponding poster, oral, or spotlight)
- 2022 Spotlight: Estimating and Penalizing Induced Preference Shifts in Recommender Systems »
  Tue. Jul 19th 05:50 -- 05:55 PM, Room: Ballroom 3 & 4
More from the Same Authors
- 2022 : A Study of Causal Confusion in Preference-Based Reward Learning »
  Jeremy Tien · Zhiyang He · Zackory Erickson · Anca Dragan · Daniel S Brown
- 2022 : Counterfactual Metrics for Auditing Black-Box Recommender Systems for Ethical Concerns »
  Nil-Jana Akpinar · Liu Leqi · Dylan Hadfield-Menell · Zachary Lipton
- 2023 Poster: Contextual Reliability: When Different Features Matter in Different Contexts »
  Gaurav Ghosal · Amrith Setlur · Daniel S Brown · Anca Dragan · Aditi Raghunathan
- 2023 Poster: Who Needs to Know? Minimal Knowledge for Optimal Coordination »
  Niklas Lauffer · Ameesh Shah · Micah Carroll · Michael Dennis · Stuart Russell
- 2023 Poster: Automatically Auditing Large Language Models via Discrete Optimization »
  Erik Jones · Anca Dragan · Aditi Raghunathan · Jacob Steinhardt
- 2023 Workshop: Interactive Learning with Implicit Human Feedback »
  Andi Peng · Akanksha Saran · Andreea Bobu · Tengyang Xie · Pierre-Yves Oudeyer · Anca Dragan · John Langford
- 2022 : Open Coding for Machine Learning Data »
  Magdalena Price · Dylan Hadfield-Menell
- 2022 Poster: For Learning in Symmetric Teams, Local Optima are Global Nash Equilibria »
  Scott Emmons · Caspar Oesterheld · Andrew Critch · Vincent Conitzer · Stuart Russell
- 2022 Spotlight: For Learning in Symmetric Teams, Local Optima are Global Nash Equilibria »
  Scott Emmons · Caspar Oesterheld · Andrew Critch · Vincent Conitzer · Stuart Russell
- 2022 : Learning to interact: PARTIAL OBSERVABILITY + GAME Theory of mind on steroids »
  Anca Dragan
- 2022 : Learning to interact: PARTIAL OBSERVABILITY The actions you take as part of the task are the queries! »
  Anca Dragan
- 2022 : Q&A »
  Dorsa Sadigh · Anca Dragan
- 2022 Tutorial: Learning for Interactive Agents »
  Dorsa Sadigh · Anca Dragan
- 2022 : Learning objectives and preferences: WHAT DATA? From diverse types of human data »
  Anca Dragan
- 2021 Poster: Policy Gradient Bayesian Robust Optimization for Imitation Learning »
  Zaynah Javed · Daniel Brown · Satvik Sharma · Jerry Zhu · Ashwin Balakrishna · Marek Petrik · Anca Dragan · Ken Goldberg
- 2021 Spotlight: Policy Gradient Bayesian Robust Optimization for Imitation Learning »
  Zaynah Javed · Daniel Brown · Satvik Sharma · Jerry Zhu · Ashwin Balakrishna · Marek Petrik · Anca Dragan · Ken Goldberg
- 2021 Poster: Value Alignment Verification »
  Daniel Brown · Jordan Schneider · Anca Dragan · Scott Niekum
- 2021 Spotlight: Value Alignment Verification »
  Daniel Brown · Jordan Schneider · Anca Dragan · Scott Niekum
- 2020 : Invited Talk 7: Prof. Anca Dragan from UC Berkeley »
  Anca Dragan
- 2020 : "Active Learning through Physically-embodied, Synthesized-from-“scratch” Queries" »
  Anca Dragan
- 2020 Poster: Learning Human Objectives by Evaluating Hypothetical Behavior »
  Siddharth Reddy · Anca Dragan · Sergey Levine · Shane Legg · Jan Leike
- 2019 Poster: On the Feasibility of Learning, Rather than Assuming, Human Biases for Reward Inference »
  Rohin Shah · Noah Gundotra · Pieter Abbeel · Anca Dragan
- 2019 Oral: On the Feasibility of Learning, Rather than Assuming, Human Biases for Reward Inference »
  Rohin Shah · Noah Gundotra · Pieter Abbeel · Anca Dragan
- 2019 Poster: Cognitive model priors for predicting human decisions »
  Joshua C Peterson · David D Bourgin · Daniel Reichman · Thomas Griffiths · Stuart Russell
- 2019 Oral: Cognitive model priors for predicting human decisions »
  Joshua C Peterson · David D Bourgin · Daniel Reichman · Thomas Griffiths · Stuart Russell
- 2019 Poster: Learning a Prior over Intent via Meta-Inverse Reinforcement Learning »
  Kelvin Xu · Ellis Ratner · Anca Dragan · Sergey Levine · Chelsea Finn
- 2019 Oral: Learning a Prior over Intent via Meta-Inverse Reinforcement Learning »
  Kelvin Xu · Ellis Ratner · Anca Dragan · Sergey Levine · Chelsea Finn
- 2018 Poster: An Efficient, Generalized Bellman Update For Cooperative Inverse Reinforcement Learning »
  Dhruv Malik · Malayandi Palaniappan · Jaime Fisac · Dylan Hadfield-Menell · Stuart Russell · Anca Dragan
- 2018 Oral: An Efficient, Generalized Bellman Update For Cooperative Inverse Reinforcement Learning »
  Dhruv Malik · Malayandi Palaniappan · Jaime Fisac · Dylan Hadfield-Menell · Stuart Russell · Anca Dragan