
Interactively Learning Preference Constraints in Linear Bandits
David Lindner · Sebastian Tschiatschek · Katja Hofmann · Andreas Krause

Wed Jul 20 03:30 PM -- 05:30 PM (PDT) @ Hall E #1222

We study sequential decision-making with known reward and unknown constraints, motivated by situations where the constraints represent expensive-to-evaluate human preferences, such as safe and comfortable driving behavior. We formalize the challenge of interactively learning about these constraints as a novel linear bandit problem which we call constrained linear best-arm identification. For this problem, we propose Adaptive Constraint Learning (ACOL). We provide an instance-dependent lower bound and show that ACOL's sample complexity matches the lower bound in the worst case. In the average case, ACOL's sample complexity bound is still significantly tighter than the bounds of simpler approaches. In synthetic experiments, ACOL performs on par with an oracle solution and outperforms a range of baselines. As an application, we consider learning constraints that represent human preferences in a driving simulation; ACOL is significantly more sample efficient than the alternatives in this setting. Further, we find that learning preferences as constraints is more robust to changes in the driving scenario than encoding the preferences directly in the reward function.
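To make the problem setup concrete, the following is a minimal sketch of constrained best-arm identification with a known linear reward and an unknown linear constraint. All instance values are hypothetical, and the learner shown is a naive round-robin least-squares baseline, not the ACOL algorithm from the paper; it only illustrates the interaction loop (query the expensive constraint oracle, estimate the constraint, then pick the highest-reward arm estimated to be feasible).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical problem instance: d-dimensional arms, a KNOWN reward
# vector theta, and a constraint vector phi that is UNKNOWN to the learner.
d, n_arms = 3, 20
arms = rng.normal(size=(n_arms, d))
theta = rng.normal(size=d)   # known reward parameters
phi = rng.normal(size=d)     # unknown; only observable through noisy queries
tau = 0.0                    # threshold: an arm x is feasible if phi @ x <= tau

def constraint_oracle(x, noise=0.1):
    """Expensive, noisy evaluation of the unknown constraint
    (standing in for, e.g., a human preference query)."""
    return phi @ x + rng.normal(scale=noise)

# Naive baseline (NOT ACOL): query arms round-robin, then fit phi by
# regularized least squares from the observed (arm, response) pairs.
X, y = [], []
for t in range(200):
    x = arms[t % n_arms]
    X.append(x)
    y.append(constraint_oracle(x))

X, y = np.array(X), np.array(y)
phi_hat = np.linalg.solve(X.T @ X + 1e-3 * np.eye(d), X.T @ y)

# Output: the highest-reward arm among those estimated to be feasible.
feasible = arms[arms @ phi_hat <= tau]
best = feasible[np.argmax(feasible @ theta)]
```

The point of the adaptive algorithm in the paper is precisely to replace the round-robin querying above with a query strategy that concentrates samples where they matter for separating feasible from infeasible high-reward arms, which is what yields the instance-dependent sample-complexity guarantees.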

Author Information

David Lindner (ETH Zürich)

The goal of my research is to build robust intelligent systems that interact with the world. Currently, I am mainly interested in using reinforcement learning (RL) to achieve complex goals in the real world. RL has been successfully applied in settings with narrow and well-defined goals, such as video games, but such goals are generally not available in the real world. To address this, I am interested in how RL systems can learn about complex goals from human feedback.

Sebastian Tschiatschek (University of Vienna)
Katja Hofmann (Microsoft)
Andreas Krause (ETH Zurich)

Andreas Krause is a Professor of Computer Science at ETH Zurich, where he leads the Learning & Adaptive Systems Group. He also serves as Academic Co-Director of the Swiss Data Science Center. Before that he was an Assistant Professor of Computer Science at Caltech. He received his Ph.D. in Computer Science from Carnegie Mellon University (2008) and his Diplom in Computer Science and Mathematics from the Technical University of Munich, Germany (2004). He is a Microsoft Research Faculty Fellow and a Kavli Frontiers Fellow of the US National Academy of Sciences. He received ERC Starting Investigator and ERC Consolidator grants, the Deutscher Mustererkennungspreis, an NSF CAREER award, the Okawa Foundation Research Grant recognizing top young researchers in telecommunications as well as the ETH Golden Owl teaching award. His research on machine learning and adaptive systems has received awards at several premier conferences and journals, including the ACM SIGKDD Test of Time award 2019 and the ICML Test of Time award 2020. Andreas Krause served as Program Co-Chair for ICML 2018, and is regularly serving as Area Chair or Senior Program Committee member for ICML, NeurIPS, AAAI and IJCAI, and as Action Editor for the Journal of Machine Learning Research.

