We introduce the "inverse bandit" problem of estimating the rewards of a multi-armed bandit instance from observing the learning process of a low-regret demonstrator. Existing approaches to the related problem of inverse reinforcement learning assume the execution of an optimal policy, and thereby suffer from an identifiability issue. In contrast, our paradigm leverages the demonstrator's behavior en route to optimality, and in particular, the exploration phase, to obtain consistent reward estimates. We develop simple and efficient reward estimation procedures for demonstrations within a class of upper-confidence-based algorithms, showing that reward estimation gets progressively easier as the regret of the algorithm increases. We match these upper bounds with information-theoretic lower bounds that apply to any demonstrator algorithm, thereby characterizing the optimal tradeoff between exploration and reward estimation. Extensive simulations on both synthetic and semi-synthetic data corroborate our theoretical results.
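To make the setting concrete, the following is a minimal sketch of the idea and not the paper's estimator. It assumes a UCB1-style demonstrator with Gaussian rewards of unit variance and an exploration bonus of sqrt(2 log t / n). The observer sees only how often each arm was pulled and never the rewards. The function names `run_ucb` and `estimate_gaps`, the bonus constant, and the gap heuristic (the difference of the final exploration bonuses) are all illustrative assumptions.

```python
import math
import random


def run_ucb(means, horizon, seed=0):
    """Simulate a UCB1-style demonstrator (assumed form) and return pull counts."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull every arm once to initialize
        else:
            # pick the arm with the largest upper confidence index
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = rng.gauss(means[arm], 1.0)  # unit-variance Gaussian reward (assumption)
        counts[arm] += 1
        sums[arm] += reward
    return counts


def estimate_gaps(counts, horizon):
    """Heuristic gap estimates from pull counts alone (rewards are not observed).

    Near the end of the run the indices of all arms are roughly balanced, so
    the gap of arm i is approximated by the difference between its bonus and
    the bonus of the most-pulled arm.
    """
    best = max(range(len(counts)), key=lambda i: counts[i])
    log_t = math.log(horizon)
    bonus_best = math.sqrt(2.0 * log_t / counts[best])
    return [math.sqrt(2.0 * log_t / c) - bonus_best for c in counts]


if __name__ == "__main__":
    true_means = [0.9, 0.5, 0.2]  # hypothetical instance
    pulls = run_ucb(true_means, horizon=5000, seed=1)
    print("pull counts:", pulls)
    print("estimated gaps:", [round(g, 3) for g in estimate_gaps(pulls, 5000)])
```

In this sketch, arms that are pulled less often receive larger estimated gaps. A demonstrator that explores more, and so incurs more regret, pulls the suboptimal arms more often and gives the observer more information about their gaps, which is the tradeoff the abstract describes.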
Author Information
Wenshuo Guo (UC Berkeley)
Kumar Agrawal (UC Berkeley)
Aditya Grover (Stanford University)
Vidya Muthukumar (UC Berkeley)
Ashwin Pananjady (UC Berkeley)
More from the Same Authors
- 2021: Learning from an Exploring Demonstrator: Optimal Reward Estimation for Bandits
  Wenshuo Guo
- 2022 Poster: No-Regret Learning in Partially-Informed Auctions
  Wenshuo Guo · Michael Jordan · Ellen Vitercik
- 2022 Spotlight: No-Regret Learning in Partially-Informed Auctions
  Wenshuo Guo · Michael Jordan · Ellen Vitercik
- 2021 Affinity Workshop: Women in Machine Learning (WiML) Un-Workshop
  Wenshuo Guo · Beliz Gokkaya · Arushi G K Majha · Vaidheeswaran Archana · Berivan Isik · Olivia Choudhury · Liyue Shen · Hadia Samil · Tatjana Chavdarova
- 2021: Introduction & Opening Remarks
  Wenshuo Guo
- 2020 Poster: Neural Kernels Without Tangents
  Vaishaal Shankar · Alex Fang · Wenshuo Guo · Sara Fridovich-Keil · Jonathan Ragan-Kelley · Ludwig Schmidt · Benjamin Recht
- 2020 Poster: Fair Generative Modeling via Weak Supervision
  Kristy Choi · Aditya Grover · Trisha Singh · Rui Shu · Stefano Ermon
- 2019 Poster: Graphite: Iterative Generative Modeling of Graphs
  Aditya Grover · Aaron Zweig · Stefano Ermon
- 2019 Oral: Graphite: Iterative Generative Modeling of Graphs
  Aditya Grover · Aaron Zweig · Stefano Ermon
- 2019 Poster: Neural Joint Source-Channel Coding
  Kristy Choi · Kedar Tatwawadi · Aditya Grover · Tsachy Weissman · Stefano Ermon
- 2019 Oral: Neural Joint Source-Channel Coding
  Kristy Choi · Kedar Tatwawadi · Aditya Grover · Tsachy Weissman · Stefano Ermon
- 2018 Poster: Modeling Sparse Deviations for Compressed Sensing using Generative Models
  Manik Dhar · Aditya Grover · Stefano Ermon
- 2018 Oral: Modeling Sparse Deviations for Compressed Sensing using Generative Models
  Manik Dhar · Aditya Grover · Stefano Ermon
- 2018 Poster: Learning Policy Representations in Multiagent Systems
  Aditya Grover · Maruan Al-Shedivat · Jayesh K. Gupta · Yura Burda · Harrison Edwards
- 2018 Oral: Learning Policy Representations in Multiagent Systems
  Aditya Grover · Maruan Al-Shedivat · Jayesh K. Gupta · Yura Burda · Harrison Edwards