
STEERING: Stein Information Directed Exploration for Model-Based Reinforcement Learning
Souradip Chakraborty · Amrit Bedi · Alec Koppel · Mengdi Wang · Furong Huang · Dinesh Manocha

Wed Jul 26 05:00 PM -- 06:30 PM (PDT) @ Exhibit Hall 1 #714

Directed exploration is a crucial challenge in reinforcement learning (RL), especially when rewards are sparse. Information-directed sampling (IDS), which optimizes the information ratio, seeks to address this by augmenting regret with information gain. However, estimating information gain is computationally intractable or relies on restrictive assumptions that prohibit its use in many practical instances. In this work, we posit an alternative exploration incentive in terms of the integral probability metric (IPM) between a current estimate of the transition model and the unknown optimal, which, under suitable conditions, can be computed in closed form with the kernelized Stein discrepancy (KSD). Based on KSD, we develop a novel algorithm, STEERING: STEin information dirEcted exploration for model-based Reinforcement LearnING. To enable its derivation, we develop fundamentally new variants of KSD for discrete conditional distributions. We further establish that STEERING achieves sublinear Bayesian regret, improving upon prior learning rates of information-augmented MBRL, IDS included. Experimentally, we show that the proposed algorithm is computationally affordable and outperforms several prior approaches.
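
To illustrate what computing a kernelized Stein discrepancy in closed form can look like, here is a minimal sketch of the standard V-statistic estimate of the squared KSD for one-dimensional continuous samples with an RBF kernel. It is not the discrete conditional variant developed in the paper, and the function name `ksd_v_statistic`, the bandwidth, and the Gaussian target are assumptions chosen only for illustration.

```python
import numpy as np


def ksd_v_statistic(samples, score_fn, bandwidth=1.0):
    """V-statistic estimate of the squared KSD for 1-D samples (RBF kernel).

    score_fn(x) must return d/dx log p(x) for the target density p.
    This is the textbook continuous KSD, used here as an assumption-laden
    illustration rather than the paper's discrete conditional variant.
    """
    x = np.asarray(samples, dtype=float)
    s = score_fn(x)                          # target score at each sample
    diff = x[:, None] - x[None, :]           # pairwise differences x_i - x_j
    h2 = bandwidth ** 2
    k = np.exp(-diff ** 2 / (2.0 * h2))      # RBF kernel matrix k(x_i, x_j)
    dk_dx = -diff / h2 * k                   # derivative in the first argument
    dk_dy = diff / h2 * k                    # derivative in the second argument
    d2k = (1.0 / h2 - diff ** 2 / h2 ** 2) * k  # mixed second derivative
    # Stein kernel u_p(x_i, x_j), assembled term by term
    u = (s[:, None] * s[None, :] * k
         + s[:, None] * dk_dy
         + dk_dx * s[None, :]
         + d2k)
    return u.mean()


# Hypothetical usage: the target is a standard normal, whose score is -x.
rng = np.random.default_rng(0)
standard_normal_score = lambda x: -x
ksd_matched = ksd_v_statistic(rng.normal(0.0, 1.0, 500), standard_normal_score)
ksd_shifted = ksd_v_statistic(rng.normal(2.0, 1.0, 500), standard_normal_score)
```

Only the score of the target enters the computation, so no normalizing constant or density estimate of the sampling distribution is required. Samples drawn from the target should yield a value near zero, while the shifted samples should yield a clearly larger one.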

Author Information

Souradip Chakraborty (University of Maryland, College Park)
Amrit Bedi (University of Maryland, College Park)
Alec Koppel (JP Morgan Chase AI Research)

Bio: Alec Koppel has been a Team Lead/VP at JP Morgan Chase AI Research since June 2022. Previously, he was a Research Scientist within Supply Chain Optimization Technologies (SCOT) at Amazon during 2021-2022, and prior to that, was a Research Scientist at the U.S. Army Research Laboratory in the Computational and Information Sciences Directorate from 2017-2021. He completed his Master's degree in Statistics and Doctorate in Electrical and Systems Engineering, both at the University of Pennsylvania (Penn) in August of 2017. Before coming to Penn, he completed his Master's degree in Systems Science and Mathematics and Bachelor's Degree in Mathematics, both at Washington University in St. Louis (WashU), Missouri. He is a recipient of the 2016 UPenn ESE Dept. Award for Exceptional Service, an awardee of the Science, Mathematics, and Research for Transformation (SMART) Scholarship, a co-author of a Best Paper Finalist at the 2017 IEEE Asilomar Conference on Signals, Systems, and Computers, a finalist for the ARL Honorable Scientist Award 2019, an awardee of the 2020 ARL Director's Research Award Translational Research Challenge (DIRA-TRC), a recipient of a 2020 Honorable Mention from the IEEE Robotics and Automation Letters, and a mentor to the 2021 ARL Summer Symposium Best Project Awardee. His research interests are in optimization and machine learning. His academic work focuses on approximate Bayesian inference, reinforcement learning, and decentralized optimization. Applications include robotics and autonomy, sourcing and vendor selection, and financial markets.

Mengdi Wang (Princeton University)
Furong Huang (University of Maryland)

Bio: Furong Huang is an Assistant Professor in the Department of Computer Science at the University of Maryland. She works on statistical and trustworthy machine learning, reinforcement learning, graph neural networks, deep learning theory, and federated learning, with specialization in domain adaptation, algorithmic robustness, and fairness. Furong is a recipient of the MIT Technology Review Innovators Under 35 Asia Pacific Award, the MLconf Industry Impact Research Award, the NSF CRII Award, the Adobe Faculty Research Award, and three JP Morgan Faculty Research Awards, and a finalist for AI in Research - AI Researcher of the Year at the Women in AI Awards North America. She received her Ph.D. in electrical engineering and computer science from UC Irvine in 2016, after which she spent one year as a postdoctoral researcher at Microsoft Research NYC.

Dinesh Manocha (University of Maryland at College Park)
