
Real-world Sequential Decision Making: Reinforcement Learning and Beyond
Hoang Le · Yisong Yue · Adith Swaminathan · Byron Boots · Ching-An Cheng

Fri Jun 14 02:00 PM -- 06:00 PM (PDT) @ Seaside Ballroom

Workshop website: https://realworld-sdm.github.io/

This workshop aims to bring together researchers from industry and academia in order to describe recent advances and discuss future research directions pertaining to real-world sequential decision making, broadly construed. We aim to highlight new and emerging research opportunities for the machine learning community that arise from the evolving need to make decision making theoretically and practically relevant for realistic applications.

Research interest in reinforcement and imitation learning has surged significantly over the past several years, with the empirical successes of self-play in games and the availability of increasingly realistic simulation environments. We believe the time is ripe for the research community to push beyond simulated domains and start exploring research directions that directly address the real-world need for optimal decision making. We are particularly interested in understanding the current theoretical and practical challenges that prevent broader adoption of current policy learning and evaluation algorithms in high-impact applications, across a broad range of domains.

This workshop welcomes both theory and application contributions.

Fri 2:00 p.m. - 2:30 p.m.
Emma Brunskill (Stanford) - Minimizing & Understanding the Data Needed to Learn to Make Good Sequences of Decisions (Invited Talk)
Fri 2:30 p.m. - 3:00 p.m.

Contextual bandits are a learning protocol that encompasses applications such as news recommendation, advertising, and mobile health, where an algorithm repeatedly observes some information about a user, decides what content to present, and accrues a reward if the presented content is successful. In this talk, I will focus on the fundamental task of evaluating a new policy given historic data. I will describe the asymptotically optimal approach of doubly robust (DR) estimation, the reasons for its shortcomings in finite samples, and how to overcome these shortcomings by directly optimizing the bound on finite-sample error. Optimization yields a new family of estimators that, similarly to DR, leverage any direct model of rewards, but shrink importance weights to obtain a better bias-variance tradeoff than DR. Error bounds can also be used to select the best among multiple reward predictors. Somewhat surprisingly, the reward predictors that work best with standard DR are not the same as those that work best with our modified DR. Our new estimator and model selection procedure perform extremely well across a wide variety of settings, so we expect they will enjoy broad practical use.

Based on joint work with Yi Su, Maria Dimakopoulou, and Akshay Krishnamurthy.

Miro Dudík
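The DR construction described in the abstract can be sketched as follows. This is a hypothetical illustration, not the estimator from the talk: it combines a direct reward model with an importance-weighted correction, and uses simple weight clipping (the `weight_cap` parameter, an assumption here) as a stand-in for the optimized shrinkage the talk derives from finite-sample error bounds.

```python
import numpy as np

def dr_estimate(rewards, actions, propensities, pi_e_probs, q_hat, weight_cap=None):
    """Doubly robust off-policy value estimate for a contextual bandit.

    rewards[i]      : observed reward for the logged action a_i
    actions[i]      : index of the logged action a_i
    propensities[i] : logging-policy probability mu(a_i | x_i)
    pi_e_probs[i,a] : target-policy probabilities pi(a | x_i)
    q_hat[i,a]      : direct reward-model predictions
    weight_cap      : optional clip on importance weights (a crude
                      form of shrinkage; the talk optimizes this
                      bias-variance tradeoff rather than clipping)
    """
    n = len(rewards)
    idx = np.arange(n)
    # Direct-model baseline: expected reward under the target policy.
    dm = (pi_e_probs * q_hat).sum(axis=1)
    # Importance weights for the logged actions.
    w = pi_e_probs[idx, actions] / propensities
    if weight_cap is not None:
        # Shrinking large weights reduces variance at the cost of bias.
        w = np.minimum(w, weight_cap)
    # Correct the baseline with the weighted residual of the model.
    return np.mean(dm + w * (rewards - q_hat[idx, actions]))
```

With an accurate reward model the residual term is small, so shrinking the weights costs little bias; with a poor model the unclipped weights recover the unbiased IPS correction — which is the tradeoff the abstract refers to.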
Fri 3:00 p.m. - 4:00 p.m.
Poster Session Part 1 and Coffee Break (Poster Session)
Fri 4:00 p.m. - 4:30 p.m.
Suchi Saria (Johns Hopkins) - Link between Causal Inference and Reinforcement Learning and Applications to Learning from Offline/Observational Data (Invited Talk)
Fri 4:30 p.m. - 5:00 p.m.

Ride-hailing platforms like Uber, Lyft, Didi Chuxing, and Ola have achieved explosive growth, in part by improving the efficiency of matching between riders and drivers, and by calibrating the balance of supply and demand through dynamic pricing. We survey methods for dynamic pricing and matching in ride-hailing, and show that these are critical for providing an experience with low waiting time for both riders and drivers. We also discuss approaches used to predict key inputs into those algorithms: demand, supply, and travel time in the road network. We then link the two levers together by studying a pool-matching mechanism called dynamic waiting, which varies rider waiting and walking before dispatch and is inspired by Uber's recent carpooling product, Express Pool. We show, using data from Uber, that jointly optimizing dynamic pricing and dynamic waiting mitigates price variability while increasing capacity utilization, trip throughput, and welfare. We also highlight several key practical challenges and directions of future research from a practitioner's perspective.

Dawn Woodard
Fri 5:00 p.m. - 5:30 p.m.
Panel Discussion with Emma Brunskill, Miro Dudík, Suchi Saria, Dawn Woodard (Discussion Panel)
Fri 5:30 p.m. - 6:00 p.m.
Poster Session Part 2 (Poster Session)

Author Information

Hoang Le (Caltech)

Hoang M. Le is a PhD Candidate in the Computing and Mathematical Sciences Department at the California Institute of Technology. He received an M.S. in Cognitive Systems and Interactive Media from the Universitat Pompeu Fabra, Barcelona, Spain, and a B.A. in Mathematics from Bucknell University in Lewisburg, PA. He is a recipient of an Amazon AI Fellowship. Hoang's research focuses on the theory and applications of sequential decision making, with particular emphasis on imitation learning. He has broad familiarity with the latest advances in imitation learning techniques and applications. His own research in imitation learning blends principled new techniques with a diverse range of application domains. In addition to popular reinforcement learning domains such as maze navigation and Atari games, his prior work on imitation learning has been applied to learning human behavior in team sports and developing automatic camera broadcasting systems.

Yisong Yue (Caltech)

Yisong Yue is an assistant professor in the Computing and Mathematical Sciences Department at the California Institute of Technology. He was previously a research scientist at Disney Research. Before that, he was a postdoctoral researcher in the Machine Learning Department and the iLab at Carnegie Mellon University. He received a Ph.D. from Cornell University and a B.S. from the University of Illinois at Urbana-Champaign. Yisong's research interests lie primarily in the theory and application of statistical machine learning. He is particularly interested in developing novel methods for interactive machine learning and structured prediction. In the past, his research has been applied to information retrieval, recommender systems, text classification, learning from rich user interfaces, analyzing implicit human feedback, data-driven animation, behavior analysis, sports analytics, policy learning in robotics, and adaptive planning & allocation problems.

Adith Swaminathan (Microsoft Research)
Byron Boots (Georgia Tech)
Ching-An Cheng (Georgia Tech)
