Reinforcement Learning for Real Life
Yuxi Li · Alborz Geramifard · Lihong Li · Csaba Szepesvari · Tao Wang

Fri Jun 14 08:30 AM -- 12:30 PM (PDT) @ Seaside Ballroom
Event URL: https://sites.google.com/view/RL4RealLife

Reinforcement learning (RL) is a general learning, predicting, and decision making paradigm. RL provides solution methods for sequential decision making problems as well as those that can be transformed into sequential ones. RL connects deeply with optimization, statistics, game theory, causal inference, sequential experimentation, etc., overlaps substantially with approximate dynamic programming and optimal control, and applies broadly in science, engineering, and the arts.

RL has made steady progress in academia recently, with examples including Atari games, AlphaGo, and visuomotor policies for robots. RL has also been applied to real world scenarios like recommender systems and neural architecture search. See a recent collection about RL applications at https://medium.com/@yuxili/rl-applications-73ef685c07eb. It is desirable to have RL systems that work in the real world with real benefits. However, RL still faces many issues, e.g., generalization, sample efficiency, and the exploration vs. exploitation dilemma. Consequently, RL is far from being widely deployed. Common, critical, and pressing questions for the RL community are then: Will RL have wide deployments? What are the issues? How can we solve them?

The goal of this workshop is to bring together researchers and practitioners from industry and academia interested in addressing practical and/or theoretical issues in applying RL to real life scenarios. Participants will review the state of the art, clarify impactful research problems, brainstorm open challenges, share first-hand lessons and experiences from real life deployments, summarize what has worked and what has not, collect tips for people from industry looking to apply RL and for RL experts interested in applying their methods to real domains, identify potential opportunities, generate new ideas for future lines of research and development, and promote awareness and collaboration. This is not "yet another RL workshop": it is about how to successfully apply RL to real life applications. This issue is less addressed in the RL/ML/AI community and calls for immediate attention for the sustainable prosperity of RL research and development.

Fri 8:30 a.m. - 8:50 a.m.
optional early-bird posters (Poster Session)
Fri 8:50 a.m. - 9:00 a.m.
opening remarks by organizers (Opening Remarks)
Fri 9:00 a.m. - 9:20 a.m.
In recent years, the real-time strategy game of StarCraft has emerged by consensus as an important challenge for AI research. It combines several major difficulties that are intractable for many existing algorithms: a large, structured action space; imperfect information about the opponent; a partially observed map; and cycles in the strategy space. Each of these challenges represents a major difficulty faced by real-world applications, for example those based on internet-scale action spaces, game theory in e.g. security, point-and-click interfaces, or robust AI in the presence of diverse and potentially exploitative user strategies. Here, we introduce AlphaStar: a novel combination of deep learning and reinforcement learning that mastered this challenging domain and defeated human professional players for the first time.

David Silver
Fri 9:20 a.m. - 9:40 a.m.
Abstract: Doing Real World Reinforcement Learning implies living with steep constraints on the sample complexity of solutions. Where is this viable? Where might it be viable in the near future? In the far future? How can we design a research program around identifying and building such solutions? In short, what are the missing elements we need to really make reinforcement learning more mundane and commonly applied than Supervised Learning? The potential is certainly there given the naturalness of RL compared to supervised learning, but the present is manifestly different. https://en.wikipedia.org/wiki/John_Langford_(computer_scientist)

John Langford
Fri 9:40 a.m. - 10:00 a.m.
Abstract: I'll present a brief overview of some recent work on reinforcement learning motivated by practical issues that arise in the application of RL to online, user-facing applications like recommender systems. These include stochastic action sets, long-term cumulative effects, and combinatorial action spaces. I'll provide some detail on the last of these, describing SlateQ, a novel decomposition technique that allows value-based RL (e.g., Q-learning) in slate-based recommenders to scale to commercial production systems, and briefly describe both a small-scale simulation and a large-scale experiment with YouTube.

Bio: Craig is Principal Scientist at Google, working on various aspects of decision making under uncertainty (e.g., reinforcement learning, Markov decision processes, user modeling, preference modeling and elicitation) and recommender systems. He received his Ph.D. from the University of Toronto in 1992, has held positions at the University of British Columbia, the University of Toronto, and CombineNet, and co-founded Granata Decision Systems.

Craig was Editor-in-Chief of JAIR; Associate Editor with ACM TEAC, JAIR, JMLR, and JAAMAS; and Program Chair for IJCAI-09 and UAI-2000. He is a Fellow of the Royal Society of Canada (RSC), the Association for Computing Machinery (ACM), and the Association for the Advancement of Artificial Intelligence (AAAI). He was the recipient of the 2018 ACM/SIGAI Autonomous Agents Research Award and a Tier I Canada Research Chair, and has received (with great co-authors) a number of Best Paper awards, including the 2009 IJCAI-JAIR Best Paper Prize, the 2014 AIJ Prominent Paper Award, and the 2018 NeurIPS Best Paper Award.

Craig Boutilier
Fri 10:00 a.m. - 11:00 a.m.
posters (Poster Session)
Zhengxing Chen, Juan Jose Garau Luis, Ignacio Albert Smet, Aditya Modi, Sabina Tomkins, Riley Simmons-Edler, Hongzi Mao, Alex Irpan, Hao Lu, Rose Wang, Subhojyoti Mukherjee, Aniruddh Raghu, Syed Arbab Mohd Shihab, Byung Hoon Ahn, Rasool Fakoor, Pratik Chaudhari, Elena Smirnova, Min-hwan Oh, Xiaocheng Tang, Tony Qin, Qingyang Li, Marc Brittain, Ian Fox, Supratik Paul, Xiaofeng Gao, Yinlam Chow, Gabriel Dulac-Arnold, Ofir Nachum, Nikos Karampatziakis, Bharathan Balaji, Supratik Paul, Ali Davody, Djallel Bouneffouf, Himanshu Sahni, Soo Kim, Andrey Kolobov, Alexander Amini, Yao Liu, Xinshi Chen, kingsley, Craig Boutilier
Fri 10:30 a.m. - 11:00 a.m.
coffee break (Coffee Break)
Fri 11:00 a.m. - 12:00 p.m.
panel discussion with Craig Boutilier (Google Research), Emma Brunskill (Stanford), Chelsea Finn (Google Brain, Stanford, UC Berkeley), Mohammad Ghavamzadeh (Facebook AI), John Langford (Microsoft Research) and David Silver (DeepMind) (Panel Discussion)
Peter Stone, Craig Boutilier, Emma Brunskill, Chelsea Finn, John Langford, David Silver, Mohammad Ghavamzadeh
Fri 12:00 p.m. - 12:30 p.m.
optional posters (Poster Session)

Author Information

Yuxi Li (attain.ai)
Alborz Geramifard (Facebook)
Lihong Li (Google Research)
Csaba Szepesvari (DeepMind)
Tao Wang (Apple)
