We address the problem of planning in an environment with deterministic dynamics and stochastic discounted rewards under a limited numerical budget, where the ranges of both rewards and noise are unknown. We introduce PlaTypOOS, an adaptive, robust, and efficient alternative to the OLOP (open-loop optimistic planning) algorithm. Whereas OLOP requires a priori knowledge of the ranges of both rewards and noise, PlaTypOOS dynamically adapts to both. This makes PlaTypOOS immune to two vulnerabilities of OLOP: failure when the given ranges of noise and rewards are underestimated, and inefficiency when they are overestimated. PlaTypOOS additionally adapts to the global smoothness of the value function. We prove that PlaTypOOS is more efficient than OLOP when OLOP is given an overestimated reward range, and that in the noiseless case PlaTypOOS learns exponentially faster.
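To make the setting concrete, below is a minimal, illustrative sketch of the OLOP-style baseline the abstract refers to: open-loop optimistic planning that scores fixed action sequences with an optimistic index and spends a fixed evaluation budget. Everything here (simulate_reward, the UCB-style bonus, the constants) is an assumption for illustration, not the authors' implementation; the point is that the exploration bonus and the optimistic tail hard-code a reward range and noise scale, which is exactly the a priori knowledge PlaTypOOS dispenses with.

import math
import random
from itertools import product

GAMMA = 0.9    # discount factor
K = 2          # number of actions
DEPTH = 3      # maximum length of open-loop action sequences
BUDGET = 500   # total numerical budget (noisy rollout evaluations)

def simulate_reward(sequence):
    """Hypothetical noisy oracle returning the discounted return of a sequence."""
    mean = sum(GAMMA ** t * (a + 1) / K for t, a in enumerate(sequence))
    return mean + random.gauss(0.0, 0.1)  # stochastic reward noise

# Enumerate all open-loop sequences up to DEPTH (feasible only for tiny K, DEPTH).
sequences = [seq for d in range(1, DEPTH + 1) for seq in product(range(K), repeat=d)]
counts = dict.fromkeys(sequences, 0)
sums = dict.fromkeys(sequences, 0.0)

def b_value(seq, n):
    """Optimistic index: empirical mean + exploration bonus + optimistic tail."""
    if counts[seq] == 0:
        return float("inf")
    mean = sums[seq] / counts[seq]
    bonus = math.sqrt(2.0 * math.log(n) / counts[seq])  # assumes a known noise scale
    tail = GAMMA ** len(seq) / (1.0 - GAMMA)            # assumes rewards in [0, 1]
    return mean + bonus + tail

for n in range(1, BUDGET + 1):
    best = max(sequences, key=lambda s: b_value(s, n))  # play the most optimistic sequence
    counts[best] += 1
    sums[best] += simulate_reward(best)

# Recommend the first action of the most frequently played sequence.
print("recommended first action:", max(sequences, key=counts.get)[0])

If the hard-coded range or noise scale in the bonus is too small the index stops being a valid upper bound and the planner can fail; if too large, the bonus dominates and the budget is wasted on exploration. These are the two failure modes of OLOP that the abstract describes PlaTypOOS as avoiding.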
Author Information
Peter Bartlett (UC Berkeley)
Victor Gabillon (Huawei)
Jennifer Healey (Adobe)
Jennifer Healey has a long history of studying how people interact with sensors and envisioning the new experiences this enables. She holds BS, MS, and PhD degrees in EECS from MIT. During her graduate studies at the Media Lab, she pioneered the field of “Affective Computing” with Rosalind Picard and developed the first wearable computer with physiological sensors and a video camera, which allowed the wearer to track their daily activities and record how they felt while doing them. At both IBM Zurich and IBM TJ Watson, she worked on AI for smartphones with a multi-modal user interface that let the user switch seamlessly between voice and visual input and output. She has been an Instructor in Translational Medicine at Harvard Medical School and Beth Israel Deaconess Medical Center, where she worked on new algorithms to predict cardiac health from mobile sensors. She continued working in digital health at both HP and Intel, where she helped develop the Shimmer sensing platform and the Intel Health Guide. Her research at Intel extended to sensing people in cars and cooperative autonomous driving (see her TED talk). She has also continued her work in Affective Computing, developing a new software platform for cell phones with onboard machine learning algorithms for recognizing stress from heart rate, activation from features of voice, and privacy-protected sentiment analysis of texts and emails (Best Demo at MobileHCI 2018).
Michal Valko (DeepMind)
Michal is a research scientist at DeepMind Paris and in the SequeL team at Inria Lille - Nord Europe, France, led by Philippe Preux and Rémi Munos. He also teaches the course Graphs in Machine Learning at l'ENS Cachan. Michal is primarily interested in designing algorithms that require as little human supervision as possible. This means 1) reducing the “intelligence” that humans need to input into the system and 2) minimising the data that humans need to spend inspecting, classifying, or “tuning” the algorithms. Another important feature of machine learning algorithms should be the ability to adapt to changing environments. That is why he works in domains that can deal with minimal feedback, such as bandit algorithms, semi-supervised learning, and anomaly detection. Most recently he has worked on sequential algorithms with structured decisions, where exploiting the structure can lead to provably faster learning. In the past, the common thread of Michal's work has been adaptive graph-based learning and its application to real-world problems such as recommender systems, medical error detection, and face recognition. His industrial collaborators include Adobe, Intel, Technicolor, and Microsoft Research. He received his PhD in 2011 from the University of Pittsburgh under the supervision of Miloš Hauskrecht and was afterwards a postdoc of Rémi Munos.
Related Events (a corresponding poster, oral, or spotlight)
- 2019 Oral: Scale-free adaptive planning for deterministic dynamics & discounted rewards »
  Thu. Jun 13th, 07:15 -- 07:20 PM, Room 102
More from the Same Authors
- 2021 : When does gradient descent with logistic loss interpolate using deep networks with smoothed ReLU activations? »
  Niladri Chatterji · Phil Long · Peter Bartlett
- 2021 : On the Theory of Reinforcement Learning with Once-per-Episode Feedback »
  Niladri Chatterji · Aldo Pacchiano · Peter Bartlett · Michael Jordan
- 2021 : Adversarial Examples in Random Deep Networks »
  Peter Bartlett
- 2020 Poster: On Thompson Sampling with Langevin Algorithms »
  Eric Mazumdar · Aldo Pacchiano · Yian Ma · Michael Jordan · Peter Bartlett
- 2020 Poster: Accelerated Message Passing for Entropy-Regularized MAP Inference »
  Jonathan Lee · Aldo Pacchiano · Peter Bartlett · Michael Jordan
- 2020 Poster: No-Regret Exploration in Goal-Oriented Reinforcement Learning »
  Jean Tarbouriech · Evrard Garcelon · Michal Valko · Matteo Pirotta · Alessandro Lazaric
- 2020 Poster: Budgeted Online Influence Maximization »
  Pierre Perrault · Jennifer Healey · Zheng Wen · Michal Valko
- 2019 : Gaussian Process Optimization with Adaptive Sketching: Scalable and No Regret »
  Michal Valko
- 2019 : Michal Valko: How Negative Dependence Broke the Quadratic Barrier for Learning with Graphs and Kernels »
  Michal Valko
- 2019 : Poster Session 1 (all papers) »
  Matilde Gargiani · Yochai Zur · Chaim Baskin · Evgenii Zheltonozhskii · Liam Li · Ameet Talwalkar · Xuedong Shang · Harkirat Singh Behl · Atilim Gunes Baydin · Ivo Couckuyt · Tom Dhaene · Chieh Lin · Wei Wei · Min Sun · Orchid Majumder · Michele Donini · Yoshihiko Ozaki · Ryan P. Adams · Christian Geißler · Ping Luo · zhanglin peng · Ruimao Zhang · John Langford · Rich Caruana · Debadeepta Dey · Charles Weill · Xavi Gonzalvo · Scott Yang · Scott Yak · Eugen Hotaj · Vladimir Macko · Mehryar Mohri · Corinna Cortes · Stefan Webb · Jonathan Chen · Martin Jankowiak · Noah Goodman · Aaron Klein · Frank Hutter · Mojan Javaheripi · Mohammad Samragh · Sungbin Lim · Taesup Kim · SUNGWOONG KIM · Michael Volpp · Iddo Drori · Yamuna Krishnamurthy · Kyunghyun Cho · Stanislaw Jastrzebski · Quentin de Laroussilhe · Mingxing Tan · Xiao Ma · Neil Houlsby · Andrea Gesmundo · Zalán Borsos · Krzysztof Maziarz · Felipe Petroski Such · Joel Lehman · Kenneth Stanley · Jeff Clune · Pieter Gijsbers · Joaquin Vanschoren · Felix Mohr · Eyke Hüllermeier · Zheng Xiong · Wenpeng Zhang · Wenwu Zhu · Weijia Shao · Aleksandra Faust · Michal Valko · Michael Y Li · Hugo Jair Escalante · Marcel Wever · Andrey Khorlin · Tara Javidi · Anthony Francis · Saurajit Mukherjee · Jungtaek Kim · Michael McCourt · Saehoon Kim · Tackgeun You · Seungjin Choi · Nicolas Knudde · Alexander Tornede · Ghassen Jerfel
- 2019 Poster: Defending Against Saddle Point Attack in Byzantine-Robust Distributed Learning »
  Dong Yin · Yudong Chen · Kannan Ramchandran · Peter Bartlett
- 2019 Poster: Exploiting structure of uncertainty for efficient matroid semi-bandits »
  Pierre Perrault · Vianney Perchet · Michal Valko
- 2019 Oral: Exploiting structure of uncertainty for efficient matroid semi-bandits »
  Pierre Perrault · Vianney Perchet · Michal Valko
- 2019 Oral: Defending Against Saddle Point Attack in Byzantine-Robust Distributed Learning »
  Dong Yin · Yudong Chen · Kannan Ramchandran · Peter Bartlett
- 2019 Poster: Rademacher Complexity for Adversarially Robust Generalization »
  Dong Yin · Kannan Ramchandran · Peter Bartlett
- 2019 Poster: Online learning with kernel losses »
  Niladri Chatterji · Aldo Pacchiano · Peter Bartlett
- 2019 Oral: Rademacher Complexity for Adversarially Robust Generalization »
  Dong Yin · Kannan Ramchandran · Peter Bartlett
- 2019 Oral: Online learning with kernel losses »
  Niladri Chatterji · Aldo Pacchiano · Peter Bartlett
- 2018 Poster: Gradient descent with identity initialization efficiently learns positive definite linear transformations by deep residual networks »
  Peter Bartlett · Dave Helmbold · Phil Long
- 2018 Poster: On the Theory of Variance Reduction for Stochastic Gradient Monte Carlo »
  Niladri Chatterji · Nicolas Flammarion · Yian Ma · Peter Bartlett · Michael Jordan
- 2018 Oral: On the Theory of Variance Reduction for Stochastic Gradient Monte Carlo »
  Niladri Chatterji · Nicolas Flammarion · Yian Ma · Peter Bartlett · Michael Jordan
- 2018 Oral: Gradient descent with identity initialization efficiently learns positive definite linear transformations by deep residual networks »
  Peter Bartlett · Dave Helmbold · Phil Long
- 2018 Poster: Improved large-scale graph learning through ridge spectral sparsification »
  Daniele Calandriello · Alessandro Lazaric · Ioannis Koutis · Michal Valko
- 2018 Poster: Byzantine-Robust Distributed Learning: Towards Optimal Statistical Rates »
  Dong Yin · Yudong Chen · Kannan Ramchandran · Peter Bartlett
- 2018 Oral: Byzantine-Robust Distributed Learning: Towards Optimal Statistical Rates »
  Dong Yin · Yudong Chen · Kannan Ramchandran · Peter Bartlett
- 2018 Oral: Improved large-scale graph learning through ridge spectral sparsification »
  Daniele Calandriello · Alessandro Lazaric · Ioannis Koutis · Michal Valko
- 2017 : Faster graph bandit learning using information about the neighbors »
  Michal Valko
- 2017 Poster: Zonotope hit-and-run for efficient sampling from projection DPPs »
  Guillaume Gautier · Rémi Bardenet · Michal Valko
- 2017 Talk: Zonotope hit-and-run for efficient sampling from projection DPPs »
  Guillaume Gautier · Rémi Bardenet · Michal Valko
- 2017 Poster: Recovery Guarantees for One-hidden-layer Neural Networks »
  Kai Zhong · Zhao Song · Prateek Jain · Peter Bartlett · Inderjit Dhillon
- 2017 Poster: Second-Order Kernel Online Convex Optimization with Adaptive Sketching »
  Daniele Calandriello · Alessandro Lazaric · Michal Valko
- 2017 Talk: Second-Order Kernel Online Convex Optimization with Adaptive Sketching »
  Daniele Calandriello · Alessandro Lazaric · Michal Valko
- 2017 Talk: Recovery Guarantees for One-hidden-layer Neural Networks »
  Kai Zhong · Zhao Song · Prateek Jain · Peter Bartlett · Inderjit Dhillon