Oral
Scale-free adaptive planning for deterministic dynamics & discounted rewards
Peter Bartlett · Victor Gabillon · Jennifer Healey · Michal Valko

Thu Jun 13 12:15 PM -- 12:20 PM (PDT) @ Room 102

We address the problem of planning in an environment with deterministic dynamics and stochastic discounted rewards under a limited numerical budget, where the ranges of both rewards and noise are unknown. We introduce PlaTγPOOS, an adaptive, robust, and efficient alternative to the OLOP (open-loop optimistic planning) algorithm. Whereas OLOP requires a priori knowledge of the ranges of both rewards and noise, PlaTγPOOS dynamically adapts its behavior to both. This makes PlaTγPOOS immune to two vulnerabilities of OLOP: failure when the given ranges of noise and rewards are underestimated, and inefficiency when they are overestimated. PlaTγPOOS additionally adapts to the global smoothness of the value function. We assess PlaTγPOOS's performance in terms of the simple regret, the expected loss resulting from choosing our algorithm's recommended action rather than an optimal one. We show that PlaTγPOOS is provably more efficient than OLOP when OLOP is given an overestimated reward range, and that in the noiseless case PlaTγPOOS learns exponentially faster than OLOP.
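To make the notion of simple regret concrete, here is a minimal Python sketch on a toy problem. It is not PlaTγPOOS or OLOP: the dynamics `step`, the mean reward `mean_reward`, the discount `GAMMA`, and the finite lookahead `depth` are all hypothetical choices made for illustration, and the infinite-horizon value is approximated by a truncated discounted sum computed by brute force.

```python
from itertools import product

GAMMA = 0.9  # discount factor (assumed value, for illustration only)


def step(state, action):
    """Hypothetical deterministic dynamics on the states 0..4."""
    return (2 * state + action) % 5


def mean_reward(state, action):
    """Hypothetical mean reward, taking values in {0, 0.5, 1}."""
    return ((state + action) % 3) / 2.0


def discounted_value(state, actions, gamma=GAMMA):
    """Discounted sum of mean rewards along an open-loop action sequence."""
    total = 0.0
    for t, action in enumerate(actions):
        total += gamma ** t * mean_reward(state, action)
        state = step(state, action)
    return total


def best_value_starting_with(state, first_action, depth, n_actions=2):
    """Best truncated value among sequences of length `depth` that begin with `first_action`."""
    return max(
        discounted_value(state, (first_action,) + rest)
        for rest in product(range(n_actions), repeat=depth - 1)
    )


def simple_regret(state, recommended, depth, n_actions=2):
    """Loss from playing `recommended` first instead of the best first action."""
    values = [
        best_value_starting_with(state, a, depth, n_actions)
        for a in range(n_actions)
    ]
    return max(values) - values[recommended]
```

In this sketch, a planner that recommends a best first action has regret zero and any other recommendation has nonnegative regret. The simple regret in the abstract is additionally an expectation over the noise and the algorithm's behavior, which this deterministic toy omits.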

Author Information

Peter Bartlett (UC Berkeley)
Victor Gabillon (Huawei)
Jennifer Healey (Adobe)

Jennifer Healey has a long history of looking into how people interact with sensors and envisioning the new experiences that this enables. She holds BS, MS and PhD degrees from MIT in EECS. During her graduate studies at the Media Lab, she pioneered the field of “Affective Computing” with Rosalind Picard and developed the first wearable computer with physiological sensors and a video camera that allowed the wearer to track their daily activities and record how they felt while doing them. She worked at both IBM Zurich and IBM TJ Watson on AI for smart phones with a multi-modal user interface that allowed the user to switch from voice to visual (input and output) seamlessly. She has been an Instructor in Translational Medicine at Harvard Medical School and Beth Israel Deaconess Medical Center, where she worked on new algorithms to predict cardiac health from mobile sensors. She continued working in Digital Health at both HP and Intel, where she helped develop the Shimmer sensing platform and the Intel Health Guide. Her research at Intel extended to sensing people in cars and cooperative autonomous driving (see her TED talk). She has also continued her work in Affective Computing, developing a new software platform for cell phones which included onboard machine learning algorithms for recognizing stress from heart rate, activation from features of voice, and privacy-protected sentiment analysis of texts and emails (Best Demo at MobileHCI 2018).

Michal Valko (DeepMind)

Michal is a research scientist at DeepMind Paris and in the SequeL team at Inria Lille - Nord Europe, France, led by Philippe Preux and Rémi Munos. He also teaches the course Graphs in Machine Learning at l'ENS Cachan. Michal is primarily interested in designing algorithms that require as little human supervision as possible. This means 1) reducing the “intelligence” that humans need to input into the system and 2) minimising the data that humans need to spend time inspecting or classifying, as well as the effort spent “tuning” the algorithms. Another important feature of machine learning algorithms should be the ability to adapt to changing environments. That is why he works in domains that are able to deal with minimal feedback, such as bandit algorithms, semi-supervised learning, and anomaly detection. Most recently he has worked on sequential algorithms with structured decisions, where exploiting the structure can lead to provably faster learning. In the past, the common thread of Michal's work has been adaptive graph-based learning and its application to real-world problems such as recommender systems, medical error detection, and face recognition. His industrial collaborators include Adobe, Intel, Technicolor, and Microsoft Research. He received his PhD in 2011 from the University of Pittsburgh under the supervision of Miloš Hauskrecht and was afterwards a postdoc of Rémi Munos.
