Timezone: »
We introduce Dynamic Contextual Markov Decision Processes (DCMDPs), a novel reinforcement learning framework for history-dependent environments that generalizes the contextual MDP framework to handle non-Markov environments, where contexts change over time. We consider special cases of the model, with a focus on logistic DCMDPs, which break the exponential dependence on history length by leveraging aggregation functions to determine context transitions. This special structure allows us to derive an upper-confidence-bound style algorithm for which we establish regret bounds. Motivated by our theoretical results, we introduce a practical model-based algorithm for logistic DCMDPs that plans in a latent space and uses optimism over history-dependent features. We demonstrate the efficacy of our approach on a recommendation task (using MovieLens data) where user behavior dynamics evolve in response to recommendations.
Author Information
Guy Tennenholtz (Google Research)
Nadav Merlis (ENSAE Paris)
Lior Shani (Google)
Martin Mladenov (Google)
Craig Boutilier (Google)
More from the Same Authors
-
2023 : Delphic Offline Reinforcement Learning under Nonidentifiable Hidden Confounding »
Alizée Pace · Hugo Yèche · Bernhard Schölkopf · Gunnar Ratsch · Guy Tennenholtz -
2023 : Delphic Offline Reinforcement Learning under Nonidentifiable Hidden Confounding »
Alizée Pace · Hugo Yèche · Bernhard Schölkopf · Gunnar Ratsch · Guy Tennenholtz -
2023 : Delphic Offline Reinforcement Learning under Nonidentifiable Hidden Confounding »
Alizée Pace · Hugo Yèche · Bernhard Schölkopf · Gunnar Ratsch · Guy Tennenholtz -
2023 : Preference Elicitation for Music Recommendations »
Ofer Meshi · Jon Feldman · Li Yang · Ben Scheetz · Yanli Cai · Mohammad Hossein Bateni · Corbyn Salisbury · Vikram Aggarwal · Craig Boutilier -
2023 Poster: On Preemption and Learning in Stochastic Scheduling »
Nadav Merlis · Hugo Richard · Flore Sentenac · Corentin Odic · Mathieu Molina · Vianney Perchet -
2023 Poster: Representation-Driven Reinforcement Learning »
Ofir Nabati · Guy Tennenholtz · Shie Mannor -
2022 : Modeling Recommender Ecosystems - Some Considerations »
Craig Boutilier -
2021 Poster: Meta-Thompson Sampling »
Branislav Kveton · Mikhail Konobeev · Manzil Zaheer · Chih-wei Hsu · Martin Mladenov · Craig Boutilier · Csaba Szepesvari -
2021 Spotlight: Meta-Thompson Sampling »
Branislav Kveton · Mikhail Konobeev · Manzil Zaheer · Chih-wei Hsu · Martin Mladenov · Craig Boutilier · Csaba Szepesvari -
2021 Poster: Confidence-Budget Matching for Sequential Budgeted Learning »
Yonathan Efroni · Nadav Merlis · Aadirupa Saha · Shie Mannor -
2021 Spotlight: Confidence-Budget Matching for Sequential Budgeted Learning »
Yonathan Efroni · Nadav Merlis · Aadirupa Saha · Shie Mannor -
2021 Poster: Ensemble Bootstrapping for Q-Learning »
Oren Peer · Chen Tessler · Nadav Merlis · Ron Meir -
2021 Spotlight: Ensemble Bootstrapping for Q-Learning »
Oren Peer · Chen Tessler · Nadav Merlis · Ron Meir -
2020 Poster: Optimistic Policy Optimization with Bandit Feedback »
Lior Shani · Yonathan Efroni · Aviv Rosenberg · Shie Mannor -
2020 Poster: ConQUR: Mitigating Delusional Bias in Deep Q-Learning »
DiJia Su · Jayden Ooi · Tyler Lu · Dale Schuurmans · Craig Boutilier -
2020 Poster: Optimizing Long-term Social Welfare in Recommender Systems: A Constrained Matching Approach »
Martin Mladenov · Elliot Creager · Omer Ben-Porat · Kevin Swersky · Richard Zemel · Craig Boutilier -
2019 : panel discussion with Craig Boutilier (Google Research), Emma Brunskill (Stanford), Chelsea Finn (Google Brain, Stanford, UC Berkeley), Mohammad Ghavamzadeh (Facebook AI), John Langford (Microsoft Research) and David Silver (Deepmind) »
Peter Stone · Craig Boutilier · Emma Brunskill · Chelsea Finn · John Langford · David Silver · Mohammad Ghavamzadeh -
2019 : posters »
Zhengxing Chen · Juan Jose Garau Luis · Ignacio Albert Smet · Aditya Modi · Sabina Tomkins · Riley Simmons-Edler · Hongzi Mao · Alexander Irpan · Hao Lu · Rose Wang · Subhojyoti Mukherjee · Aniruddh Raghu · Syed Arbab Mohd Shihab · Byung Hoon Ahn · Rasool Fakoor · Pratik Chaudhari · Elena Smirnova · Min-hwan Oh · Xiaocheng Tang · Tony Qin · Qingyang Li · Marc Brittain · Ian Fox · Supratik Paul · Xiaofeng Gao · Yinlam Chow · Gabriel Dulac-Arnold · Ofir Nachum · Nikos Karampatziakis · Bharathan Balaji · Supratik Paul · Ali Davody · Djallel Bouneffouf · Himanshu Sahni · Soo Kim · Andrey Kolobov · Alexander Amini · Yao Liu · Xinshi Chen · · Craig Boutilier -
2019 : invited talk by Craig Boutilier (Google Research): Reinforcement Learning in Recommender Systems: Some Challenges »
Craig Boutilier -
2019 Poster: Exploration Conscious Reinforcement Learning Revisited »
Lior Shani · Yonathan Efroni · Shie Mannor -
2019 Oral: Exploration Conscious Reinforcement Learning Revisited »
Lior Shani · Yonathan Efroni · Shie Mannor