Poster
Multi-User Reinforcement Learning with Low Rank Rewards
Dheeraj Nagaraj · Suhas Kowshik · Naman Agarwal · Praneeth Netrapalli · Prateek Jain
We consider collaborative multi-user reinforcement learning, where multiple users have the same state-action space and transition probabilities but different rewards. Under the assumption that the reward matrix of the $N$ users has a low-rank structure -- a standard and practically successful assumption in the collaborative filtering setting -- we design algorithms with significantly lower sample complexity than those that learn the MDP individually for each user. Our main contribution is an algorithm which explores rewards collaboratively with $N$ user-specific MDPs and can learn rewards efficiently in two key settings: tabular MDPs and linear MDPs. When $N$ is large and the rank is constant, the sample complexity per MDP depends logarithmically on the size of the state space, which represents an exponential reduction (in the state-space size) when compared to standard "non-collaborative" algorithms. Our main technical contribution is a method to construct policies which obtain data such that low-rank matrix completion is possible (without a generative model). This goes beyond the regular RL framework and is closely related to mean field limits of multi-agent RL.
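To illustrate the low-rank assumption the abstract relies on, the following is a minimal sketch of completing a partially observed reward matrix with alternating least squares. It is not the authors' algorithm: the function name `complete_low_rank`, the regularization value, the sampling rate, and the uniformly random observation mask are all assumptions made for illustration. In the paper's setting, observed entries come from data gathered by exploration policies rather than from uniform sampling.

```python
import numpy as np


def complete_low_rank(observed, mask, rank, n_iters=100, reg=1e-3, seed=0):
    """Fill in a partially observed matrix with regularized alternating least squares.

    observed: (n_users, n_cols) array; entries where mask is False are ignored.
    mask:     boolean array of the same shape, True where a reward was observed.
    rank:     assumed rank of the underlying reward matrix.
    """
    rng = np.random.default_rng(seed)
    n_users, n_cols = observed.shape
    U = rng.normal(size=(n_users, rank))   # per-user factors
    V = rng.normal(size=(n_cols, rank))    # per-(state, action) factors
    ridge = reg * np.eye(rank)
    for _ in range(n_iters):
        # Fix V, solve a small ridge regression for each user's factor.
        for i in range(n_users):
            A = V[mask[i]]
            U[i] = np.linalg.solve(A.T @ A + ridge, A.T @ observed[i, mask[i]])
        # Fix U, solve a small ridge regression for each column's factor.
        for j in range(n_cols):
            B = U[mask[:, j]]
            V[j] = np.linalg.solve(B.T @ B + ridge, B.T @ observed[mask[:, j], j])
    return U @ V.T


# Hypothetical toy instance: N users, SA state-action pairs, rank-r rewards.
rng = np.random.default_rng(1)
N, SA, r = 60, 40, 2
true_R = rng.normal(size=(N, r)) @ rng.normal(size=(r, SA))
mask = rng.random((N, SA)) < 0.4          # each entry observed with probability 0.4
observed = np.where(mask, true_R, 0.0)

R_hat = complete_low_rank(observed, mask, rank=r)
rel_err = np.linalg.norm(R_hat - true_R) / np.linalg.norm(true_R)
print(f"relative recovery error: {rel_err:.4f}")
```

The point of the sketch is only that, when the rank is small, each user needs to observe far fewer entries than the full row of the reward matrix, since the shared column factors are estimated jointly from all users.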
Author Information
Dheeraj Nagaraj (Google Research)
I am a Research Scientist at Google Research
Suhas Kowshik (Amazon)
Naman Agarwal (Google Research)
Praneeth Netrapalli (Microsoft Research)
Prateek Jain (Google Research)
More from the Same Authors
- 2021 : Differentially Private Model Personalization
  Prateek Jain · J K Rush · Adam Smith · Shuang Song · Abhradeep Guha Thakurta
- 2021 : Private Alternating Least Squares: Practical Private Matrix Completion with Tighter Rates
  Steve Chien · Prateek Jain · Walid Krichene · Steffen Rendle · Shuang Song · Abhradeep Guha Thakurta · Li Zhang
- 2022 : DAFT: Distilling Adversarially Fine-tuned teachers for OOD Robustness
  Anshul Nasery · Sravanti Addepalli · Praneeth Netrapalli · Prateek Jain
- 2023 Poster: Multi-Task Differential Privacy Under Distribution Skew
  Walid Krichene · Prateek Jain · Shuang Song · Mukund Sundararajan · Abhradeep Guha Thakurta · Li Zhang
- 2021 Poster: Private Alternating Least Squares: Practical Private Matrix Completion with Tighter Rates
  Steve Chien · Prateek Jain · Walid Krichene · Steffen Rendle · Shuang Song · Abhradeep Guha Thakurta · Li Zhang
- 2021 Oral: Private Alternating Least Squares: Practical Private Matrix Completion with Tighter Rates
  Steve Chien · Prateek Jain · Walid Krichene · Steffen Rendle · Shuang Song · Abhradeep Guha Thakurta · Li Zhang
- 2021 Poster: A Regret Minimization Approach to Iterative Learning Control
  Naman Agarwal · Elad Hazan · Anirudha Majumdar · Karan Singh
- 2021 Spotlight: A Regret Minimization Approach to Iterative Learning Control
  Naman Agarwal · Elad Hazan · Anirudha Majumdar · Karan Singh
- 2021 Poster: Optimal regret algorithm for Pseudo-1d Bandit Convex Optimization
  Aadirupa Saha · Nagarajan Natarajan · Praneeth Netrapalli · Prateek Jain
- 2021 Spotlight: Optimal regret algorithm for Pseudo-1d Bandit Convex Optimization
  Aadirupa Saha · Nagarajan Natarajan · Praneeth Netrapalli · Prateek Jain
- 2021 Poster: Acceleration via Fractal Learning Rate Schedules
  Naman Agarwal · Surbhi Goel · Cyril Zhang
- 2021 Spotlight: Acceleration via Fractal Learning Rate Schedules
  Naman Agarwal · Surbhi Goel · Cyril Zhang
- 2020 Poster: Boosting for Control of Dynamical Systems
  Naman Agarwal · Nataly Brukhim · Elad Hazan · Zhou Lu
- 2019 Poster: Efficient Full-Matrix Adaptive Regularization
  Naman Agarwal · Brian Bullins · Xinyi Chen · Elad Hazan · Karan Singh · Cyril Zhang · Yi Zhang
- 2019 Poster: Online Control with Adversarial Disturbances
  Naman Agarwal · Brian Bullins · Elad Hazan · Sham Kakade · Karan Singh
- 2019 Oral: Efficient Full-Matrix Adaptive Regularization
  Naman Agarwal · Brian Bullins · Xinyi Chen · Elad Hazan · Karan Singh · Cyril Zhang · Yi Zhang
- 2019 Oral: Online Control with Adversarial Disturbances
  Naman Agarwal · Brian Bullins · Elad Hazan · Sham Kakade · Karan Singh