
Multi-User Reinforcement Learning with Low Rank Rewards
Dheeraj Nagaraj · Suhas Kowshik · Naman Agarwal · Praneeth Netrapalli · Prateek Jain

Tue Jul 25 05:00 PM -- 06:30 PM (PDT) @ Exhibit Hall 1 #524
We consider collaborative multi-user reinforcement learning, where multiple users share the same state-action space and transition probabilities but have different rewards. Under the assumption that the reward matrix of the $N$ users has a low-rank structure -- a standard and practically successful assumption in the collaborative filtering setting -- we design algorithms with significantly lower sample complexity than algorithms that learn the MDP individually for each user. Our main contribution is an algorithm which explores rewards collaboratively across the $N$ user-specific MDPs and can learn rewards efficiently in two key settings: tabular MDPs and linear MDPs. When $N$ is large and the rank is constant, the sample complexity per MDP depends only logarithmically on the size of the state space, which represents an exponential reduction (in the state-space size) compared to standard ``non-collaborative'' algorithms. Our main technical contribution is a method to construct policies which collect data such that low-rank matrix completion is possible (without a generative model). This goes beyond the regular RL framework and is closely related to mean-field limits of multi-agent RL.
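To make the low-rank reward assumption concrete, here is a minimal, hypothetical sketch (not the paper's algorithm) of how a rank-$r$ reward matrix over $N$ users and a shared set of state-action pairs can be recovered from a subset of observed entries via alternating least squares. All sizes, the sampling rate, and the ALS routine are illustrative assumptions for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
N, S, r = 50, 40, 2              # users, state-action pairs, rank (hypothetical sizes)
U_true = rng.normal(size=(N, r))
V_true = rng.normal(size=(S, r))
R = U_true @ V_true.T            # low-rank reward matrix: one row per user

# Suppose exploration yields noiseless reward observations for ~60% of entries.
mask = rng.random((N, S)) < 0.6

def als_complete(R_obs, mask, r, n_iters=100, reg=1e-6):
    """Recover a rank-r matrix from observed entries by alternating least squares."""
    N, S = R_obs.shape
    U = rng.normal(size=(N, r))
    V = rng.normal(size=(S, r))
    for _ in range(n_iters):
        # Fix V, solve a small regularized least-squares problem per user row.
        for i in range(N):
            cols = mask[i]
            A = V[cols]
            U[i] = np.linalg.solve(A.T @ A + reg * np.eye(r), A.T @ R_obs[i, cols])
        # Fix U, solve per state-action column.
        for j in range(S):
            rows = mask[:, j]
            A = U[rows]
            V[j] = np.linalg.solve(A.T @ A + reg * np.eye(r), A.T @ R_obs[rows, j])
    return U @ V.T

R_hat = als_complete(R * mask, mask, r)
rel_err = np.linalg.norm(R_hat - R) / np.linalg.norm(R)
```

The point of the sketch is the sample-complexity intuition: each user only needs to observe a fraction of its own rewards, because the shared low-rank structure lets the missing entries be filled in from the other users' data. The paper's contribution is constructing exploration policies whose collected data satisfies the conditions for such completion, without assuming a generative model.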

Author Information

Dheeraj Nagaraj (Google Research)

Suhas Kowshik (Amazon)
Naman Agarwal (Google Research)
Praneeth Netrapalli (Microsoft Research)
Prateek Jain (Google Research)
