Learning to Collaborate in Markov Decision Processes
Goran Radanovic · Rati Devidze · David Parkes · Adish Singla

Tue Jun 11 02:35 PM -- 02:40 PM (PDT) @ Room 104
We consider a two-agent MDP framework where agents repeatedly solve a task in a collaborative setting. We study the problem of designing a learning algorithm for the first agent (A1) that facilitates a successful collaboration even in cases when the second agent (A2) is adapting its policy in an unknown way. The key challenge in our setting is that the presence of the second agent leads to non-stationarity and non-obliviousness of rewards and transitions for the first agent. We design novel online learning algorithms for agent A1 whose regret is $O(T^{1-\frac{3}{7} \cdot \alpha})$ over $T$ learning episodes, provided that the magnitude of agent A2's policy change between any two consecutive episodes is upper bounded by $O(T^{-\alpha})$. Here, the parameter $\alpha$ is assumed to be strictly greater than $0$, and we show that this assumption is necessary provided that the {\em learning parity with noise} problem is computationally hard. We show that sub-linear regret of agent A1 further implies near-optimality of the agents' joint return for MDPs that manifest the properties of a {\em smooth} game.
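To make the role of $\alpha$ in the stated bound concrete, the following minimal sketch evaluates the exponent $1-\frac{3}{7} \cdot \alpha$ for a few values of $\alpha$. It is only an arithmetic illustration of the bound as written in the abstract, not the authors' algorithm; the function names are hypothetical, and the constant hidden by the $O(\cdot)$ notation is assumed to be 1.

```python
from fractions import Fraction


def regret_exponent(alpha):
    """Exponent e in the stated bound O(T^e), where e = 1 - (3/7) * alpha."""
    return 1 - Fraction(3, 7) * Fraction(alpha)


def bound_value(T, alpha):
    """Value of T^e, with the hidden constant set to 1 (an assumption)."""
    return T ** float(regret_exponent(alpha))


if __name__ == "__main__":
    # alpha = 0 corresponds to no decay in A2's policy changes.
    for alpha in (Fraction(0), Fraction(1, 2), Fraction(1)):
        e = regret_exponent(alpha)
        print(f"alpha={alpha}: exponent={e}, sublinear={e < 1}")
```

For $\alpha = 1$ the exponent is $4/7$, so the bound grows strictly slower than $T$. For $\alpha = 0$ the exponent is $1$ and the bound is linear in $T$, which is consistent with the requirement that $\alpha$ be strictly greater than $0$.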

Author Information

Goran Radanovic (Harvard University)
Rati Devidze (Max Planck Institute for Software Systems)
David Parkes (Harvard University)
Adish Singla (Max Planck Institute (MPI-SWS))
Adish Singla

Adish Singla is a faculty member at the Max Planck Institute for Software Systems (MPI-SWS), Germany, where he has been leading the Machine Teaching Group since 2017. He conducts research in the area of Machine Teaching, with a particular focus on open-ended learning and problem-solving domains. In recent years, his research has centered on developing AI-driven educational technology for introductory programming environments. He has received several awards for his research, including an AAAI Outstanding Paper Honorable Mention Award (2022) and an ERC Starting Grant (2021). He also has extensive experience working in industry and is a recipient of several industry awards, including a research grant from the Microsoft Research Ph.D. Scholarship Programme (2018), a Facebook Graduate Fellowship (2015), a Microsoft Tech Transfer Award (2011), and a Microsoft Gold Star Award (2010).
