Making Linear MDPs Practical via Contrastive Representation Learning
Tianjun Zhang · Tongzheng Ren · Mengjiao Yang · Joseph E Gonzalez · Dale Schuurmans · Bo Dai

Thu Jul 21 01:25 PM -- 01:30 PM (PDT) @ Room 307

It is common to address the curse of dimensionality in Markov decision processes (MDPs) by exploiting low-rank representations, which motivates the recent theoretical study of linear MDPs. However, most approaches require the representation to be given, rely on unrealistic assumptions about the normalization of the decomposition, and leave computational challenges unresolved. Instead, we consider an alternative definition of linear MDPs that automatically ensures normalization while allowing efficient representation learning via contrastive estimation. The framework also admits confidence-adjusted index algorithms, enabling an efficient and principled approach to incorporating optimism or pessimism under uncertainty. To the best of our knowledge, this provides the first practical representation learning method for linear MDPs that achieves both strong theoretical guarantees and empirical performance. Theoretically, we prove that the proposed algorithm is sample efficient in both the online and offline settings. Empirically, we demonstrate superior practical performance over existing state-of-the-art algorithms on several benchmark tasks in both cases.
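
For context, the low-rank structure and the normalization issue mentioned above can be written out using the standard linear MDP definition from the literature. This is a sketch of the conventional formulation only, not the alternative definition proposed in the paper; the symbols $\phi$, $\mu$, $\theta$, and the dimension $d$ are the usual notation and are assumptions here, since the abstract does not state them.

```latex
% Standard linear MDP (conventional notation, assumed; not taken from the abstract):
% the transition kernel and reward are linear in a d-dimensional feature map.
\begin{align}
  P(s' \mid s, a) &= \langle \phi(s, a), \mu(s') \rangle, \\
  r(s, a)         &= \langle \phi(s, a), \theta \rangle,
  \qquad \phi(s, a),\ \mu(s'),\ \theta \in \mathbb{R}^{d}.
\end{align}
% For P to be a valid conditional distribution, the factorization must satisfy
\begin{equation}
  \langle \phi(s, a), \mu(s') \rangle \ge 0
  \quad \text{and} \quad
  \int \langle \phi(s, a), \mu(s') \rangle \, \mathrm{d}s' = 1
  \qquad \text{for all } (s, a).
\end{equation}
```

The integral constraint must hold for every state-action pair, which is what makes normalization difficult to enforce when $\phi$ and $\mu$ are learned rather than given.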

Author Information

Tianjun Zhang (UC Berkeley)
Tongzheng Ren (UT Austin / Google Brain)
Mengjiao Yang (Google Brain)
Joseph E Gonzalez (UC Berkeley)
Dale Schuurmans (Google / University of Alberta)
Bo Dai (Google Brain)
