Oral
Scalable Bilinear Pi Learning Using State and Action Features
Yichen Chen · Lihong Li · Mengdi Wang

Thu Jul 12th 11:40 -- 11:50 AM @ A1

Approximate linear programming (ALP) represents one of the major algorithmic families to solve large-scale Markov decision processes (MDP). In this work, we study a primal-dual formulation of the ALP, and develop a scalable, model-free algorithm called bilinear $\pi$ learning for reinforcement learning when a sampling oracle is provided. This algorithm enjoys a number of advantages. First, it adopts linear and bilinear models to represent the high-dimensional value function and state-action distributions, respectively, using given state and action features. Its run-time complexity depends on the number of features, not the size of the underlying MDPs. Second, it operates in a fully online fashion without having to store any sample, thus having minimal memory footprint. Third, we prove that it is sample-efficient, solving for the optimal policy to high precision with a sample complexity linear in the dimension of the parameter space.

Author Information

Yichen Chen (Princeton University)
Lihong Li (Google Inc.)
Mengdi Wang (Princeton University)

Related Events (a corresponding poster, oral, or spotlight)

More from the Same Authors