
Adaptive Model Design for Markov Decision Process
Siyu Chen · Donglin Yang · Jiayang Li · Senmiao Wang · Zhuoran Yang · Zhaoran Wang

Wed Jul 20 12:50 PM -- 12:55 PM (PDT) @ Room 307

In a Markov decision process (MDP), the agent arrives at its optimal policy through an evolutionary process in which it incrementally searches for better policies. During this process, the agent usually does not bear the external costs/benefits of its actions, which can lead to an inefficient outcome that fulfills only its own interest. Appropriate regulations are therefore often required to induce a more desirable outcome in an MDP model. In this paper, we study how to regulate such an agent by redesigning model parameters that affect the rewards and/or the transition kernels. We formulate this problem as a hierarchical mathematical program in which the lower-level MDP is regulated by the upper-level model designer. To solve this problem, we develop a scheme that allows the designer to iteratively predict the reaction of the agent by solving the MDP, and then adaptively update model parameters to guide the agent's behavior towards the desired end. The convergence of the algorithm is first analyzed theoretically and then tested empirically on several MDP models arising in economics and robotics.
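The designer/agent loop described above can be illustrated with a minimal sketch: a lower-level agent solves a small tabular MDP by value iteration, while an upper-level designer repeatedly predicts the agent's best response and adjusts a reward parameter (here, a subsidy at a target state) until the induced behavior matches the desired outcome. The chain MDP, the subsidy parameter, and the additive update rule are illustrative assumptions for exposition, not the paper's actual algorithm or experiments.

```python
# Illustrative sketch of the adaptive model-design loop (assumed setup,
# not the authors' algorithm): the designer tunes a reward subsidy so the
# agent's optimal policy reaches a target state.
import numpy as np

def solve_mdp(P, R, gamma=0.9, tol=1e-8):
    """Value iteration for a tabular MDP.
    P: (A, S, S) transition kernels, R: (A, S) rewards.
    Returns optimal values V and a greedy deterministic policy pi."""
    V = np.zeros(P.shape[1])
    while True:
        Q = R + gamma * (P @ V)          # (A, S) action values
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    return V, Q.argmax(axis=0)

# A 3-state chain: action 0 moves left, action 1 moves right.
S, A = 3, 2
P = np.zeros((A, S, S))
for s in range(S):
    P[0, s, max(s - 1, 0)] = 1.0       # left (deterministic)
    P[1, s, min(s + 1, S - 1)] = 1.0   # right (deterministic)

# Base rewards tempt the agent to stay at state 0 (its private interest).
R_base = np.array([[1.0, 0.0, 0.0],    # reward of action 0 in each state
                   [0.0, 0.0, 0.0]])   # reward of action 1 in each state

target = S - 1          # the designer's desired state
subsidy = 0.0           # design parameter: reward bonus at the target

for _ in range(50):
    R = R_base.copy()
    R[:, target] += subsidy            # designer modifies the reward model
    V, pi = solve_mdp(P, R)            # predict the agent's best response
    s = 0                              # simulate the induced trajectory
    for _ in range(10):
        s = int(np.argmax(P[pi[s], s]))
    if s == target:                    # desired outcome achieved: stop
        break
    subsidy += 0.5                     # adaptively raise the incentive
```

With these numbers, staying at state 0 is worth 1/(1-0.9) = 10 to the agent, so the designer must raise the subsidy until the discounted value of reaching the target exceeds that; the loop stops at the first subsidy that flips the agent's policy.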

Author Information

Siyu Chen (Tsinghua University)
Donglin Yang (Tsinghua University)
Jiayang Li (Northwestern University)
Senmiao Wang (Northwestern University)
Zhuoran Yang (Yale University)
Zhaoran Wang (Northwestern University)

