Fast Mixing Steady-State Control in Markov Decision Processes
Abstract
Stability is a property of fundamental importance in real-world systems. Although it has been widely studied and is well understood in control theory (CT) for deterministic systems, it remains largely overlooked in stochastic systems such as Markov decision processes (MDPs). In this paper, we aim to translate into the MDP framework the steady-state control problem, well established in CT, in which the goal is to synthesize a controller with prescribed asymptotic stability properties. To this end, we propose the novel fast-mixing steady-state (FMSS) problem: given an ergodic MDP and a target steady-state distribution, the objective is to synthesize a Markovian policy that induces this distribution with the fastest possible convergence rate. Addressing this problem requires controlling the spectral properties of the transition matrix of the induced Markov chain (MC), which generally leads to non-convex programs. We therefore derive a tractable surrogate objective that yields a convex program, whose properties we study in terms of approximation quality, feasibility, and computational complexity. We then move to the learning setting and propose an "offline" sample-based algorithm for FMSS (FMSS-SV), designed for tabular MDPs, in which the environment's transition model is estimated from data. We quantify the impact of transition-model estimation errors on both the objective value and the learned policy, and provide a finite-sample complexity analysis.
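To make the objects in the abstract concrete, the following is a minimal sketch (not the paper's method) of the quantities the FMSS problem manipulates: a toy tabular MDP with a made-up transition kernel, the Markov chain induced by a Markovian policy, its steady-state distribution, and the second-largest eigenvalue modulus (SLEM), whose size governs the chain's mixing rate. All dimensions, the random kernel, and the uniform policy are illustrative assumptions.

```python
import numpy as np

# Hypothetical toy tabular MDP with S states and A actions (assumed sizes).
# P[s, a, s'] is the environment's transition kernel, generated at random
# here purely for illustration.
rng = np.random.default_rng(0)
S, A = 4, 2
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)  # normalize each row into a distribution

# A Markovian policy pi[s, a] (uniform here); it induces the Markov chain
# P_pi[s, s'] = sum_a pi[s, a] * P[s, a, s'].
pi = np.full((S, A), 1.0 / A)
P_pi = np.einsum('sa,sat->st', pi, P)

# The steady-state distribution mu is the left eigenvector of P_pi
# associated with eigenvalue 1 (it exists and is unique for an ergodic chain).
eigvals, eigvecs = np.linalg.eig(P_pi.T)
idx = np.argmin(np.abs(eigvals - 1.0))
mu = np.real(eigvecs[:, idx])
mu /= mu.sum()  # normalize (also fixes the eigenvector's sign)

# The convergence rate to mu is governed by the second-largest eigenvalue
# modulus (SLEM): the smaller it is, the faster the chain mixes. FMSS seeks
# a policy whose induced chain attains the target mu with small SLEM.
slem = sorted(np.abs(eigvals), reverse=True)[1]
print("steady state:", mu, " SLEM:", slem)
```

Controlling the SLEM directly is what makes the exact problem non-convex, which is why the paper works with a convex surrogate instead.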