Reinforcement Learning for Cost-Aware Markov Decision Processes
Wesley A Suttle · Kaiqing Zhang · Zhuoran Yang · Ji Liu · David N Kraemer

Tue Jul 20 06:40 AM -- 06:45 AM (PDT)

Ratio maximization has applications in areas as diverse as finance, reward shaping for reinforcement learning (RL), and the development of safe artificial intelligence, yet there has been very little exploration of RL algorithms for ratio maximization. This paper addresses this deficiency by introducing two new, model-free RL algorithms for solving cost-aware Markov decision processes, where the goal is to maximize the ratio of long-run average reward to long-run average cost. The first algorithm is a two-timescale scheme based on relative value iteration (RVI) Q-learning and the second is an actor-critic scheme. The paper proves almost sure convergence of the former to the globally optimal solution in the tabular case and almost sure convergence of the latter under linear function approximation for the critic. Unlike previous methods, the two algorithms provably converge for general reward and cost functions under suitable conditions. The paper also provides empirical results demonstrating promising performance and lending strong support to the theoretical results.
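To make the objective in the abstract concrete, the following is a minimal sketch that evaluates the ratio of long-run average reward to long-run average cost for one fixed policy. It is not either of the paper's algorithms, which are model-free; here the transition matrix is assumed known. The two-state chain `P`, the reward vector `r`, the cost vector `c`, and the helper names are all hypothetical, and the sketch assumes an ergodic chain with strictly positive costs.

```python
def stationary_distribution(P, iters=1000):
    """Approximate the stationary distribution of a row-stochastic
    matrix P (list of rows) by power iteration from the uniform vector."""
    n = len(P)
    mu = [1.0 / n] * n
    for _ in range(iters):
        # One step of mu <- mu P
        mu = [sum(mu[s] * P[s][t] for s in range(n)) for t in range(n)]
    return mu


def average_ratio(P, r, c):
    """Ratio of long-run average reward to long-run average cost for the
    Markov chain P with per-state reward r and per-state cost c."""
    mu = stationary_distribution(P)
    avg_reward = sum(m * x for m, x in zip(mu, r))
    avg_cost = sum(m * x for m, x in zip(mu, c))
    return avg_reward / avg_cost


# Hypothetical chain induced by some fixed policy (illustrative numbers only)
P = [[0.9, 0.1],
     [0.5, 0.5]]
r = [1.0, 3.0]  # reward collected in each state
c = [1.0, 2.0]  # cost incurred in each state (assumed strictly positive)

rho = average_ratio(P, r, c)
print(rho)
```

For this chain the stationary distribution is (5/6, 1/6), so the average reward is 8/6, the average cost is 7/6, and the ratio is 8/7. A cost-aware MDP solver would search over policies to maximize this quantity rather than evaluate it for a single policy.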

Author Information

Wesley A Suttle (Stony Brook University)

Wesley Suttle is currently a Ph.D. candidate in the Applied Mathematics and Statistics Department at Stony Brook University. He holds an M.Sc. degree in Applied Mathematics and Statistics from Stony Brook University and a B.A. degree in Mathematics and Philosophy from the University of Minnesota, Twin Cities. His research interests include reinforcement learning for ratio optimization problems and multi-agent reinforcement learning.

Kaiqing Zhang (MIT)
Zhuoran Yang (Princeton University)
Ji Liu (Stony Brook University)
David N Kraemer (Stony Brook University)
