Deep Reinforcement Learning Finds Bayes-Nash Equilibrium in Competitive Newsvendor Problems
Abstract
We investigate learning dynamics in competitive newsvendor games, a class of continuous-action games with strategic substitutes. Although the equilibrium properties of these games are well established, whether independent learning algorithms converge in repeated general-sum play remains an open question. We analyze structural properties under both complete and incomplete information, deriving closed-form equilibria for a symmetric complete-information benchmark with perfect substitution. Our main theoretical contribution proves that the game is strictly monotone in both the complete-information model and the Bayesian model with private costs, which ensures equilibrium uniqueness and rules out unstable dynamics; this, in turn, yields convergence guarantees for variational-inequality-based algorithms. In numerical experiments, independent deep reinforcement learning agents trained with Proximal Policy Optimization converge to Nash and Bayesian Nash equilibria, as verified by equilibrium checks. These results establish a foundation for applying deep reinforcement learning to competitive inventory management.