Decentralized Bandits without Global Clock for Dynamic Matching Market
Mengtong Gao ⋅ Zhenhe Zhang ⋅ Jichen Li ⋅ Xia Xuanzhi ⋅ Wentao Zhou ⋅ Jing Chen
Abstract
Two-sided matching markets are pervasive in numerous real-world applications, ranging from labor markets to online advertising. Recently, a rich line of research has studied the matching bandit problem, where participants learn their preferences through iterative interactions. However, existing works assume a static environment with fixed participants and require synchronized learning, in which all participants start simultaneously and have access to a global clock. In reality, matching markets are inherently dynamic: participants may enter and leave at arbitrary time steps without any global signal, leading to an uncoordinated scenario that render existing algorithms inapplicable. To address this challenge, we first investigate the one-sided learning setting under uncoordinated player arrivals, where only players need to learn their preferences. We propose the Way-SE algorithm, which achieves an $O(\frac{K^2 \log T}{\Delta_{\min}^2})$ regret through a distributed exploration mechanism that enables participants to implicitly coordinate exploration phases using only local clocks, without global synchronization. More importantly, we extend to the fully decentralized, dynamic two-sided learning setting where both sides need to learn their preferences and players may arrive or depart arbitrarily. We introduce Way-SE-2S, the first algorithm to achieve a sublinear regret $O(T^{\frac{K-1}{K}} \log T)$ in this challenging environment, without requiring global signals, restrictive preference structures, or observability of competing agents' outcomes. Our work provides the first theoretical guarantee for stable matching in fully decentralized and uncoordinated bandit markets.
Successful Page Load