

Poster

Federated Combinatorial Optimization with Multi-Agent Multi-Armed Bandits

Fares Fourati · Mohamed-Slim Alouini · Vaneet Aggarwal


Abstract: This paper introduces a federated learning framework tailored for online combinatorial optimization with bandit feedback. In this setting, agents select subsets of arms, observe noisy rewards for those subsets without access to individual arm information, and can cooperate and share information at specific intervals. Our framework converts any offline resilient single-agent $(\alpha-\epsilon)$-approximation algorithm with complexity of the general form $\mathcal{O}\left(\frac{\psi}{\epsilon^\beta}\log^\gamma\left(\frac{1}{\epsilon}\right)\right)$ into an online multi-agent algorithm with an $\alpha$-regret of at most $\tilde{\mathcal{O}}\left(m^{-\frac{1}{3+\beta}} \psi^{\frac{1}{3+\beta}} T^{\frac{2+\beta}{3+\beta}}\right)$. This conversion not only eliminates the $\epsilon$ approximation error but also guarantees regret that is sublinear in the time horizon $T$ and achieves linear speedup as the number of communicating agents $m$ grows. Moreover, the algorithm is communication-efficient, requiring only a sublinear number, $\tilde{\mathcal{O}}\left(\psi T^{\frac{\beta}{\beta+1}}\right)$, of communication rounds. We apply the framework to online stochastic submodular maximization using various offline approximation algorithms, yielding the first results for both single-agent and multi-agent settings and recovering specialized single-agent theoretical guarantees. Finally, we empirically validate our approach on a stochastic data summarization problem, illustrating the effectiveness of the proposed framework even in single-agent scenarios.
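
As a worked sanity check on these rates (an illustrative instantiation added here, not part of the paper's abstract; the choice $\beta = 1$, $\gamma = 1$ is an assumption, corresponding to an offline algorithm with complexity $\mathcal{O}\left(\frac{\psi}{\epsilon}\log\left(\frac{1}{\epsilon}\right)\right)$), the stated bounds specialize to

$$\alpha\text{-regret} \le \tilde{\mathcal{O}}\left(m^{-\frac{1}{4}}\, \psi^{\frac{1}{4}}\, T^{\frac{3}{4}}\right), \qquad \text{communication rounds} = \tilde{\mathcal{O}}\left(\psi\, T^{\frac{1}{2}}\right).$$

The linear-speedup claim can be read off the $m$-dependence: the multi-agent bound coincides with the regret of a single agent playing $mT$ rounds, shared across the $m$ agents, since

$$\frac{(mT)^{\frac{2+\beta}{3+\beta}}}{m} = m^{-\frac{1}{3+\beta}}\, T^{\frac{2+\beta}{3+\beta}}.$$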
