Online Packet Scheduling with Deadlines and Learning
Gianmarco Genalti ⋅ Achraf Azize ⋅ Vianney Perchet
Abstract
Network routers that enforce Quality-of-Service (QoS) guarantees must decide, at every clock cycle, which expiring packet of information to transmit, even when the value of the packet is unknown until it is processed. We frame this problem as the Online Packet Scheduling with Deadlines (OPSD) problem under Partial Feedback: packets arrive at every clock cycle, with different deadlines, but the weights are only observed after execution. Under a stochastic assumption on the unknown weights, we explore different variants of the OPSD problem with bandit feedback. We establish a connection between our setting and the sleeping bandits problem, and set our learning goal to $\alpha$-regret minimization. We provide algorithms with provable $\alpha$-regret guarantees under different spans of slackness, distinguishing systems allowing for randomization and systems that do not. In every scenario, our algorithms achieve an $\alpha$-regret upper bound of $\widetilde{\mathcal{O}}\left(\sqrt{KT}\right)$, matching the lower bound for the standard bandit setting. In the practically relevant case of $2$-bounded deadline instances, where the deadline is set at most one clock cycle away from the arrival, our deterministic algorithm achieves the provably tightest possible competitive ratio. Remarkably, when the number of distinct packet types $K\ge 2$ is finite, it is possible to break the well-established $\Phi = \frac{1+\sqrt{5}}{2}$ competitive ratio barrier and attain a tighter competitive ratio $\theta_K$ ranging in $[\sqrt{2}, \Phi)$.
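To make the setting and the notion of competitive ratio concrete, the following is a minimal sketch of a 2-bounded instance, not the authors' algorithm. It assumes full information (weights are known before transmission), unlike the bandit-feedback setting studied in the paper. The `(arrival, deadline, weight)` tuple layout and the names `greedy_value` and `offline_optimum` are hypothetical, introduced only for illustration. One packet may be transmitted per clock cycle, and a packet is eligible at cycle $t$ if its arrival is at most $t$ and its deadline is at least $t$.

```python
# Illustrative sketch only: full-information packet scheduling on a tiny instance.
# Each packet is a hypothetical (arrival, deadline, weight) tuple; one packet
# can be transmitted per clock cycle t, provided arrival <= t <= deadline.

def greedy_value(packets, horizon):
    """Total weight collected by always sending the heaviest eligible packet."""
    sent = set()
    total = 0.0
    for t in range(horizon):
        eligible = [i for i, (a, d, _) in enumerate(packets)
                    if a <= t <= d and i not in sent]
        if eligible:
            best = max(eligible, key=lambda i: packets[i][2])
            sent.add(best)
            total += packets[best][2]
    return total


def offline_optimum(packets, horizon):
    """Best achievable total weight, found by exhaustive search (small instances only)."""
    def solve(t, sent):
        if t == horizon:
            return 0.0
        best = solve(t + 1, sent)  # option: stay idle at cycle t
        for i, (a, d, w) in enumerate(packets):
            if a <= t <= d and i not in sent:
                best = max(best, w + solve(t + 1, sent | {i}))
        return best
    return solve(0, frozenset())


# A 2-bounded instance: every deadline is at most one cycle after arrival.
packets = [
    (0, 0, 1.0),  # light packet that expires immediately
    (0, 1, 2.0),  # heavy packet that can wait one cycle
]
greedy = greedy_value(packets, horizon=2)      # sends the heavy packet first, loses the light one
optimum = offline_optimum(packets, horizon=2)  # sends the light packet first, then the heavy one
ratio = optimum / greedy
print(greedy, optimum, ratio)  # 2.0 3.0 1.5
```

On this instance the greedy rule collects weight 2 while the offline optimum collects 3, a ratio of 1.5. For comparison, the constants quoted above are $\Phi = \frac{1+\sqrt{5}}{2} \approx 1.618$ and $\sqrt{2} \approx 1.414$.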