A Two-Layer Framework for Joint Online Configuration Selection and Admission Control
Owen Shen ⋅ Haoran Xu ⋅ Yinyu Ye ⋅ Peter Glynn ⋅ Patrick Jaillet
Abstract
We study online configuration selection with admission control problem, which arises in LLM serving, GPU scheduling, and revenue management. In a planning horizon with $T$ periods, we consider a two-layer framework for the decisions made within each time period. In the first layer, the decision maker selects one of the $K$ configurations (ex. quantization, parallelism, fare class) which induces distribution over the reward-resource pair of the incoming request. In the second layer, the decision maker observes the request and then decides whether to accept it or not. Benchmarking this framework requires care. We introduce a \textbf{switching-aware fluid oracle} that accounts for the value of mixing configurations over time, provably upper-bounding any online policy. We derive a min-max formulation for evaluating the benchmark, and we characterize saddle points of the max-min problem via primal-dual optimality conditions linking equilibrium, feasibility, and complementarity. This guides the design of \textbf{SP-UCB--OLP} algorithm, which solves an optimistic saddle point problem and achieves $\tilde{O}(\sqrt{KT})$ regret.
Successful Page Load