The Pareto-optimal Trade-off between Regret and Statistical Inference in Linear Stochastic Bandits under Safety Constraints
Yuming Shao ⋅ Zhixuan Fang
Abstract
Linear bandits traditionally prioritize regret minimization and often overlook statistical inference on the underlying parameter as an objective in its own right. In high-stakes settings such as healthcare, precise parameter estimation is indispensable: it provides fundamental insights into system mechanisms and ensures robust decision-making under covariate shift. We investigate the tripartite balance among regret, inference, and safety, and derive a minimax lower bound that characterizes the Pareto-optimal frontier of these competing goals. We then propose SERMiSC, a novel algorithm that matches this lower bound, thereby achieving the optimal trade-off, while maintaining a near-constant $\tilde{O}(1)$ safety risk. Empirical results show that SERMiSC effectively navigates the Pareto frontier and outperforms various baselines, validating our theoretical analysis.