Adaptively Grouped Contextual Bandits for Heterogeneous Human-AI Decision Making with Conformal Prediction Sets
Yanchen Wu ⋅ Bo Li
Abstract
Personalizing AI decision support for heterogeneous human decision-makers remains a key challenge. We study a collaboration workflow in which the AI provides a reduced prediction set, constructed via conformal prediction, as input to a human who makes the final decision. We use contextual bandits to learn the complex and opaque human decision function: the optimal set size, governed by a significance parameter $\alpha$ (the arms), varies across individuals and tasks (the context). To handle large arm spaces and high-dimensional contexts, we introduce the Adaptively Grouped Contextual Bandit (AGCB) framework, which bypasses unreliable online approximation of complex functions and instead directly exploits the structure of the human-AI problem through two pillars: continuity-aware counterfactual reasoning, which efficiently shares information across decisions, and a data-driven zooming mechanism that adaptively partitions the context space. The zooming mechanism performs a principled, native trade-off between intra-group estimation error and inter-group approximation bias, ensuring optimal granularity for both cumulative and simple regret objectives. Crucially, a single continuity assumption enables both the bias control of adaptive grouping and the robustness of the counterfactual updates, and it leads to minimax-optimal regret rates. Empirical results confirm that AGCB significantly outperforms existing methods in heterogeneous, data-scarce environments.
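To make concrete how the significance parameter $\alpha$ governs prediction-set size, the following is a minimal sketch of standard split conformal prediction for classification. It is an illustration of the generic conformal procedure, not the paper's AGCB algorithm; the helper name `conformal_set` and the toy calibration data are hypothetical.

```python
import numpy as np

def conformal_set(cal_probs, cal_labels, test_probs, alpha):
    """Split conformal prediction set (sketch): for each test point,
    return the labels kept at miscoverage level alpha."""
    n = len(cal_labels)
    # Nonconformity score: one minus the model's probability of the true label.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample-corrected (1 - alpha) quantile of calibration scores.
    level = min(1.0, np.ceil((n + 1) * (1.0 - alpha)) / n)
    q = np.quantile(scores, level, method="higher")
    # Keep every label whose nonconformity does not exceed the threshold.
    return [np.where(1.0 - row <= q)[0].tolist() for row in test_probs]

# Hypothetical toy calibration set: 10 points, true label 0,
# with varying model confidence in the true label.
conf = np.array([0.9, 0.7, 0.5, 0.3, 0.6, 0.8, 0.4, 0.2, 0.95, 0.55])
cal_probs = np.column_stack([conf, (1 - conf) / 2, (1 - conf) / 2])
cal_labels = np.zeros(10, dtype=int)
test_probs = np.array([[0.5, 0.3, 0.2]])

for alpha in (0.1, 0.3, 0.5):
    print(alpha, conformal_set(cal_probs, cal_labels, test_probs, alpha))
```

A larger $\alpha$ (more miscoverage tolerated) yields a smaller, more aggressive set; a smaller $\alpha$ yields a larger, more conservative set. Tuning this trade-off per individual and per task is exactly the arm-selection problem the abstract poses.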