Provably Adaptive Linear Approximation for the Shapley Value and Beyond
Weida Li ⋅ Yaoliang Yu ⋅ Bryan Kian Hsiang Low
Abstract
The Shapley value, and its broader family of semi-values, has received much attention in various attribution problems. A fundamental and long-standing challenge is their efficient approximation, since exact computation generally requires an exponential number of utility queries in the number of players $n$. To meet the challenges of large-scale applications, we explore the limits of efficiently approximating semi-values under a $\Theta(n)$ space constraint. Building upon a vector concentration inequality, we establish a theoretical framework that enables sharper asymptotic query complexities for existing unbiased randomized algorithms. Within this framework, we systematically develop a linear-space algorithm that requires $O(\frac{n}{\epsilon^{2}}\log\frac{1}{\delta})$ utility queries to ensure $P(\\|\hat{\boldsymbol\phi}-\boldsymbol\phi\\|\_{2}\geq\epsilon)\leq \delta$ for all commonly used semi-values. In particular, our framework naturally bridges OFA, unbiased kernelSHAP, SHAP-IQ and the regression-adjusted approach, and definitively characterizes when paired sampling is beneficial. Moreover, our algorithm allows explicit minimization of the approximation variance $\mathbb{E}[\\|\hat{\boldsymbol\phi}-\boldsymbol\phi\\|_{2}^{2}]$ for each specific utility function. Accordingly, we introduce the first adaptive, linear-time, linear-space randomized algorithm, Adalina, that theoretically achieves improved approximation variance. All of our theoretical findings are experimentally validated.
Successful Page Load