Sparse and Faithful Local Explanations with Piecewise Linear Surrogates
Abstract
Local post-hoc explanations are widely used to understand black-box models on tabular data, with Local Interpretable Model-agnostic Explanations (LIME) being a popular approach. LIME approximates a black-box model with a sparse linear surrogate in a local neighborhood, implicitly assuming feature-wise linear homogeneity. However, this assumption often fails when local feature effects exhibit heterogeneous or nonlinear behavior, resulting in unfaithful and unstable explanations. Moreover, LIME relies on a feature selection procedure that is decoupled from, and hence not aligned with, the surrogate modeling objective, further exacerbating instability under local sampling. To address these limitations, we propose PL-LIME, a two-stage sparse local explanation framework that ensures objective consistency across stages. PL-LIME models feature-wise local effects with instance-anchored piecewise linear functions, providing a minimal yet principled extension beyond linear surrogates under a fixed explanation budget. Sparsity is then enforced through a decoupled nonnegative shrinkage procedure that directly scales the estimated local effects, improving stability while preserving interpretability. Experiments on synthetic and real-world datasets demonstrate that PL-LIME achieves higher local fidelity and stability than LIME, yielding more reliable explanations that capture finer-grained local effect structure.
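To make the two-stage idea concrete, the following is a minimal illustrative sketch, not the paper's implementation: a local surrogate is fit on per-feature hinge bases anchored at the explained instance (so each feature gets distinct left/right local slopes), and sparsity is then imposed by garrote-style nonnegative weights that rescale each feature's estimated local effect. The toy black box, the perturbation scheme, and all variable names are assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import nnls
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Hypothetical black box: nonlinear in feature 0, linear in feature 1.
def black_box(X):
    return np.maximum(X[:, 0], 0.0) + 0.5 * X[:, 1]

x0 = np.array([0.2, -0.3])                        # instance to explain
Z = x0 + 0.3 * rng.standard_normal((500, 2))      # local perturbations around x0

# Stage 1: instance-anchored piecewise linear basis. For each feature j,
# the hinges (z_j - x0_j)_+ and (z_j - x0_j)_- allow different slopes on
# either side of the explained instance.
D = Z - x0
Phi = np.hstack([np.maximum(D, 0.0), np.minimum(D, 0.0)])  # (500, 4)

y = black_box(Z)
surrogate = LinearRegression().fit(Phi, y)

# Stage 2: nonnegative shrinkage. Collapse the two hinge contributions of
# each feature into one local-effect column, then fit nonnegative scaling
# weights against the same regression residual (garrote-style); weights
# driven to zero drop the feature, enforcing sparsity.
contrib = Phi * surrogate.coef_
F = contrib[:, :2] + contrib[:, 2:]               # per-feature local effects
w, _ = nnls(F, y - surrogate.intercept_)          # nonnegative scaling weights
```

In this sketch both stages minimize the same local squared-error objective, which is the "objective consistency" the abstract contrasts with LIME's separate feature selection step.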