Gaining Free or Low-Cost Interpretability with Interpretable Partial Substitute
Tong Wang

Thu Jun 13th 10:00 -- 10:05 AM @ Grand Ballroom

This work addresses the situation where a black-box model outperforms all its interpretable competitors. The existing solution to understanding the black-box is to use an explainer model to generate explanations, which can be ambiguous and inconsistent. We propose an alternative: find an interpretable substitute on a subset of data where the black-box model is overkill or nearly overkill, and use this interpretable model to process that subset, leaving the rest to the black-box. On this subset, the model gains complete interpretability and transparency, replacing the otherwise imperfect approximations of an external explainer. This transparency is obtained at minimal or no cost to predictive performance. Under this framework, we develop the Partial Substitute Rules (PSR) model, which uses decision rules to capture the subspace of data where the rules are as accurate, or almost as accurate, as the provided black-box. PSR is agnostic to the black-box model. To train a PSR, we devise an efficient search algorithm that iteratively finds the optimal model and exploits theoretically grounded strategies to reduce computation. Experiments on structured and text data show that PSR obtains an effective trade-off between transparency and interpretability.
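
The routing idea described in the abstract can be illustrated with a minimal sketch. It is not the PSR training algorithm: the rules here are written by hand rather than searched for, and every name (`hybrid_predict`, `transparency`, the `income` and `age` features, the toy black-box) is a hypothetical stand-in chosen for illustration. The sketch assumes that "transparency" means the fraction of inputs handled by the rules.

```python
from typing import Callable, Dict, Sequence, Tuple

Record = Dict[str, float]
# A rule is a (condition, label) pair: if condition(record) holds, predict label.
Rule = Tuple[Callable[[Record], bool], int]


def hybrid_predict(
    rules: Sequence[Rule],
    black_box: Callable[[Record], int],
    x: Record,
) -> Tuple[int, str]:
    """Predict with the first matching rule; otherwise defer to the black-box.

    Returns the prediction and which component produced it.
    """
    for condition, label in rules:
        if condition(x):
            return label, "rule"
    return black_box(x), "black-box"


def transparency(rules: Sequence[Rule], data: Sequence[Record]) -> float:
    """Fraction of records covered by at least one rule (assumed definition)."""
    if not data:
        return 0.0
    covered = sum(1 for x in data if any(cond(x) for cond, _ in rules))
    return covered / len(data)


# Hypothetical hand-written rules; a real PSR would learn these from data.
rules = [
    (lambda x: x["income"] > 80, 1),
    (lambda x: x["income"] < 20, 0),
]


def toy_black_box(x: Record) -> int:
    """Stand-in for any opaque model; the routing code never inspects it."""
    return 1 if 0.5 * x["income"] + 0.1 * x["age"] > 30 else 0


data = [
    {"income": 90, "age": 40},
    {"income": 10, "age": 25},
    {"income": 50, "age": 60},
]

results = [hybrid_predict(rules, toy_black_box, x) for x in data]
coverage = transparency(rules, data)
```

Because the black-box is only ever called as a function, the routing is agnostic to how it was built. Predictions tagged `"rule"` can be explained exactly by the rule that fired, while those tagged `"black-box"` still rely on the opaque model.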

Author Information

Tong Wang (University of Iowa)

Related Events (a corresponding poster, oral, or spotlight)