Workshop
“Could it have been different?” Counterfactuals in Minds and Machines
Nina Corvelo Benz · Ricardo Dominguez-Olmedo · Manuel Gomez-Rodriguez · Thorsten Joachims · Amir-Hossein Karimi · Stratis Tsirtsis · Isabel Valera · Sarah A Wu
Meeting Room 301
Had I left 5 minutes earlier, I would have caught the bus. Had I been driving slower, I would have avoided the accident. Counterfactual thoughts (“what if?” scenarios about outcomes contradicting what actually happened) play a key role in everyday human reasoning and decision-making. In conjunction with rapid advancements in the mathematical study of causality, there has been increasing interest in the development of machine learning methods that support elements of counterfactual reasoning, i.e., methods that make predictions about outcomes that “could have been different”. Such methods find applications in a wide variety of domains ranging from personalized healthcare and explainability to AI safety and offline reinforcement learning. Although research at the intersection of causal inference and machine learning is blooming, there has been no venue so far explicitly focusing on methods involving counterfactuals. In this workshop, we aim to fill that space by facilitating interdisciplinary interactions that will shed light on the following three questions: (i) What insights can causal machine learning take from the latest advances in cognitive science? (ii) In what use cases is each causal modeling framework most appropriate for modeling counterfactuals? (iii) What barriers need to be lifted for the wider adoption of counterfactual-based machine learning applications, like personalized healthcare?
Schedule
Sat 12:00 p.m. – 12:10 p.m.
Welcome & Introduction
(Remarks)
Sat 12:10 p.m. – 12:30 p.m.
Mihaela van der Schaar - Causal Deep Learning
(Invited Talk)
Mihaela van der Schaar
Sat 12:30 p.m. – 12:50 p.m.
Jonathan Richens - Counterfactual reasoning is necessary for avoiding harm
(Invited Talk)
Sat 12:50 p.m. – 1:00 p.m.
Interventional and Counterfactual Inference with Diffusion Models
(Contributed Talk)
We consider the problem of answering observational, interventional, and counterfactual queries in a causally sufficient setting where only observational data and the causal graph are available. Utilizing the recent developments in diffusion models, we introduce diffusion-based causal models (DCM) to learn causal mechanisms that generate unique latent encodings. These encodings enable us to directly sample under interventions and perform abduction for counterfactuals. Diffusion models are a natural fit here, since they can encode each node to a latent representation that acts as a proxy for exogenous noise. Our empirical evaluations demonstrate significant improvements over existing state-of-the-art methods for answering causal queries. Furthermore, we provide theoretical results that offer a methodology for analyzing counterfactual estimation in general encoder-decoder models, which could be useful in settings beyond our proposed approach.
Patrick Chao · Patrick Bloebaum · Shiva Kasiviswanathan
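The abduction step mentioned in the abstract above can be illustrated on a toy example. The sketch below is not DCM and uses no diffusion model; it assumes a hypothetical two-variable linear structural causal model, X = U_x and Y = slope * X + U_y, and follows the standard abduction, action, prediction recipe for counterfactuals. The function name and the model are illustrative assumptions only.

```python
def counterfactual_y(x_obs, y_obs, x_new, slope=2.0):
    """Counterfactual value of Y had X been x_new, in the toy SCM Y = slope * X + U_y.

    The linear model and its slope are assumptions made for illustration.
    """
    # Abduction: recover the exogenous noise consistent with the observation.
    u_y = y_obs - slope * x_obs
    # Action: replace the observed X with the intervened value x_new.
    # Prediction: push the recovered noise through the modified model.
    return slope * x_new + u_y


# Observed (X=1, Y=3); what would Y have been had X been 2?
print(counterfactual_y(x_obs=1.0, y_obs=3.0, x_new=2.0))  # 5.0
```

In this toy setting the noise is recovered exactly because the mechanism is invertible in U_y; the abstract describes learned latent encodings playing the role of a proxy for that noise.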
Sat 1:00 p.m. – 1:30 p.m.
Coffee Break
(Break)
Sat 1:30 p.m. – 1:50 p.m.
Himabindu Lakkaraju - Regulating Explainable AI: Technical Challenges and Opportunities
(Invited Talk)
Hima Lakkaraju
Sat 1:50 p.m. – 2:40 p.m.
Counterfactual reasoning: From minds to machines to practical applications
(Panel Discussion)
Sat 2:40 p.m. – 2:50 p.m.
Causal Proxy Models for Concept-Based Model Explanations
(Contributed Talk)
Explainability methods for NLP systems encounter a version of the fundamental problem of causal inference: for a given ground-truth input text, we never truly observe the counterfactual texts necessary for isolating the causal effects of model representations on outputs. In response, many explainability methods make no use of counterfactual texts, assuming they will be unavailable. In this paper, we show that robust causal explainability methods can be created using approximate counterfactuals, which can be written by humans to approximate a specific counterfactual or simply sampled using metadata-guided heuristics. The core of our proposal is the Causal Proxy Model (CPM). A CPM explains a black-box model N because it is trained to have the same actual input/output behavior as N while creating neural representations that can be intervened upon to simulate the counterfactual input/output behavior of N. Furthermore, we show that the best CPM for N performs comparably to N in making factual predictions, which means that the CPM can simply replace N, leading to more explainable deployed models.
Zhengxuan Wu · Karel D'Oosterlinck · Atticus Geiger · Amir Zur · Christopher Potts
Sat 2:50 p.m. – 3:00 p.m.
Counterfactual Explanations for Misclassified Images: How Human and Machine Explanations Differ
(Contributed Talk)
Counterfactual explanations have emerged as a popular solution for the eXplainable AI (XAI) problem of elucidating the predictions of black-box deep-learning systems because people easily understand them, they apply across different problem domains, and they seem to be legally compliant. While 100+ counterfactual methods exist in the literature, few of these methods have actually been tested on users (∼7%). Even fewer studies adopt a user-centered perspective, for instance, asking people for their counterfactual explanations to determine their perspective on a “good explanation”. This gap in the literature is addressed here using a novel methodology that (i) gathers human-generated counterfactual explanations for misclassified images in two user studies and then (ii) compares these human-generated explanations to computationally generated explanations for the same misclassifications. Results indicate that humans do not “minimally edit” images when generating counterfactual explanations. Instead, they make larger, “meaningful” edits that better approximate prototypes in the counterfactual class. An analysis based on “explanation goals” is proposed to account for this divergence between human and machine explanations. The implications of these proposals for future work are discussed.
Eoin Delaney · Arjun Pakrashi · Derek Greene · Mark Keane
Sat 3:00 p.m. – 4:15 p.m.
Lunch Break
(Break)
Sat 4:15 p.m. – 5:00 p.m.
Poster Session #1
(Poster Session)
Sat 5:00 p.m. – 5:20 p.m.
Suchi Saria - TBD
(Invited Talk)
Sat 5:20 p.m. – 5:40 p.m.
Alison Gopnik - Counterfactuals, play and causal inference in young children and machines
(Invited Talk)
Alison Gopnik
Sat 5:40 p.m. – 5:50 p.m.
Natural Counterfactuals With Necessary Backtracking
(Contributed Talk)
Counterfactual reasoning, a cognitive ability possessed by humans, is being actively studied for incorporation into machine learning systems. In the causal modelling approach to counterfactuals, Judea Pearl's theory remains the most influential and dominant. However, being thoroughly non-backtracking, the counterfactual probability distributions defined by Pearl can be hard to learn by nonparametric models, even when the causal structure is fully given. A big challenge is that non-backtracking counterfactuals can easily step outside the support of the training data, where inference with current machine learning models becomes highly unreliable. To mitigate this issue, we propose an alternative theory of counterfactuals, namely, natural counterfactuals. This theory is concerned with counterfactuals within the support of the data distribution, and defines in a principled way a different kind of counterfactual that backtracks if (but only if) necessary. To demonstrate potential applications of the theory and illustrate the advantages of natural counterfactuals, we conduct a case study of counterfactual generation and discuss empirical observations that lend support to our approach.
Guangyuan Hao · Jiji Zhang · Hao Wang · Kun Zhang
Sat 5:50 p.m. – 6:00 p.m.
Counterfactually Comparing Abstaining Classifiers
(Contributed Talk)
Abstaining classifiers have the option to abstain from making predictions on inputs that they are unsure about. These classifiers are becoming increasingly popular in high-stakes decision-making problems, as they can withhold uncertain predictions to improve their reliability and safety. When evaluating black-box abstaining classifier(s), however, we lack a principled approach that accounts for what the classifier would have predicted on its abstentions. These missing predictions are crucial when, e.g., a radiologist is unsure of their diagnosis or when a driver is inattentive in a self-driving car. In this paper, we introduce a novel approach and perspective to the problem of evaluating and comparing abstaining classifiers by treating abstentions as missing data. Our evaluation approach is centered around the counterfactual score of an abstaining classifier, defined as the expected performance of the classifier had it not been allowed to abstain. We specify the conditions under which the counterfactual score is identifiable: if the abstentions are stochastic, and if the evaluation data is independent of the training data (ensuring that the predictions are missing at random), then the score is identifiable. Note that, if abstentions are deterministic, then the score is unidentifiable because the classifier can perform arbitrarily poorly on its abstentions. Leveraging tools from observational causal inference, we then develop nonparametric and doubly robust methods to efficiently estimate this quantity under identification. Our approach is examined in both simulated and real data experiments.
Yo Joong Choe · Aditya Gangrade · Aaditya Ramdas
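To make the missing-data view in the abstract above concrete, here is a minimal sketch of an inverse-propensity-weighted estimate of a counterfactual score. It is not the authors' estimator (the abstract describes nonparametric and doubly robust methods); it is the simplest estimator of this kind and assumes that the abstention probability of each example is known and strictly below 1. All names are hypothetical.

```python
def ipw_counterfactual_score(scores, abstained, abstain_prob):
    """Estimate the mean score the classifier would obtain had it never abstained.

    scores[i]       -- score on example i (ignored where abstained[i] is True)
    abstained[i]    -- True if the classifier abstained on example i
    abstain_prob[i] -- assumed-known probability of abstaining on example i
    """
    total = 0.0
    for score, did_abstain, prob in zip(scores, abstained, abstain_prob):
        if prob >= 1.0:
            # Deterministic abstention: the score on this input is never
            # observed, so no reweighting can recover it.
            raise ValueError("abstention probability must be below 1")
        if not did_abstain:
            # Upweight each observed score by the inverse chance of observing it.
            total += score / (1.0 - prob)
    return total / len(scores)


estimate = ipw_counterfactual_score(
    scores=[1.0, 0.0, 1.0, 1.0],
    abstained=[False, True, False, False],
    abstain_prob=[0.25, 0.25, 0.25, 0.25],
)
```

The `ValueError` branch mirrors the abstract's remark that the score is unidentifiable when abstentions are deterministic.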
Sat 6:00 p.m. – 6:15 p.m.
Coffee Break
(Break)
Sat 6:15 p.m. – 7:00 p.m.
Poster Session #2
(Poster Session)
Sat 7:10 p.m. – 7:30 p.m.
Thomas Icard - What's so special about counterfactuals?
(Invited Talk)
Sat 7:30 p.m. – 7:50 p.m.
Ruth Byrne - How People Reason about Counterfactual Explanations for Decisions by Artificial Intelligence Systems
(Invited Talk)
Sat 7:50 p.m. – 8:00 p.m.
Closing Remarks
(Remarks)


Counterfactual Memorization in Neural Language Models
(Poster)
Modern neural language models that are widely used in various NLP tasks risk memorizing sensitive information from their training data. Understanding this memorization is important in real-world applications and also from a learning-theoretical perspective. An open question in previous studies of language model memorization is how to filter out “common” memorization. In fact, most memorization criteria strongly correlate with the number of occurrences in the training set, capturing memorized familiar phrases, public knowledge, templated texts, or other repeated data. We formulate a notion of counterfactual memorization which characterizes how a model's predictions change if a particular document is omitted during training. We identify and study counterfactually memorized training examples in standard text datasets. We estimate the influence of each memorized training example on the validation set and on generated texts, showing how this can provide direct evidence of the source of memorization at test time.
Chiyuan Zhang · Daphne Ippolito · Katherine Lee · Matthew Jagielski · Florian Tramer · Nicholas Carlini
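One way to read the notion described in the abstract above is as a difference of averages: how well models score on a document when it was in their training set versus when it was left out. The sketch below is an illustration under that assumption, not the authors' exact definition or code; the function name, the input format, and the choice of score are hypothetical.

```python
def counterfactual_memorization(runs):
    """Difference in mean score on one document between models trained with and without it.

    runs -- list of (trained_on_doc, score_on_doc) pairs, one per trained model;
            this input format is an assumption made for illustration.
    """
    with_doc = [score for included, score in runs if included]
    without_doc = [score for included, score in runs if not included]
    if not with_doc or not without_doc:
        raise ValueError("need at least one model trained with and one without the document")
    return sum(with_doc) / len(with_doc) - sum(without_doc) / len(without_doc)


# Four hypothetical models: two saw the document during training, two did not.
runs = [(True, 0.9), (True, 0.7), (False, 0.2), (False, 0.4)]
gap = counterfactual_memorization(runs)
```

A large positive gap suggests the model's behavior on the document depends on having seen that specific document, whereas frequently repeated text would tend to score well either way.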


Semantic Meaningfulness: Evaluating Counterfactual Approaches for Real-World Plausibility and Feasibility
(Poster)
Counterfactual explanations are rising in popularity when aiming to increase the explainability of machine learning models. One of the main challenges that remains is generating meaningful counterfactuals that are coherent with real-world relations. Multiple approaches incorporating real-world relations have been proposed in the past (e.g., by utilizing data distributions or structural causal models), but evaluating whether the explanations from different counterfactual approaches fulfill known causal relationships is still an open issue. To fill this gap, this work proposes two metrics, Semantic Meaningful Output (SMO) and Semantic Meaningful Relations (SMR), to measure the ability of counterfactual generation approaches to depict real-world relations. In addition, we provide multiple datasets with known structural causal models and leverage them to benchmark the semantic meaningfulness of new and existing counterfactual approaches. Finally, we evaluate the semantic meaningfulness of nine well-established counterfactual explanation approaches and conclude that none of the non-causal approaches were able to create semantically meaningful counterfactuals consistently.
Jacqueline Hoellig · Aniek Markus · Jef de Slegte · Prachi Bagave


In the Eye of the Beholder: Robust Prediction with Causal User Modeling
(Poster)
Accurately predicting the relevance of items to users is crucial to the success of many social platforms. Conventional approaches train models on logged historical data, but recommendation systems, media services, and online marketplaces all exhibit a constant influx of new content, making relevancy a moving target to which standard predictive models are not robust. In this paper, we propose a learning framework for relevance prediction that is robust to changes in the data distribution. Our key observation is that robustness can be obtained by accounting for how users causally perceive the environment. We model users as boundedly-rational decision makers whose causal beliefs are encoded by a causal graph, and show how minimal information regarding the graph can be used to contend with distributional changes. Experiments in multiple settings demonstrate the effectiveness of our approach.
Amir Feder · Nir Rosenfeld


Time-uniform confidence bands for the CDF under nonstationarity
(Poster)
Estimation of the complete distribution of a random variable is a useful primitive for both manual and automated decision making. This problem has received extensive attention in the i.i.d. setting, but the arbitrary data-dependent setting remains largely unaddressed. Consistent with known impossibility results, we present computationally felicitous time-uniform and value-uniform bounds on the CDF of the running averaged conditional distribution of a real-valued random variable which are always valid and sometimes trivial, along with an instance-dependent convergence guarantee. The importance-weighted extension is appropriate for estimating complete counterfactual distributions of rewards given controlled experimentation data exhaust, e.g., from an A/B test or a contextual bandit.
Paul Mineiro · Steve Howard


Rethinking Counterfactual Explanations as Local and Regional Counterfactual Policies
(Poster)
Counterfactual Explanations (CE) face several unresolved challenges, such as ensuring stability, synthesizing multiple CEs, and providing plausibility and sparsity guarantees. From a more practical point of view, recent studies (Pawelczyk et al., 2022) show that the prescribed counterfactual recourses are often not implemented exactly by individuals and demonstrate that most state-of-the-art CE algorithms are very likely to fail in this noisy environment. To address these issues, we propose a probabilistic framework that gives a sparse local counterfactual rule for each observation, providing rules that give a range of values capable of changing decisions with high probability. These rules serve as a summary of diverse counterfactual explanations and yield robust recourses. We further aggregate these local rules into a regional counterfactual rule, identifying shared recourses for subgroups of the data. Our local and regional rules are derived from the Random Forest algorithm, which offers statistical guarantees and fidelity to the data distribution by selecting recourses in high-density regions. Moreover, our rules are sparse as we first select the smallest set of variables having a high probability of changing the decision. We have conducted experiments to validate the effectiveness of our counterfactual rules in comparison to standard CE and recent similar attempts. Our methods are available as a Python package.
Salim I. Amoukou · Nicolas JB Brunel


Counterfactual Generation with Identifiability Guarantees
(Poster)
Counterfactual generation requires the identification of the disentangled latent representations, such as content and style, that underlie the observed data. Existing unsupervised methods crucially rely on oversimplified assumptions, such as assuming independent content and style variables, to identify the latent variables, even though such assumptions may not hold for complex data distributions. This problem is exacerbated when data are sampled from multiple domains, as required by prior work, since the dependence between content and style may vary significantly over domains. In this work, we tackle the dependence between the content and the style variables inherent in the counterfactual generation task. We show identification guarantees by leveraging the relative sparsity of the influences from different latent variables. Our theoretical insights enable the development of a doMain AdapTive conTrollable text gEneration model, called MATTE. It achieves state-of-the-art performance in unsupervised controllable text generation tasks on large-scale datasets.
Hanqi Yan · Lingjing Kong · Lin Gui · Yuejie Chi · Eric Xing · Yulan He · Kun Zhang


Bayesian Predictive Synthetic Control Methods
(Poster)
We study Bayesian model synthesis for synthetic control methods (SCMs). SCMs have garnered significant attention as an indispensable tool for comparative case studies. The fundamental concept underlying SCMs involves the prediction of counterfactual outcomes for a treated unit by a weighted summation of observed outcomes from untreated units. In this study, we reinterpret the untreated outcomes as predictors for the treated outcomes and employ Bayesian predictive synthesis (BPS) to synthesize these forecasts. We refer to our novel approach as Bayesian Predictive SCM (BPSCM). The BPSCM represents a comprehensive and foundational framework encompassing diverse statistical models, including dynamic linear models and mixture models, and generalizes SCMs significantly. Moreover, our proposal can synthesize a range of predictive models utilizing covariates, such as random forests. From a statistical decision-making perspective, our method can be interpreted as a Bayesian approach aimed at minimizing regrets in the prediction of counterfactual outcomes. Additionally, Bayesian approaches can effectively address challenges encountered in frequentist SCMs, such as statistical inference with finite sample sizes, time-varying parameters, and model misspecification. Through simulation examples and empirical analysis, we substantiate the robustness of our proposed BPSCM.
Akira Fukuda · Masahiro Kato · Kenichiro McAlinn · Kosaku Takanashi
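The basic synthetic control prediction described in the abstract above, a weighted sum of untreated units' outcomes, can be sketched in a few lines. This shows only the classical prediction step with weights assumed to be given; it is not the BPSCM procedure, and the function name and data layout are hypothetical.

```python
def synthetic_control_prediction(weights, untreated_outcomes):
    """Predict the treated unit's counterfactual outcome in each period.

    weights            -- one weight per untreated unit (assumed already estimated)
    untreated_outcomes -- untreated_outcomes[j][t] is unit j's outcome in period t
    """
    if len(weights) != len(untreated_outcomes):
        raise ValueError("need exactly one weight per untreated unit")
    n_periods = len(untreated_outcomes[0])
    # For each period, take the weighted sum of the untreated units' outcomes.
    return [
        sum(w * unit[t] for w, unit in zip(weights, untreated_outcomes))
        for t in range(n_periods)
    ]


# Two untreated units observed over two periods, equally weighted.
prediction = synthetic_control_prediction([0.5, 0.5], [[1.0, 2.0], [3.0, 4.0]])
print(prediction)  # [2.0, 3.0]
```

The estimated treatment effect in a period is then the treated unit's observed outcome minus this prediction.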


Causal Inference with Synthetic Control Methods by Density Matching under Implicit Endogeneity
(Poster)
Synthetic control methods (SCMs) have become a crucial tool for causal inference in comparative case studies. The fundamental idea of SCMs is to estimate counterfactual outcomes for a treated unit by using a weighted sum of observed outcomes from untreated units. The accuracy of the synthetic control (SC) is critical for estimating the causal effect, and therefore, the estimation of SC weights has been the focus of much research. In this paper, we first point out that existing SCMs suffer from an implicit endogeneity problem, namely the correlation between the outcomes of untreated units and the error term of the synthetic control, which yields a bias in the causal effect estimator. We then propose a novel SCM based on density matching, assuming that the density of outcomes of the treated unit can be approximated by a weighted average of the joint density of untreated units (i.e., a mixture model). Based on this assumption, we estimate SC weights by matching moments of treated outcomes and the weighted sum of moments of untreated outcomes. Our proposed method has three advantages over existing methods. First, our estimator is asymptotically unbiased under the assumption of the mixture model. Second, due to the asymptotic unbiasedness, we can reduce the mean squared error for counterfactual prediction. Third, our method generates full densities of the treatment effect, not only expected values, which broadens the applicability of SCMs. We provide experimental results to demonstrate the effectiveness of our proposed method.
Masahiro Kato · Akari Ohda · Masaaki Imaizumi · Kenichiro McAlinn


Why Don’t We Focus on Episodic Future Reasoning, Not Only Counterfactual?
(Poster)
Understanding cognitive processes in multi-agent interactions is a primary goal in cognitive science. It can guide the direction of artificial intelligence (AI) research toward social decision-making in heterogeneous multi-agent systems. In this paper, we introduce the episodic future thinking (EFT) mechanism of a reinforcement learning (RL) agent by benchmarking the cognitive process of animals. To achieve future thinking functionality, we first train a multi-character policy that reflects heterogeneous characters with an ensemble of heterogeneous policies. An agent's character is defined as a different weight combination on reward components, thus explaining the agent's behavioral preference. The future thinking agent collects observation-action trajectories of the target agents and uses the pre-trained multi-character policy to infer their characters. Once the character is inferred, the agent predicts the upcoming actions of the targets and simulates the future. This capability allows the agent to adaptively select the optimal action, considering the upcoming behavior of others in multi-agent scenarios. To evaluate the proposed mechanism, we consider the multi-agent autonomous driving scenario in which autonomous vehicles with different driving traits are on the road. Simulation results demonstrate that the EFT mechanism with accurate character inference leads to a higher reward than existing multi-agent solutions. We also confirm that the effect of reward improvement remains valid across societies with different levels of character diversity.
Dongsu Lee · Minhae Kwon


Identification of Nonlinear Latent Hierarchical Causal Models
(Poster)
Counterfactual reasoning entails the identification of causal models. However, the task of identifying latent variables and causal structures from observational data can be highly challenging, especially when observed variables are generated by causally related latent variables with nonlinear functions. In this work, we investigate the identification problem for nonlinear latent hierarchical models in which observed variables are generated by causally related latent variables, and some latent variables may not have observed children. We show that the identifiability of both causal structure and latent variables can be achieved under mild assumptions: on causal structures, we allow for multiple paths between any pair of variables, which relaxes latent tree assumptions in prior work; on structural functions, we do not make parametric assumptions, thus permitting general nonlinearity and multi-dimensional continuous variables. Specifically, we first develop a basic identification criterion in the form of novel identifiability guarantees for an elementary latent variable model. Leveraging this criterion, we show that both causal structures and latent variables of the hierarchical model can be identified asymptotically by explicitly constructing an estimation procedure. To the best of our knowledge, our work is the first to establish identifiability guarantees for both causal structures and latent variables in nonlinear latent hierarchical models.
Lingjing Kong · Biwei Huang · Feng Xie · Eric Xing · Yuejie Chi · Kun Zhang


Counterfactuals for the Future
(Poster)
Counterfactuals are often described as 'retrospective,' focusing on hypothetical alternatives to a realized past. This description relates to an often implicit assumption about the structure and stability of exogenous variables in the system being modeled, an assumption that is reasonable in many settings where counterfactuals are used. In this work, we consider cases where we might reasonably make a different assumption about exogenous variables, namely, that the exogenous noise terms of each unit do exhibit some unit-specific structure and/or stability. This leads us to a different use of counterfactuals: a 'forward-looking' rather than 'retrospective' counterfactual. We introduce “counterfactual treatment choice,” a type of treatment choice problem that motivates using forward-looking counterfactuals. We then explore how mismatches between interventional and forward-looking counterfactual approaches to treatment choice, consistent with different assumptions about exogenous noise, can lead to counterintuitive results.
Lucius Bynum · Joshua Loftus · Julia Stoyanovich


Causal Dependence Plots
(Poster)
Explaining artificial intelligence is increasingly important. To use such data-driven systems wisely we must understand how they interact with the world, including how they depend causally on data inputs. In this work we develop Causal Dependence Plots (CDPs) to visualize how one variable (an outcome) depends on changes in another variable (a predictor), along with any consequent causal changes in other predictor variables. Crucially, CDPs differ from standard methods based on holding other predictors constant or assuming they are independent. CDPs make use of an auxiliary causal model because causal conclusions require causal assumptions. With simulations and real data experiments, we show CDPs can be combined in a modular way with methods for causal learning or sensitivity analysis. Since people often think causally about input-output dependence, CDPs can be powerful tools in the xAI or interpretable machine learning toolkit and contribute to applications like scientific machine learning and algorithmic fairness.
Joshua Loftus · Lucius Bynum · Sakina Hansen


Finding Counterfactually Optimal Action Sequences in Continuous State Spaces
(Poster)
Humans performing tasks that involve taking a series of multiple dependent actions over time often learn from experience by reflecting on specific cases and points in time, where different actions could have led to significantly better outcomes. While recent machine learning methods to retrospectively analyze sequential decision making processes promise to aid decision makers in identifying such cases, they have focused on environments with finitely many discrete states. However, in many practical applications, the state of the environment is inherently continuous in nature. In this paper, we aim to fill this gap. We start by formally characterizing a sequence of discrete actions and continuous states using finite horizon Markov decision processes and a broad class of bijective structural causal models. Building upon this characterization, we formalize the problem of finding counterfactually optimal action sequences and show that, in general, we cannot expect to solve it in polynomial time. Then, we develop a search method based on the A* algorithm that, under a natural form of Lipschitz continuity of the environment’s dynamics, is guaranteed to return the optimal solution to the problem. Experiments on real clinical data show that our method is very efficient in practice, and it has the potential to offer interesting insights for sequential decision making tasks.
Stratis Tsirtsis · Manuel Gomez-Rodriguez


Empowering Counterfactual Reasoning for Graph Neural Networks via Inductivity
(Poster)
Graph neural networks (GNNs) have various practical applications, such as drug discovery, recommendation engines, and chip design. However, GNNs lack transparency as they cannot provide understandable explanations for their predictions. To address this issue, counterfactual reasoning is used. The main goal is to make minimal changes to the input graph of a GNN in order to alter its prediction. While several algorithms have been proposed for counterfactual explanations of GNNs, most of them have two main drawbacks. First, they only consider edge deletions as perturbations. Second, the counterfactual explanation models are transductive, meaning they do not generalize to unseen data. In this study, we introduce an inductive algorithm called INDUCE, which overcomes these limitations. By conducting extensive experiments on several datasets, we demonstrate that incorporating edge additions leads to better counterfactual results compared to the existing methods. Moreover, the inductive modeling approach allows INDUCE to directly predict counterfactual perturbations without requiring instance-specific training. This results in significant computational speed improvements compared to baseline methods and enables scalable counterfactual analysis for GNNs.
Samidha Verma · Burouj Armgaan · Sourav Medya · Sayan Ranu


Counterfactual Fairness Without Modularity
(Poster)
In this work we demonstrate different variations of counterfactual fairness that avoid one of the central sociological and normative tensions in counterfactual-based fairness notions: modularity. Building on recent developments in causal modeling formalisms, we introduce backtracking counterfactual fairness, a novel definition of counterfactual fairness that avoids the modularity assumption by using backtracking rather than interventional counterfactuals. We also propose an alternate modeling strategy via causal relational modeling that instead provides a solution at the database schema level: choosing to represent variables that violate the modularity assumption as database entities rather than individual attributes. Both of these proposals allow for the consideration of counterfactual-based fairness notions even in the presence of non-modular variables.
Lucius Bynum · Joshua Loftus · Julia Stoyanovich


Observational Counterfactual Explanations in Sequential Decision Making
(Poster)
Human decision-making is plagued by a considerable amount of noise, as evidenced by the research of Kahneman et al. (2021), which revealed that medical doctors in the same hospital often arrive at different treatment decisions when presented with identical patient cases. In this work, we utilize observational (backtracking) counterfactual explanations to support the diagnostic process in sequential decision-making. In particular, we explore how this approach can be effectively employed when decisions are influenced solely by external factors like the decision maker’s fatigue levels or personal biases. We aim to strengthen trust in the decision-making process by exploring these counterfactual explanations.
Abdirisak Mohamed


Advancing Counterfactual Inference through Quantile Regression
(
Poster
)
The capacity to address counterfactual "what if" inquiries is crucial for understanding and making use of causal influences. Traditional counterfactual inference usually assumes a structural causal model is available. However, in practice, such a causal model is often unknown and may not be identifiable. This paper aims to perform reliable counterfactual inference based on the (learned) qualitative causal structure and observational data, without a given causal model or even directly estimating conditional distributions. We recast counterfactual reasoning as an extended quantile regression problem using neural networks. The approach is statistically more efficient than existing ones, and further makes it possible to develop the generalization ability of the estimated counterfactual outcome to unseen data and provide an upper bound on the generalization error. Experiment results on multiple datasets strongly support our theoretical claims. 
Shaoan Xie · Biwei Huang · Bin Gu · Tongliang Liu · Kun Zhang 🔗 
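
As a rough illustration of the quantile regression idea this abstract builds on, the sketch below implements the standard pinball (quantile) loss and shows that the constant prediction minimizing it is a sample quantile. It is not the authors' neural method; the function names and the toy outcomes are assumptions made for this example.

```python
def pinball_loss(y_true, y_pred, tau):
    """Average pinball (quantile) loss at level tau, with 0 < tau < 1."""
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        # Under-predictions are weighted by tau, over-predictions by (1 - tau).
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y_true)


def best_constant_quantile(y, tau):
    """Return the sample value that minimizes the pinball loss as a constant prediction."""
    candidates = sorted(set(y))
    return min(candidates, key=lambda q: pinball_loss(y, [q] * len(y), tau))


# Hypothetical outcomes with one extreme value.
outcomes = [1.0, 2.0, 3.0, 4.0, 100.0]
median_fit = best_constant_quantile(outcomes, 0.5)  # the sample median
upper_fit = best_constant_quantile(outcomes, 0.9)   # an upper quantile
print(median_fit, upper_fit)
```

With tau = 0.5 the loss reduces to half the absolute error, so the minimizer is the median and is unaffected by the extreme value.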


Closed-loop Reasoning about Counterfactuals to Improve Policy Transparency
(
Poster
)
Explanations are a powerful way of increasing the transparency of complex policies. Such explanations must not only be informative regarding the policy in question, but must also be tailored to the human explainee. In particular, it is critical to consider the explainee's current beliefs and the counterfactuals (i.e. alternate outcomes) with which they will likely interpret any given explanation. For example, the explainee will be inclined to wonder “why did event P happen instead of counterfactual Q?” In this vein, we first model human beliefs using a particle filter to consider the counterfactuals the human will likely use to interpret a potential explanation, which in turn helps select an explanation that is highly informative. Second, we design a closed-loop explanation framework, inspired by the education literature, that continuously updates the particle filter not only based on the explanations provided but also based on feedback from the human regarding their understanding. Finally, we present a user study design for testing the iterative modeling of a human's likely counterfactuals in conveying effective explanations.
Michael S Lee · Henny Admoni · Reid Simmons 🔗 


Delphic Offline Reinforcement Learning under Nonidentifiable Hidden Confounding
(
Poster
)
A prominent challenge of offline reinforcement learning (RL) is the issue of hidden confounding: unobserved variables may influence both the actions taken by the agent and the observed outcomes. Hidden confounding can compromise the validity of any causal conclusion drawn from data and presents a major obstacle to effective offline RL. In the present paper, we tackle the problem of hidden confounding in the nonidentifiable setting. We propose a definition of uncertainty due to hidden confounding bias, termed delphic uncertainty, which measures variation over counterfactual predictions compatible with the observations, and differentiate it from the well-known epistemic and aleatoric uncertainties. We derive a practical method for estimating the three types of uncertainties, and construct a pessimistic offline RL algorithm to account for them. Our method does not assume identifiability of the unobserved confounders, and attempts to reduce the amount of confounding bias. We demonstrate through extensive experiments and ablations the efficacy of our approach on a sepsis management benchmark, as well as on electronic health records. Our results suggest that nonidentifiable hidden confounding bias can be mitigated to improve offline RL solutions in practice.
Alizée Pace · Hugo Yèche · Bernhard Schölkopf · Gunnar Ratsch · Guy Tennenholtz 🔗 


Adaptive Principal Component Regression with Applications to Panel Data
(
Poster
)
Principal component regression (PCR) is a popular technique for fixed-design error-in-variables regression, a generalization of the linear regression setting in which the observed covariates are corrupted with random noise. We provide the first time-uniform finite sample guarantees for online (regularized) PCR whenever data is collected adaptively. Since the proof techniques for PCR in the fixed design setting do not readily extend to the online setting, our results rely on adapting tools from modern martingale concentration to the error-in-variables setting. As an application of our bounds, we provide a framework for counterfactual estimation of unit-specific treatment effects in panel data settings when interventions are assigned adaptively. Our framework may be thought of as a generalization of the synthetic interventions framework where data is collected via an adaptive intervention assignment policy.
Anish Agarwal · Keegan Harris · Justin Whitehouse · Steven Wu 🔗 


Strategyproof Decision-Making in Panel Data Settings and Beyond
(
Poster
)
We consider the classical problem of decision-making using panel data, in which a decision-maker gets noisy, repeated measurements of multiple units (or agents). We consider a setup where there is a pre-intervention period, when the principal observes the outcomes of each unit, after which the principal uses these observations to assign a treatment to each unit. Unlike this classical setting, we permit the units generating the panel data to be strategic, i.e. units may modify their pre-intervention outcomes in order to receive a more desirable intervention. The principal's goal is to design a strategyproof intervention policy, i.e. a policy that assigns units to their correct interventions despite their potential strategizing. We first identify a necessary and sufficient condition under which a strategyproof intervention policy exists, and provide a strategyproof mechanism with a simple closed form when one does exist. Along the way, we prove impossibility results for strategic multiclass classification, which may be of independent interest. When there are two interventions, we establish that there always exists a strategyproof mechanism, and provide an algorithm for learning such a mechanism. For three or more interventions, we provide an algorithm for learning a strategyproof mechanism if there exists a sufficiently large gap in the principal's rewards between different interventions. Finally, we empirically evaluate our model using real-world panel data collected from product sales over 18 months. We find that our methods compare favorably to baselines which do not take strategic interactions into consideration, even in the presence of model misspecification.
Keegan Harris · Anish Agarwal · Chara Podimata · Steven Wu 🔗 


Counterfactuals for Subjective Wellbeing Panel Data: Integrated Application of Statistical Ensemble and Machine Learning Methods
(
Poster
)
We apply an integrated framework combining a difference-in-differences and synthetic controls ensemble with causal forest to the UK Household Longitudinal Survey. Household relocations are leveraged as natural experiments to quantify the causal relationship between the built environment and subjective wellbeing. We demonstrate the complementarity and interoperability of canonical statistics and novel machine learning methods.
Jerry Chen · Li Wan 🔗 
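
For readers unfamiliar with difference-in-differences, the sketch below shows the basic two-group, two-period arithmetic that the abstract's framework builds on. It is only the textbook estimator, valid under a parallel-trends assumption, and not the authors' ensemble; the wellbeing scores and variable names are made up for illustration.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Two-by-two difference-in-differences estimate of a treatment effect.

    Assumes (hypothetically) that both groups would have followed parallel
    trends in the absence of treatment.
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical wellbeing scores for households that relocated and those that stayed.
movers_before = [5.0, 6.0, 7.0]
movers_after = [7.0, 8.0, 9.0]
stayers_before = [5.0, 5.0, 5.0]
stayers_after = [6.0, 6.0, 6.0]

effect = diff_in_diff(movers_before, movers_after, stayers_before, stayers_after)
print(effect)
```

Here the movers improve by 2 points and the stayers by 1, so the estimate attributes 1 point to relocation.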


Learning Linear Causal Representations from Interventions under General Nonlinear Mixing
(
Poster
)
We study the problem of learning causal representations from unknown, latent interventions in a general setting, where the latent distribution is Gaussian but the mixing function is completely general. We prove strong identifiability results given unknown single-node interventions, i.e., without having access to the intervention targets. This generalizes prior works which have focused on weaker classes, such as linear maps or paired counterfactual data. This is also the first instance of causal identifiability from non-paired interventions for deep neural network embeddings. Our proof relies on carefully uncovering the high-dimensional geometric structure present in the data distribution after a nonlinear density transformation, which we capture by analyzing quadratic forms of precision matrices of the latent distributions. Finally, we propose a contrastive algorithm to identify the latent variables in practice and evaluate its performance on various tasks.
Simon Buchholz · Goutham Rajendran · Elan Rosenfeld · Bryon Aragam · Bernhard Schölkopf · Pradeep Ravikumar 🔗 


Counterfactual Explanation Policies in RL
(
Poster
)
As Reinforcement Learning (RL) agents are increasingly employed in diverse decision-making problems using reward preferences, it becomes important to ensure that policies learned by these frameworks in mapping observations to a probability distribution of the possible actions are explainable. However, there is little to no work in the systematic understanding of these complex policies in a contrastive manner, i.e., what minimal changes to the policy would improve/worsen its performance to a desired level. In this work, we present COUNTERPOL, the first framework to analyze RL policies using counterfactual explanations in the form of minimal changes to the policy that lead to the desired outcome. We do so by incorporating counterfactuals in supervised learning in RL with the target outcome regulated using desired return. We establish a theoretical connection between COUNTERPOL and widely used trust region-based policy optimization methods in RL. Extensive empirical analysis shows the efficacy of COUNTERPOL in generating explanations for (un)learning skills while keeping close to the original policy. Our results on five different RL environments with diverse state and action spaces demonstrate the utility of counterfactual explanations, paving the way for new frontiers in designing and developing counterfactual policies.
Shripad Deshmukh · Srivatsan R · Supriti Vijay · Jayakumar Subramanian · Chirag Agarwal 🔗 


Leveraging Contextual Counterfactuals Toward Belief Calibration
(
Poster
)
Beliefs and values are increasingly being incorporated into our AI systems through alignment processes, such as carefully curating data collection principles or regularizing the loss function used for training. However, the meta-alignment problem is that these human beliefs are diverse and not aligned across populations; furthermore, the implicit strength of each belief may not be well calibrated even among humans, especially when trying to generalize across contexts. Specifically, in high-regret situations, we observe that contextual counterfactuals and recourse costs are particularly important in updating a decision maker's beliefs and the strength to which such beliefs are held. Therefore, we argue that including counterfactuals is key to an accurate calibration of beliefs during alignment. To do this, we first segment belief diversity into two categories: subjectivity (across individuals within a population) and epistemic uncertainty (within an individual across different contexts). By leveraging our notion of epistemic uncertainty, we introduce the “belief calibration cycle” framework to more holistically calibrate this diversity of beliefs with context-driven counterfactual reasoning by using multi-objective optimization. We empirically apply our framework for finding a Pareto frontier of clustered optimal belief strengths that generalize across different contexts, demonstrating its efficacy on a toy dataset for credit decisions.
Richard Zhang · Mike Lee · Sherol Chen 🔗 


Extending counterfactual reasoning models to capture unconstrained social explanations
(
Poster
)
Human explanations are thought to be shaped by counterfactual reasoning but formal accounts of this ability are limited to simple scenarios and fixed response options. In naturalistic or social settings, human explanations are often more creative, involving imputation of hidden causal factors in addition to selection among established causes. Across two experiments, we extend a counterfactual account of explanation to capture how people generate free explanations for an agent’s behaviour across a set of scenarios. To do this, we have one group of participants (N=95) make predictions about scenarios that combine short biographies with potential trajectories through a gridworld, using this to crowdsource a causal model of the overall scenario. A separate set of participants (N=49) then reacted to particular outcomes, providing free-text explanations for why the agent moved the way they did. Our final model captures how these free explanations depend on the general situation and specific outcome but also how participants’ explanatory strategy is shaped by how surprising or incongruent the behaviour is. Consistent with past work, we find people reason with counterfactuals that stay relatively close to what actually happens, but beyond this, we model how their tendency to impute unobserved factors depends on the degree to which the explanandum is surprising.
Stephanie Droop · Neil Bramley 🔗 


Budgeting Counterfactual for Offline RL
(
Poster
)
The main challenge of offline reinforcement learning, where data is limited, arises from a sequence of counterfactual reasoning dilemmas within the realm of potential actions: What if we were to choose a different course of action? These circumstances frequently give rise to extrapolation errors, which tend to accumulate exponentially with the problem horizon. Hence, it becomes crucial to acknowledge that not all decision steps are equally important to the final outcome, and to budget the number of counterfactual decisions a policy makes in order to control the extrapolation. Contrary to existing approaches that use regularization on either the policy or value function, we propose an approach to explicitly bound the amount of out-of-distribution actions during training. Specifically, our method utilizes dynamic programming to decide where to extrapolate and where not to, with an upper bound on the number of decisions that differ from the behavior policy. It balances between the potential for improvement from taking out-of-distribution actions and the risk of making errors due to extrapolation. Theoretically, we justify our method by the constrained optimality of the fixed point solution to our Q updating rules. Empirically, we show that the overall performance of our method is better than the state-of-the-art offline RL methods on tasks in the widely used D4RL benchmarks.

Yao Liu · Pratik Chaudhari · Rasool Fakoor 🔗 


Counterfactual Learning to Rank via Knowledge Distillation
(
Poster
)
Knowledge distillation is a transfer learning technique to improve the performance of a student model trained on a Distilled Empirical Risk, formed via a label distribution defined by some teacher model, which is typically trained on the same task and belongs to a hypothesis class with richer representational capacity. In this work, we study knowledge distillation in the context of counterfactual Learning To Rank (LTR) from implicit user feedback. We consider a generic partial information search ranking scenario, where the relevancy of the items in the logged search context is observed only in the event of an explicit user engagement. The premise of using knowledge distillation in this counterfactual setup is to leverage teacher's distilled knowledge in the form of soft predicted relevance labels to help the student with more effective listwise comparisons, variance reduction, and improved generalization behavior. We build empirical risk estimates that rely not only on the debiased observed user feedback via standard Inverse Propensity Weighting, but also on the teacher's distilled knowledge via potential outcome modeling. Our distillation-based counterfactual LTR framework offers a new perspective on how explanatory click models, trained for a click prediction task with privileged encoding of the confounding search context, can explain away the effect of presentation-related confounding for the student model that is trained for a ranking task. We analyze the generalization performance of the proposed empirical risk estimators from a theoretical perspective by establishing bounds on their estimation error. We also conduct rigorous counterfactual offline evaluations as well as online controlled randomized experiments for a search ranking task in a major e-commerce platform. We report strong empirical results that the distilled knowledge from a teacher trained on expert judgments can significantly improve the generalization performance of the student ranker.
Ehsan Ebrahimzadeh · Alex Cozzi · Abraham Bagherjeiran 🔗 
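
The "standard Inverse Propensity Weighting" mentioned in this abstract can be sketched in a few lines. The example assumes a simple position-based click model in which a click occurs only if an item is both examined and relevant; the propensities, log format, and function name are hypothetical and are not taken from the paper.

```python
def ipw_relevance_estimate(logged, examination_propensity):
    """Estimate average relevance from clicks using inverse propensity weighting.

    logged: list of (position, clicked) pairs from the logging ranker.
    examination_propensity: dict mapping position -> probability that a user
        examines that position (assumed known for this sketch).
    """
    total = 0.0
    for position, clicked in logged:
        if clicked:
            # Each click is up-weighted by 1 / P(examined at this position),
            # which corrects for items shown at rarely examined positions.
            total += 1.0 / examination_propensity[position]
    return total / len(logged)


# Hypothetical log: position 1 is always examined, position 2 half of the time.
propensity = {1: 1.0, 2: 0.5}
log = [(1, True), (2, True), (2, False), (1, False)]
estimate = ipw_relevance_estimate(log, propensity)
print(estimate)
```

The naive click-through rate on this log is 0.5, while the weighted estimate is higher because the click at position 2 counts double.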


Unveiling the Betrayal of Counterfactual Explanations within Recommender Systems
(
Poster
)
Deep learning-based recommender systems have become an integral part of several online platforms. However, their black-box nature emphasizes the need for explainable artificial intelligence (XAI) approaches to provide human-understandable reasons why a specific item gets recommended to a given user. One such method is counterfactual explanation (CF). While CFs can be highly beneficial for users and system designers, malicious actors may also exploit these explanations to undermine the system's security. In this work, we propose HCARS, a novel strategy to poison recommender systems via CFs. Specifically, we first train a logical-reasoning-based surrogate model on training data derived from counterfactual explanations. By reversing the learning process of the recommendation model, we thus develop a proficient greedy algorithm to generate fabricated user profiles and their associated interaction records for the aforementioned surrogate model. Our experiments, which employ a well-known CF generation method and are conducted on two distinct datasets, show that HCARS yields significant and successful attack performance.
Ziheng Chen · Jin Huang · Ping Chang Lee · Fabrizio Silvestri · Hongshik Ahn · Jia Wang · Yongfeng Zhang · Gabriele Tolomei 🔗 


ForwardINF: Efficient Data Influence Estimation with Duality-based Counterfactual Analysis
(
Poster
)
Large-scale black-box models have become ubiquitous across numerous applications. Understanding the influence of individual training samples on predictions made by these models is crucial for improving their trustworthiness. The backbone of current influence estimation techniques involves computing gradients for every training point or repeated training on different subsets. These approaches face obvious computational challenges when scaled up to large datasets and models. In this work, we introduce a principled approach to address the computational challenge in data influence estimation. Our approach is empowered by a novel insight into the duality of data influence. Specifically, we discover that the problem of training data influence estimation has an equivalent counterfactual dual problem: how would the prediction on training samples change if the model were trained on a specific test sample? Surprisingly, solving the dual yields results equivalent to the original problem. Further, we demonstrate that highly efficient methods exist for this dual problem, which entails only a forward pass of the neural network for each training point. We demonstrate the utility of our approach across various applications, including data leakage detection, memorization, and language model behavior tracing.
Myeongseob Ko · Feiyang Kang · Weiyan Shi · Ming Jin · Zhou Yu · Ruoxi Jia 🔗 


Inverse Transition Learning for Characterizing Near-Optimal Dynamics in Offline Reinforcement Learning
(
Poster
)
Offline reinforcement learning is commonly used for sequential decision-making in domains such as healthcare, where the rewards are known and the dynamics must be estimated on the basis of a single batch of data. A key challenge for all tasks is how to learn a reliable estimate of the dynamics that produce near-optimal policies that are safe to deploy in high-stakes settings. We propose a new constraint-based approach that captures our desiderata for reliably learning a set of dynamics that is free from gradients. Our results demonstrate that by using our constraints to learn an estimate of model dynamics, we learn near-optimal policies, while considerably reducing the policy's variance. We also show how combining uncertainty estimation with these constraints can help us infer a ranking of actions that produce higher returns, thereby enabling more interpretable performant policies for planning overall.
Leo Benac · Sonali Parbhoo · Finale Doshi-Velez 🔗 


Leveraging Factored Action Spaces for Off-Policy Evaluation
(
Poster
)
In high-stakes decision-making domains such as healthcare and self-driving cars, off-policy evaluation (OPE) can help practitioners understand the performance of a new policy before deployment by using observational data. However, when dealing with problems involving large and combinatorial action spaces, existing OPE estimators often suffer from substantial bias and/or variance. In this work, we investigate the role of factored action spaces in improving OPE. Specifically, we propose and study a new family of decomposed importance sampling (IS) estimators that leverage the inherent factorisation structure of actions. We theoretically prove that our proposed estimator achieves lower variance and remains unbiased, subject to certain assumptions regarding the underlying problem structure. Empirically, we demonstrate that our estimator outperforms standard IS in terms of mean squared error and conduct sensitivity analyses probing the validity of various assumptions. Future work should investigate how to design or derive the factorisation for practical problems so as to maximally adhere to the theoretical assumptions.
Aaman Rebello · Shengpu Tang · Jenna Wiens · Sonali Parbhoo 🔗 
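
The standard importance sampling (IS) baseline that the abstract compares against can be sketched as follows for a one-step (bandit) setting. This is the ordinary IS estimator only, not the decomposed estimator proposed in the paper; the policies, rewards, and function name are hypothetical.

```python
def importance_sampling_value(samples, target_policy, behavior_policy):
    """Ordinary importance sampling estimate of a target policy's value.

    samples: list of (action, reward) pairs collected under the behavior policy.
    target_policy, behavior_policy: dicts mapping action -> probability.
    Assumes the behavior policy gives non-zero probability to every logged action.
    """
    total = 0.0
    for action, reward in samples:
        # Re-weight each logged reward by how much more (or less) likely the
        # target policy is to take the logged action.
        weight = target_policy[action] / behavior_policy[action]
        total += weight * reward
    return total / len(samples)


# Hypothetical one-step problem with two actions.
behavior = {"a": 0.5, "b": 0.5}
target = {"a": 0.8, "b": 0.2}
logged = [("a", 1.0), ("b", 0.0), ("a", 1.0), ("b", 0.0)]

value = importance_sampling_value(logged, target, behavior)
print(value)
```

When the action space is large and combinatorial, these weights can become extreme, which is the variance problem the abstract describes.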


Neuro-Symbolic Models of Human Moral Judgment: LLMs as Automatic Feature Extractors
(
Poster
)
As AI systems gain prominence in society, concerns about their safety become crucial to address. There have been repeated calls to align powerful AI systems with human morality. However, attempts to do this have used black-box systems that cannot be interpreted or explained. In response, we introduce a methodology leveraging the natural language processing abilities of large language models (LLMs) and the interpretability of symbolic models to form competitive neuro-symbolic models for predicting human moral judgment. Our method involves using LLMs to extract morally-relevant features from a stimulus and then passing those features through a cognitive model that predicts human moral judgment. This approach achieves state-of-the-art performance on the MoralExceptQA benchmark, improving on the previous F1 score by 20 points and accuracy by 18 points, while also enhancing model interpretability by baring all key features in the model's computation. We also run an experiment verifying that the features identified as important by the LLM are actually important to the LLM's computation, by creating counterfactual scenarios in which the feature values are varied, and asking the LLM for zero-shot moral acceptability judgments. We propose future directions for harnessing LLMs to develop more capable and interpretable neuro-symbolic models, emphasizing the critical role of interpretability in facilitating the safe integration of AI systems into society.
Joseph Kwon · Sydney Levine · Josh Tenenbaum 🔗 


Navigating Explanatory Multiverse Through Counterfactual Path Geometry
(
Poster
)
Counterfactual explanations are the de facto standard when attempting to interpret the decisions of opaque predictive models. Their generation is often subject to algorithmic and domain-specific constraints, such as density-based feasibility for the former and attribute (im)mutability or directionality of change for the latter, that aim to maximise their real-life practicality. In addition to desiderata with respect to the counterfactual instance itself, the existence of a viable path connecting it with the factual data point, known as algorithmic recourse, has become an important technical consideration. While both of these requirements ensure that the steps of the journey as well as its destination are admissible, current literature does not deal with the multiplicity of such counterfactual paths. To address this shortcoming we introduce the novel concept of explanatory multiverse that encompasses all the possible counterfactual journeys, and show how to navigate, reason about and compare the geometry of these paths (their affinity, branching, divergence and possible future convergence) with two methods: vector spaces and graphs. Implementing this (interactive) explanatory process grants explainees more agency by allowing them to select counterfactuals based on the properties of the journey leading to them in addition to their absolute differences.
Edward Small · Yueqing Xuan · Kacper Sokol 🔗 


Causal Proxy Models for Concept-Based Model Explanations
(
Poster
)
Explainability methods for NLP systems encounter a version of the fundamental problem of causal inference: for a given ground-truth input text, we never truly observe the counterfactual texts necessary for isolating the causal effects of model representations on outputs. In response, many explainability methods make no use of counterfactual texts, assuming they will be unavailable. In this paper, we show that robust causal explainability methods can be created using approximate counterfactuals, which can be written by humans to approximate a specific counterfactual or simply sampled using metadata-guided heuristics. The core of our proposal is the Causal Proxy Model (CPM). A CPM explains a black-box model N because it is trained to have the same actual input/output behavior as N while creating neural representations that can be intervened upon to simulate the counterfactual input/output behavior of N. Furthermore, we show that the best CPM for N performs comparably to N in making factual predictions, which means that the CPM can simply replace N, leading to more explainable deployed models.
Zhengxuan Wu · Karel D'Oosterlinck · Atticus Geiger · Amir Zur · Christopher Potts 🔗 


Interventional and Counterfactual Inference with Diffusion Models
(
Poster
)
We consider the problem of answering observational, interventional, and counterfactual queries in a causally sufficient setting where only observational data and the causal graph are available. Utilizing the recent developments in diffusion models, we introduce diffusion-based causal models (DCM) to learn causal mechanisms that generate unique latent encodings. These encodings enable us to directly sample under interventions and perform abduction for counterfactuals. Diffusion models are a natural fit here, since they can encode each node to a latent representation that acts as a proxy for exogenous noise. Our empirical evaluations demonstrate significant improvements over existing state-of-the-art methods for answering causal queries. Furthermore, we provide theoretical results that offer a methodology for analyzing counterfactual estimation in general encoder-decoder models, which could be useful in settings beyond our proposed approach.
Patrick Chao · Patrick Bloebaum · Shiva Kasiviswanathan 🔗 


Counterfactual Explanations for Misclassified Images: How Human and Machine Explanations Differ
(
Poster
)
Counterfactual explanations have emerged as a popular solution for the eXplainable AI (XAI) problem of elucidating the predictions of black-box deep-learning systems because people easily understand them, they apply across different problem domains, and they seem to be legally compliant. While 100+ counterfactual methods exist in the literature, few of these methods have actually been tested on users (∼7%). Even fewer studies adopt a user-centered perspective; for instance, asking people for their counterfactual explanations to determine their perspective on a “good explanation”. This gap in the literature is addressed here using a novel methodology that (i) gathers human-generated counterfactual explanations for misclassified images, in two user studies and, then, (ii) compares these human-generated explanations to computationally-generated explanations for the same misclassifications. Results indicate that humans do not “minimally edit” images when generating counterfactual explanations. Instead, they make larger, “meaningful” edits that better approximate prototypes in the counterfactual class. An analysis based on “explanation goals” is proposed to account for this divergence between human and machine explanations. The implications of these proposals for future work are discussed.
Eoin Delaney · Arjun Pakrashi · Derek Greene · Mark Keane 🔗 


Natural Counterfactuals With Necessary Backtracking
(
Poster
)
Counterfactual reasoning, a cognitive ability possessed by humans, is being actively studied for incorporation into machine learning systems. In the causal modelling approach to counterfactuals, Judea Pearl's theory remains the most influential and dominant. However, being thoroughly non-backtracking, the counterfactual probability distributions defined by Pearl can be hard to learn by nonparametric models, even when the causal structure is fully given. A big challenge is that non-backtracking counterfactuals can easily step outside of the support of the training data, the inference of which becomes highly unreliable with the current machine learning models. To mitigate this issue, we propose an alternative theory of counterfactuals, namely, natural counterfactuals. This theory is concerned with counterfactuals within the support of the data distribution, and defines in a principled way a different kind of counterfactual that backtracks if (but only if) necessary. To demonstrate potential applications of the theory and illustrate the advantages of natural counterfactuals, we conduct a case study of counterfactual generation and discuss empirical observations that lend support to our approach.
Guangyuan Hao · Jiji Zhang · Hao Wang · Kun Zhang 🔗 


Counterfactually Comparing Abstaining Classifiers
(
Poster
)
Abstaining classifiers have the option to abstain from making predictions on inputs that they are unsure about. These classifiers are becoming increasingly popular in high-stakes decision-making problems, as they can withhold uncertain predictions to improve their reliability and safety. When evaluating black-box abstaining classifier(s), however, we lack a principled approach that accounts for what the classifier would have predicted on its abstentions. These missing predictions are crucial when, e.g., a radiologist is unsure of their diagnosis or when a driver is inattentive in a self-driving car. In this paper, we introduce a novel approach and perspective to the problem of evaluating and comparing abstaining classifiers by treating abstentions as missing data. Our evaluation approach is centered around defining the counterfactual score of an abstaining classifier, defined as the expected performance of the classifier had it not been allowed to abstain. We specify the conditions under which the counterfactual score is identifiable: if the abstentions are stochastic, and if the evaluation data is independent of the training data (ensuring that the predictions are missing at random), then the score is identifiable. Note that, if abstentions are deterministic, then the score is unidentifiable because the classifier can perform arbitrarily poorly on its abstentions. Leveraging tools from observational causal inference, we then develop nonparametric and doubly robust methods to efficiently estimate this quantity under identification. Our approach is examined in both simulated and real data experiments.
Yo Joong Choe · Aditya Gangrade · Aaditya Ramdas 🔗 