On formalizing causal off-policy sequential decision-making
Sonali Parbhoo · Shalmali Joshi · Finale Doshi-Velez

Assessing the effects of deploying a policy based on retrospective data collected from a different policy is a common problem across several high-stakes decision-making domains. A number of off-policy evaluation (OPE) techniques have been proposed for this purpose, each with different bias-variance tradeoffs. However, these methods largely formulate OPE as a problem disassociated from the process used to generate the data. Posing OPE instead as a causal estimand has strong implications, ranging from our fundamental understanding of the complexity of the OPE problem to which methods we apply in practice, and can help highlight gaps in the existing literature regarding the overall objective of OPE. Many formalisms of OPE additionally overlook the role of uncertainty in the estimation process entirely, which can significantly bias the estimation of counterfactuals and produce large errors in OPE as a result. Finally, depending on how we formalise OPE, human expertise can be particularly helpful in assessing the validity of OPE estimates or in improving estimation from a finite number of samples to achieve certain efficiency guarantees. In this position paper, we discuss each of these issues in terms of the role it plays in OPE. Importantly, each of these aspects may be viewed as a means of assessing the validity of various other common assumptions made in causal inference.

Author Information

Sonali Parbhoo (Harvard University)
Shalmali Joshi (Harvard University (SEAS))
Finale Doshi-Velez (Harvard University)
