We introduce an off-policy evaluation procedure for highlighting episodes where applying a reinforcement learned (RL) policy is likely to have produced a substantially different outcome than the observed policy. In particular, we introduce a class of structural causal models (SCMs) for generating counterfactual trajectories in finite partially observable Markov Decision Processes (POMDPs). We see this as a useful procedure for off-policy "debugging" in high-risk settings (e.g., healthcare); by decomposing the expected difference in reward between the RL and observed policy into specific episodes, we can identify episodes where the counterfactual difference in reward is most dramatic. This in turn can be used to facilitate review of specific episodes by domain experts. We demonstrate the utility of this procedure with a synthetic environment of sepsis management.
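The abstract does not spell out the class of SCMs, but the title of the related oral names Gumbel-Max structural causal models. As a rough illustration only, the following sketch shows how a Gumbel-Max mechanism could produce a counterfactual outcome for a single categorical variable, such as one state transition. It assumes strictly positive probabilities, and the function names `sample_posterior_gumbels` and `counterfactual_outcome` are hypothetical rather than taken from the authors' code.

```python
import numpy as np

def sample_posterior_gumbels(log_p, observed, rng):
    """Draw Gumbel noise g such that argmax_k(log_p[k] + g[k]) == observed.

    Assumes every probability is strictly positive (finite log_p).
    """
    # The maximum of the shifted Gumbels follows Gumbel(logsumexp(log_p)).
    top = rng.gumbel(loc=np.logaddexp.reduce(log_p))
    # The other shifted Gumbels are truncated to lie below that maximum.
    free = rng.gumbel(loc=log_p)
    shifted = -np.logaddexp(-free, -top)  # -log(exp(-free) + exp(-top))
    shifted[observed] = top
    # Remove the location shift to recover the exogenous noise itself.
    return shifted - log_p

def counterfactual_outcome(p_obs, p_cf, observed, rng):
    """Sample the outcome under p_cf, given that `observed` occurred under p_obs."""
    g = sample_posterior_gumbels(np.log(p_obs), observed, rng)
    # Reuse the inferred noise under the counterfactual probabilities.
    return int(np.argmax(np.log(p_cf) + g))

rng = np.random.default_rng(0)
p_obs = [0.5, 0.3, 0.2]   # hypothetical probabilities under the observed policy
p_cf = [0.2, 0.3, 0.5]    # hypothetical probabilities under the RL policy
samples = [counterfactual_outcome(p_obs, p_cf, 0, rng) for _ in range(1000)]
print(np.bincount(samples, minlength=3) / 1000)
```

Applying a step like this to each transition of an observed episode would give counterfactual trajectories whose rewards can be compared with the observed reward episode by episode. When `p_cf` equals `p_obs`, the sketch always returns the observed outcome, which is the consistency one would expect of a counterfactual.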
Author Information
Michael Oberst (MIT)
David Sontag (Massachusetts Institute of Technology)
Related Events (a corresponding poster, oral, or spotlight)
- 2019 Oral: Counterfactual Off-Policy Evaluation with Gumbel-Max Structural Causal Models
  Wed. Jun 12th 09:20 -- 09:25 PM, Room Grand Ballroom
More from the Same Authors
- 2022: Evaluating Robustness to Dataset Shift via Parametric Robustness Sets
  Michael Oberst · Nikolaj Thams · David Sontag
- 2022: Evaluating Robustness to Dataset Shift via Parametric Robustness Sets
  Nikolaj Thams · Michael Oberst · David Sontag
- 2022 Poster: Sample Efficient Learning of Predictors that Complement Humans
  Mohammad-Amin Charusaie · Hussein Mozannar · David Sontag · Samira Samadi
- 2022 Poster: Co-training Improves Prompt-based Learning for Large Language Models
  Hunter Lang · Monica Agrawal · Yoon Kim · David Sontag
- 2022 Spotlight: Sample Efficient Learning of Predictors that Complement Humans
  Mohammad-Amin Charusaie · Hussein Mozannar · David Sontag · Samira Samadi
- 2022 Spotlight: Co-training Improves Prompt-based Learning for Large Language Models
  Hunter Lang · Monica Agrawal · Yoon Kim · David Sontag
- 2021 Poster: Neural Pharmacodynamic State Space Modeling
  Zeshan Hussain · Rahul G. Krishnan · David Sontag
- 2021 Poster: Regularizing towards Causal Invariance: Linear Models with Proxies
  Michael Oberst · Nikolaj Thams · Jonas Peters · David Sontag
- 2021 Poster: Graph Cuts Always Find a Global Optimum for Potts Models (With a Catch)
  Hunter Lang · David Sontag · Aravindan Vijayaraghavan
- 2021 Spotlight: Regularizing towards Causal Invariance: Linear Models with Proxies
  Michael Oberst · Nikolaj Thams · Jonas Peters · David Sontag
- 2021 Oral: Graph Cuts Always Find a Global Optimum for Potts Models (With a Catch)
  Hunter Lang · David Sontag · Aravindan Vijayaraghavan
- 2021 Spotlight: Neural Pharmacodynamic State Space Modeling
  Zeshan Hussain · Rahul G. Krishnan · David Sontag
- 2020 Poster: Estimation of Bounds on Potential Outcomes For Decision Making
  Maggie Makar · Fredrik Johansson · John Guttag · David Sontag
- 2020 Poster: Empirical Study of the Benefits of Overparameterization in Learning Latent Variable Models
  Rares-Darius Buhai · Yoni Halpern · Yoon Kim · Andrej Risteski · David Sontag
- 2020 Poster: Consistent Estimators for Learning to Defer to an Expert
  Hussein Mozannar · David Sontag
- 2018 Poster: Semi-Amortized Variational Autoencoders
  Yoon Kim · Sam Wiseman · Andrew Miller · David Sontag · Alexander Rush
- 2018 Oral: Semi-Amortized Variational Autoencoders
  Yoon Kim · Sam Wiseman · Andrew Miller · David Sontag · Alexander Rush
- 2017 Poster: Estimating individual treatment effect: generalization bounds and algorithms
  Uri Shalit · Fredrik D Johansson · David Sontag
- 2017 Talk: Estimating individual treatment effect: generalization bounds and algorithms
  Uri Shalit · Fredrik D Johansson · David Sontag
- 2017 Poster: Simultaneous Learning of Trees and Representations for Extreme Classification and Density Estimation
  Yacine Jernite · Anna Choromanska · David Sontag
- 2017 Talk: Simultaneous Learning of Trees and Representations for Extreme Classification and Density Estimation
  Yacine Jernite · Anna Choromanska · David Sontag