
Interpretable learning-to-defer for sequential decision-making
Shalmali Joshi · Sonali Parbhoo · Finale Doshi-Velez

We study the problem of learning to defer to an expert under non-stationary dynamics in a sequential decision-making setting, by identifying pre-emptive deferral strategies. Pre-emptive deferral is desirable when delaying deferral can lead to suboptimal or undesirable long-term outcomes, e.g., unexpected side effects of a treatment. We formalize a deferral policy as pre-emptive if delaying deferral does not lead to improved long-term outcomes. Our method, Sequential Learning-to-Defer (SLTD), explicitly measures the expected value of deferring now versus later, based on the underlying uncertainty in the non-stationary dynamics, via posterior sampling. We demonstrate that capturing this uncertainty lets us test whether delaying deferral can improve mean outcomes, and also gives domain experts an indication of when the model's performance is reliable. Finally, we show that our approach outperforms existing non-sequential learning-to-defer baselines while reducing overall uncertainty on multiple synthetic and semi-synthetic (Sepsis-Diabetes) simulators.
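
The "defer now versus later" comparison can be illustrated with a toy sketch. This is not the authors' SLTD implementation: the one-step setup, the Beta posterior over a single unknown transition probability, and every name and default value below (sample_values, defer_now, expert_value, bad_value, model_reward, discount) are assumptions made purely for illustration. The sketch draws posterior samples of the dynamics, computes the value of deferring now and of deferring one step later under each sample, and treats deferral as pre-emptive when waiting does not improve the mean sampled outcome.

```python
import random
from statistics import mean, pstdev


def sample_values(successes, failures, n_samples=500, seed=0,
                  expert_value=1.0, bad_value=-1.0,
                  model_reward=0.5, discount=0.9):
    """Draw paired value samples for 'defer now' and 'defer one step later'.

    Hypothetical toy model: if the model acts for one more step, the state
    stays safe with unknown probability p. The posterior over p is
    Beta(1 + successes, 1 + failures), i.e. a uniform prior plus counts.
    """
    rng = random.Random(seed)
    values_now, values_later = [], []
    for _ in range(n_samples):
        p = rng.betavariate(1 + successes, 1 + failures)  # posterior sample
        # Deferring now: the expert takes over immediately.
        values_now.append(expert_value)
        # Deferring later: collect the model's reward, then hand over from
        # a safe state (prob. p) or suffer a bad outcome (prob. 1 - p).
        values_later.append(
            model_reward
            + discount * (p * expert_value + (1 - p) * bad_value)
        )
    return values_now, values_later


def defer_now(values_now, values_later):
    """Return (decision, spread) from paired posterior value samples.

    decision is True when waiting does not improve the mean outcome;
    spread is the standard deviation of the sampled gain from waiting,
    a rough indicator of how uncertain the comparison is.
    """
    if not values_now or len(values_now) != len(values_later):
        raise ValueError("need two non-empty sample lists of equal length")
    gains = [later - now for now, later in zip(values_now, values_later)]
    return mean(gains) <= 0.0, pstdev(gains)
```

In this toy setup, counts that suggest the model's step is usually unsafe (for example, 1 success and 50 failures) lead to deferring now, while counts that suggest it is usually safe (50 successes and 1 failure) make waiting worthwhile. The returned spread plays the role of the uncertainty signal mentioned in the abstract, in a much simplified form.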

Author Information

Shalmali Joshi (Harvard University (SEAS))
Sonali Parbhoo (Harvard University)
Finale Doshi-Velez (Harvard University)
