Oral in Workshop: Machine Learning for Multimodal Healthcare Data
InterSynth: a semi-synthetic framework for benchmarking prescriptive inference from observational data
Dominic Giles · Robert Gray · Chris Foulon · Guilherme Pombo · Tianbo Xu · James K Ruffle · Rolf Jäger · Jorge Cardoso · Sebastien Ourselin · Geraint Rees · Ashwani Jha · Parashkev Nachev
Keywords: [ Benchmarking, domain shifts, and generalization ] [ Electronic healthcare records ] [ Medical Imaging ] [ Data sparsity, incompleteness and complexity ]
Treatments are prescribed to individuals in pursuit of contemporaneously unobserved outcomes, based on evidence derived from populations with historically observed treatments and outcomes. Since neither treatments nor outcomes are typically replicable in the same individual, alternatives remain counterfactual in both settings. Prescriptive fidelity therefore cannot be evaluated empirically at the individual level, forcing reliance on lossy, group-level estimates, such as average treatment effects, that presume an implausibly low ceiling on individuation. The lack of empirical ground truths critically impedes the development of individualised prescriptive models, on which realising personalised care inevitably depends. Here we present InterSynth, a general platform for modelling biologically plausible, empirically informed, semi-synthetic ground truths for the evaluation of prescriptive models operating at the individual level. InterSynth permits comprehensive simulation of heterogeneous treatment effect sizes and variability, and of observed and unobserved confounding treatment allocation biases, with explicit modelling of decoupled response failure and spontaneous recovery. Operable with high-dimensional data such as high-resolution brain lesion maps, InterSynth offers a principled means of quantifying the fidelity of prescriptive models across a wide range of plausible real-world conditions. We demonstrate end-to-end use of the platform with an example employing real neuroimaging data from patients with ischaemic stroke, volume image-based succinct lesion representations, and semi-synthetic ground truths informed by functional, transcriptomic and receptomic data. We make our platform freely available to the scientific community.
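The ingredients named in the abstract (individual-level ground truth, biased treatment allocation, response failure and spontaneous recovery) can be illustrated with a toy simulation. The sketch below is not the InterSynth API and makes no claim about how the platform is implemented; the function name `simulate_ground_truth`, its parameters, and the two-treatment binary-outcome setup are all assumptions chosen for illustration. It assumes each individual is truly susceptible to exactly one of two treatments, that response failure and spontaneous recovery are drawn independently of each other, and that allocation is biased towards the truly effective treatment with a tunable probability.

```python
import numpy as np


def simulate_ground_truth(n=1000, p_failure=0.2, p_recovery=0.1,
                          allocation_bias=0.5, seed=0):
    """Toy semi-synthetic ground truth for two treatments (hypothetical setup)."""
    rng = np.random.default_rng(seed)

    # Latent ground truth: the one treatment (0 or 1) each individual responds to.
    susceptible_to = rng.integers(0, 2, size=n)

    # Independent noise processes: failure to respond to the correct treatment,
    # and recovery that occurs regardless of treatment.
    response_failure = rng.random(n) < p_failure
    spontaneous_recovery = rng.random(n) < p_recovery

    # Potential outcomes under BOTH treatments, shape (n, 2); True means recovery.
    treatments = np.arange(2)
    responds = (susceptible_to[:, None] == treatments[None, :]) & ~response_failure[:, None]
    potential_outcomes = responds | spontaneous_recovery[:, None]

    # Biased allocation: with probability `allocation_bias` the individual gets
    # the truly effective treatment, otherwise a uniformly random one.
    informed = rng.random(n) < allocation_bias
    assigned_treatment = np.where(informed, susceptible_to,
                                  rng.integers(0, 2, size=n))

    # Only the outcome under the assigned treatment is observed.
    observed_outcome = potential_outcomes[np.arange(n), assigned_treatment]

    return {
        "susceptible_to": susceptible_to,
        "potential_outcomes": potential_outcomes,
        "assigned_treatment": assigned_treatment,
        "observed_outcome": observed_outcome,
    }


def prescriptive_accuracy(recommended, susceptible_to):
    """Fraction of individuals for whom the recommended treatment is the truly effective one."""
    return float(np.mean(np.asarray(recommended) == susceptible_to))
```

Because the simulated potential outcomes are known for both treatments, a prescriptive model trained only on `assigned_treatment` and `observed_outcome` can be scored per individual against `susceptible_to`, which is not possible with purely observational data.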