$\text{DT}^\text{2}$: Decision-Targeted Digital Twins
Harry Amad ⋅ Mihaela van der Schaar
Abstract
A digital twin (DT) is a virtual model of a real-world system that can assist decision-making by simulating scenarios induced by different policies. However, the typical design process of machine learning-based DTs does not optimise for this objective. We prove that, when model capacity is limited, typical DT training paradigms, which minimise one-step transition errors, can produce suboptimal models for ranking sets of policies. We further show that this holds empirically, even with expressive model classes. To address this, we introduce DT$^2$, a decision-targeted DT training paradigm. DT$^2$ uses off-policy evaluation methods to estimate the values of candidate policies from offline data. It then trains the DT with an architecture-agnostic loss function that encourages the DT to generate rollouts preserving the pairwise policy rankings derived from these proxy ground truths. We empirically demonstrate the efficacy of our method across a range of settings and architectures. DT$^2$ consistently improves policy ranking and reduces decision regret relative to conventional DT training, both for policies used during training and for unseen policies, while maintaining good raw simulation fidelity.
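The abstract does not give the exact form of the ranking loss. The sketch below is illustrative only and makes two assumptions. First, it uses a RankNet-style logistic pairwise loss: each pair of candidate policies that the off-policy estimates rank in a strict order contributes $\log(1 + e^{-(\hat v_i - \hat v_j)/\tau})$. Here $\hat v$ are the policy values estimated from DT rollouts, and $\tau$ is a hypothetical temperature. Second, it measures decision regret in one common way: the true value of the best policy minus the true value of the policy the DT would select.

The function names `pairwise_ranking_loss` and `decision_regret` are hypothetical. NumPy is used only to show the computation. In actual training, the DT values would come from differentiable rollouts in an autodiff framework.

```python
import numpy as np


def pairwise_ranking_loss(dt_values, ope_values, temperature: float = 1.0) -> float:
    """Hypothetical RankNet-style pairwise loss (an assumption, not the paper's exact loss).

    dt_values[i]:  value of policy i estimated from DT rollouts.
    ope_values[i]: proxy ground-truth value of policy i from off-policy evaluation.
    Every ordered pair (i, j) with ope_values[i] > ope_values[j] is penalised
    when the DT's value gap v_i - v_j is small or negative. Tied pairs are ignored.
    """
    dt = np.asarray(dt_values, dtype=float)
    ope = np.asarray(ope_values, dtype=float)
    # mask[i, j] is True where the OPE proxy ranks policy i strictly above policy j
    mask = (ope[:, None] - ope[None, :]) > 0
    # Scaled DT value gaps for every ordered pair
    gap = (dt[:, None] - dt[None, :]) / temperature
    # log(1 + exp(-gap)), computed in a numerically stable way
    losses = np.logaddexp(0.0, -gap[mask])
    return float(losses.mean()) if losses.size else 0.0


def decision_regret(dt_values, true_values) -> float:
    """Regret of picking the policy the DT ranks highest (one common definition)."""
    true = np.asarray(true_values, dtype=float)
    chosen = int(np.argmax(dt_values))
    return float(true.max() - true[chosen])


if __name__ == "__main__":
    ope = [1.0, 0.5, 0.0]
    print(pairwise_ranking_loss([2.0, 1.0, 0.0], ope))  # ranking preserved: low loss
    print(pairwise_ranking_loss([0.0, 1.0, 2.0], ope))  # ranking reversed: high loss
    print(decision_regret([0.0, 1.0, 2.0], ope))        # DT picks the worst policy
```

A pairwise formulation depends only on the relative order of the DT's value estimates, not on their absolute scale. This fits the abstract's goal of ranking policies correctly rather than matching every transition exactly.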