Poster in Workshop: Data-centric Machine Learning Research (DMLR): Datasets for Foundation Models
What is the Right Notion of Distance between Predict-then-Optimize Tasks?
Paula Rodriguez-Diaz · Kai Wang · David Alvarez-Melis · Milind Tambe
Dataset distances computed via optimal transport have emerged as a principled way to measure task similarity and serve as an informative criterion for machine learning problems such as domain adaptation and transfer learning. Their utility has been evaluated primarily by how well they predict transfer learning success, which is typically measured by prediction error. However, in Predict-then-Optimize (PtO) frameworks, where machine learning predictions are used as parameters for downstream optimization tasks, adaptation success is measured by decision regret rather than prediction error. In this work, we (i) demonstrate that dataset distances based solely on features and labels are uninformative in the PtO framework, and (ii) propose a new dataset distance that accounts for downstream decisions. Our results show that decision-aware dataset distances effectively capture adaptation success in the PtO framework across three different predict-then-optimize tasks from the literature.
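To make the contrast concrete, here is a minimal sketch of how a feature-and-label distance can rate two tasks as close even though their optimal decisions differ. It is not the distance proposed in this work, whose definition the abstract does not give. The toy optimizer `best_decision`, the additive ground cost, the `decision_weight` parameter, and the sample data are all hypothetical choices made for illustration. Optimal transport is solved by brute force over permutations, which is exact only for equal-size, uniformly weighted samples and practical only for very small ones.

```python
from itertools import permutations
from math import dist


def best_decision(y):
    """Toy downstream optimizer (an assumption): pick the single
    highest-valued item and return the choice as a one-hot tuple."""
    k = max(range(len(y)), key=lambda i: y[i])
    return tuple(1.0 if i == k else 0.0 for i in range(len(y)))


def ground_cost(a, b, decision_weight):
    """Cost between two (features, labels) samples.

    With decision_weight == 0 this is a plain feature-plus-label cost;
    a positive weight adds the distance between the induced decisions.
    """
    (xa, ya), (xb, yb) = a, b
    cost = dist(xa, xb) + dist(ya, yb)
    if decision_weight > 0:
        cost += decision_weight * dist(best_decision(ya), best_decision(yb))
    return cost


def ot_distance(source, target, decision_weight=0.0):
    """Exact OT cost between equal-size, uniformly weighted samples.

    For such samples an optimal plan is a one-to-one matching, so we
    enumerate all matchings (feasible only for tiny n).
    """
    n = len(source)
    if n != len(target):
        raise ValueError("this sketch assumes equal sample sizes")
    return min(
        sum(ground_cost(source[i], target[p[i]], decision_weight)
            for i in range(n))
        for p in permutations(range(n))
    ) / n


# Two hypothetical tasks: same features, labels that differ only
# slightly, but the best item flips from index 0 to index 1.
source = [((0.0,), (1.0, 0.9)), ((1.0,), (0.5, 0.4))]
target = [((0.0,), (0.9, 1.0)), ((1.0,), (0.4, 0.5))]

plain = ot_distance(source, target)
aware = ot_distance(source, target, decision_weight=1.0)
print(f"feature-label distance: {plain:.3f}")
print(f"decision-aware distance: {aware:.3f}")
```

In this toy setup the feature-label distance stays small while the decision-aware variant grows, because every matched pair of samples leads to a different chosen item. How the decision term should be defined and weighted in practice is exactly the kind of question the work addresses; the additive form here is only one possible illustration.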