

Poster in Workshop: Data-centric Machine Learning Research (DMLR): Datasets for Foundation Models

What is the Right Notion of Distance between Predict-then-Optimize Tasks?

Paula Rodriguez-Diaz · Kai Wang · David Alvarez-Melis · Milind Tambe


Abstract:

Dataset distances computed via optimal transport have emerged as a principled way to measure task similarity and serve as an informative criterion for various machine learning tasks, such as domain adaptation and transfer learning. Their utility has been primarily evaluated based on their informativeness for transfer learning success, typically measured by prediction error minimization. However, in Predict-then-Optimize (PtO) frameworks, where machine learning predictions are used as parameters for downstream optimization tasks, adaptation success is measured by decision regret minimization rather than prediction error minimization. In this work, we (i) demonstrate that dataset distances based solely on feature and label dimensions lack informativeness in the PtO framework, and (ii) propose a new dataset distance that accounts for downstream decisions. Our results show that decision-aware dataset distances effectively capture adaptation success in the PtO framework across three different predict-then-optimize tasks from the literature.
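To make the distinction between prediction error and decision regret concrete, here is a minimal sketch. It is not taken from the paper: the top-k selection problem, the function names (`top_k_decision`, `decision_regret`, `mse`), and the numbers are all hypothetical, chosen only for illustration. It assumes a downstream task that picks the k items with the largest predicted values and is scored on their true values.

```python
import numpy as np

def top_k_decision(values, k):
    """Return a 0/1 vector selecting the k largest entries of `values`."""
    decision = np.zeros(len(values))
    decision[np.argsort(-values, kind="stable")[:k]] = 1.0
    return decision

def decision_regret(y_true, y_pred, k):
    """Objective lost by optimizing over predictions instead of true values."""
    best = top_k_decision(y_true, k)     # decision with perfect information
    chosen = top_k_decision(y_pred, k)   # decision induced by the predictions
    return float(y_true @ best - y_true @ chosen)

def mse(y_true, y_pred):
    """Mean squared prediction error."""
    return float(np.mean((y_true - y_pred) ** 2))

# Hypothetical true parameters and two hypothetical sets of predictions.
y_true = np.array([5.0, 4.0, 1.0, 0.0])
pred_a = y_true + 2.0                     # large uniform bias, ranking preserved
pred_b = np.array([4.4, 4.6, 1.0, 0.0])   # small errors, top two items swapped

k = 1
mse_a, regret_a = mse(y_true, pred_a), decision_regret(y_true, pred_a, k)
mse_b, regret_b = mse(y_true, pred_b), decision_regret(y_true, pred_b, k)

print(f"pred_a: mse={mse_a:.2f}, regret={regret_a:.2f}")
print(f"pred_b: mse={mse_b:.2f}, regret={regret_b:.2f}")
```

In this toy setup the predictions with the larger prediction error incur no regret, while the predictions with the smaller prediction error lead to a worse decision. This is the kind of mismatch that motivates measuring adaptation success by regret in the PtO setting.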
