Position: Every Ground Truth is a Human Construction, not an Objective Truth
Abstract
Ground truth datasets serve as reference values in the training and evaluation of machine learning models. This position paper argues that ground truths are not neutral, objective measurements that are naturally given; rather, they are constructed by arrangements of humans and technologies. We argue that the ML community will benefit from articulating and discussing these often invisible or unreported choices, and from acknowledging that reference datasets are contingent, not universal. Attending to the situated and context-dependent nature of ground truths can improve reliability by enabling a better-informed perspective on where, when, and how the datasets, and the models they have shaped, can best be used. We argue for increasing 'situated reliability', which includes articulating the limits and strengths of models and their truth claims. Finally, paying closer attention to the construction of ground truths can help achieve transparency and accountability and support interdisciplinary work in ML development.