Position: AI for Science Should Treat Measurement-to-Dataset Pipelines as Inference Components
Abstract
AI for Science (AI4Science) workflows often treat the released dataset as a fixed interface to the underlying system. However, in domains that rely on indirect observation, the learner observes a derivative representation produced by multi-stage measurement, reconstruction, and preprocessing pipelines. We argue that these measurement-to-dataset pipelines are inference components: treating their outputs as "given data" freezes an observation model and obscures uncertainty over feasible pipeline choices. We identify three failure modes arising from this "frozen lens": (C1) hidden hypothesis space, where the released dataset specifies neither the pipeline configuration nor its validity conditions; (C2) uncertified transportability, where a pipeline may be documented but its regime of validity is untested, so failures under distribution shift cannot be adjudicated; and (C3) ungoverned multiplicity, where many defensible pipelines exist and the resulting dispersion is real but is not propagated into uncertainty-aware evidence. We stress-test these claims with a large-scale empirical audit in neuroscience, finding a survival rate of ≈ 0.0004% under a cross-dataset stability criterion. We call on the AI4Science community to make pipelines computable inference objects via domain-specific Computable Observation Frameworks. This shift enables quantifying pipeline adequacy and stability, converting implicit implementation choices into auditable, reproducible, and cumulative scientific evidence.