Credible Information Subset Decomposition: An End-to-End Multi-fidelity Learning Model by Modeling Label Information
Abstract
In AI4Chemistry, making use of heterogeneous data at different fidelity levels is a common and central problem. High-fidelity data are accurate but scarce, whereas low-fidelity data are abundant but biased. Traditional multi-fidelity methods typically identify cross-fidelity biases from paired samples, that is, the same inputs labeled at different fidelity levels. However, because the input distributions of the datasets are mismatched and the biases themselves are complex, these methods are difficult to apply in real-world scientific settings. To address this, we propose a credible information subset decomposition framework that uses multi-fidelity data efficiently without requiring paired samples. Multi-fidelity label supervision is decomposed into three complementary subsets: a credible subset based on the absolute values of high-fidelity labels; a credible subset that captures the reliability of high- and medium-fidelity label intervals through adaptive constraints; and an ordered credible subset that represents the numerical relationships among labels within the same fidelity level. These subsets are then integrated into a unified end-to-end model, so that medium- and low-fidelity information can be used appropriately. Extensive experiments on various molecular and material property benchmarks show that our method consistently outperforms state-of-the-art multi-fidelity and single-fidelity baselines and remains robust under real-world unpaired multi-fidelity conditions.
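To make the three-way decomposition of label supervision more concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a squared-error term for absolute label values, a hinge-style penalty that is zero inside a label interval, and a pairwise ranking term within a single fidelity level. All function names, the fixed `half_width` (standing in for the adaptive constraint, whose mechanism the abstract does not specify), and the loss weights are hypothetical.

```python
import numpy as np

def absolute_loss(pred, label):
    """Squared error on labels whose absolute values are trusted."""
    return float(np.mean((pred - label) ** 2))

def interval_loss(pred, label, half_width):
    """Penalize a prediction only when it leaves the interval
    [label - half_width, label + half_width] (hypothetical fixed width)."""
    excess = np.maximum(np.abs(pred - label) - half_width, 0.0)
    return float(np.mean(excess ** 2))

def ordinal_loss(pred, label, margin=0.0):
    """Pairwise hinge loss: predictions should preserve the ordering
    of labels that come from the same fidelity level."""
    pred_diff = pred[:, None] - pred[None, :]
    label_sign = np.sign(label[:, None] - label[None, :])
    hinge = np.maximum(margin - label_sign * pred_diff, 0.0)
    mask = label_sign != 0  # ignore ties and self-pairs
    if not mask.any():
        return 0.0
    return float(hinge[mask].mean())

def combined_loss(high, medium, low, half_width=0.5, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three terms. Each argument is a
    (predictions, labels) pair; no pairing across fidelities is needed.
    Which fidelity feeds which term is an assumption of this sketch."""
    w_abs, w_int, w_ord = weights
    return (w_abs * absolute_loss(*high)
            + w_int * interval_loss(*medium, half_width)
            + w_ord * ordinal_loss(*low))
```

In this sketch, each fidelity level contributes through its own batch, so the inputs of the three batches need not overlap. In an end-to-end model the same terms would be written in a differentiable framework and the interval width would be learned or adapted rather than fixed.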