Adaptive Proxy Evaluation for Autonomously Improving ML Agents
Vignesh Baskaran ⋅ Prannay Hebbar ⋅ Samuel Verboomen ⋅ Alesia Ivanova ⋅ Selvam Palanimalai ⋅ Kunal Bhatia ⋅ Yogendra Manawat
Abstract
Autonomous ML agents search over many candidate solutions, but running full training for every candidate is too expensive, so agents rely on proxy evaluations: shorter training, smaller data, fewer gradient steps. These proxies make search tractable but are unreliable. A weak proxy promotes candidates that look good early and fail later; a strong proxy is too costly to run at scale; and when each candidate defines its own evaluation, the resulting self-reported metrics make cross-candidate comparison untrustworthy. We address this with an external, adaptive proxy layer that evaluates every candidate under identical conditions and progressively raises fidelity as the search converges, giving fast feedback during exploration and reliable feedback during refinement. We implement the system in MLEvolve, a Monte Carlo Graph Search framework, with no task-specific configuration, pilot runs, or human intervention. On the MLE-bench Ventilator Pressure Prediction task, a single run reaches a state-of-the-art MAE of $0.1354$ within a 12-hour budget, outperforming all prior methods.
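To make the idea of raising fidelity as the search converges concrete, the following is a minimal sketch and not the MLEvolve implementation. The fidelity levels, the spread-based convergence signal, the threshold, and the name `choose_fidelity` are all assumptions introduced for illustration; the abstract does not specify how convergence is detected.

```python
from statistics import pstdev

# Hypothetical fidelity levels: fraction of the full training budget
# spent on each proxy evaluation (assumed values, not from the paper).
FIDELITY_LEVELS = (0.05, 0.25, 1.0)


def choose_fidelity(top_scores, current_level, spread_threshold=0.01):
    """Return the index of the fidelity level for the next evaluations.

    top_scores: proxy scores (e.g. MAE) of the current best candidates,
    all measured at the current level under identical conditions.
    """
    # With fewer than two leaders there is nothing to compare yet.
    if len(top_scores) < 2:
        return current_level
    # A small spread among the leaders means the cheap proxy can no
    # longer separate them, so step up to a costlier, more reliable level.
    if pstdev(top_scores) < spread_threshold:
        return min(current_level + 1, len(FIDELITY_LEVELS) - 1)
    return current_level
```

In this sketch the evaluator, not the candidate, owns the schedule, so every candidate at a given stage is scored at the same fidelity and the scores remain comparable.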