Depth over Fidelity in Fixed-Budget Noisy Evolution Strategies
Abstract
Noisy evolution strategies commonly mitigate ranking uncertainty by improving per-generation fidelity—for example, by allocating budget to resampling candidates or using robust aggregation to stabilize the within-generation ordering. Under strict fixed evaluation budgets, however, any additional intra-generation querying directly reduces the number of generations the algorithm can execute, shortening the optimization trajectory. This dynamic amounts to prioritizing fidelity over depth. We propose a paradigm shift in fixed-budget regimes toward depth over fidelity, arguing that the cumulative progress from a long sequence of noise-smoothed updates often outweighs that of a short sequence of rigorously denoised ones. We operationalize this principle via probabilistic elite membership, replacing hard truncation with conditional expected rank weights that integrate over ranking uncertainty. This shifts noise handling from the evaluation stage to the selection stage: rather than repeatedly reevaluating candidates to denoise their objective values, we directly smooth the selection signal driving the update. We instantiate this approach using residual bootstrapping: we perform sparse reevaluations near the selection boundary, store standardized noise residuals in a reusable pool, and generate bootstrap rankings to estimate expected weights. Recognizing that residual-pool mismatch constitutes a potential statistical risk, we derive a falsifiable error decomposition and provide runtime diagnostics to ensure estimator validity. To prevent oversmoothing in low-noise regimes, we introduce an adaptive probe-and-switch mechanism that leverages a low-cost rank-disagreement metric to dynamically select between standard CMA-ES and our bootstrap-based updates. Extensive evaluations across the COCO bbob-noisy suite and diverse external tasks—including reinforcement-learning (RL) policy search and noisy hyperparameter optimization (HPO)—demonstrate consistent gains.
Specifically, in high-misranking regimes constrained by strict budgets, our residual-bootstrap approach achieves substantially steeper progress curves than both uncertainty-handling CMA-ES and fixed-k resampling baselines. These results substantiate a testable thesis: when budgets are limited and ranking uncertainty is high, integrating uncertainty at the selection stage is more sample-efficient than reducing it at the evaluation stage.
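The core selection-stage idea—replacing hard truncation with expected rank weights estimated by bootstrapping rankings under resampled noise residuals—can be illustrated with a minimal sketch. This is not the paper's implementation: the function name, the standard log-rank recombination weights, and the single shared noise scale (here crudely estimated from the observed values) are assumptions for illustration only.

```python
import numpy as np

def expected_rank_weights(noisy_f, residual_pool, mu, n_boot=400, seed=0):
    """Estimate expected CMA-ES recombination weights under ranking noise.

    noisy_f       : observed noisy objective values, shape (lam,)
    residual_pool : pool of standardized noise residuals (assumed exchangeable)
    mu            : elite count that hard truncation would use
    """
    rng = np.random.default_rng(seed)
    lam = len(noisy_f)
    # Standard log-rank recombination weights for the top-mu slots, normalized.
    w = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))
    w /= w.sum()
    # Crude shared noise scale for rescaling standardized residuals
    # (assumption for this sketch; a real estimator would fit it separately).
    scale = np.std(noisy_f) + 1e-12
    expected_w = np.zeros(lam)
    for _ in range(n_boot):
        # Perturb observed values with resampled residuals, then re-rank:
        # each bootstrap draw yields one plausible "true" elite set.
        eps = rng.choice(residual_pool, size=lam, replace=True) * scale
        order = np.argsort(noisy_f + eps)
        expected_w[order[:mu]] += w
    # Averaging over bootstrap rankings smooths hard truncation into
    # soft, uncertainty-aware selection weights that still sum to one.
    return expected_w / n_boot
```

Under this construction, candidates far inside or outside the elite boundary keep near-deterministic weights, while boundary candidates receive graded weights proportional to their probability of elite membership.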
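The probe-and-switch mechanism can likewise be sketched: reevaluate the population once (the "probe"), measure how much the elite set changes between the two noisy rankings, and only engage the bootstrap-based update when disagreement is high. The function names, the elite-overlap disagreement metric, and the 0.25 threshold are illustrative assumptions, not the paper's exact diagnostic.

```python
import numpy as np

def rank_disagreement(f_first, f_second, mu):
    """Fraction of the mu elite slots whose membership flips between two
    independent noisy evaluations of the same population (probe metric)."""
    elite_a = set(np.argsort(f_first)[:mu])
    elite_b = set(np.argsort(f_second)[:mu])
    return 1.0 - len(elite_a & elite_b) / mu

def choose_update(f_first, f_second, mu, threshold=0.25):
    """Probe-and-switch: keep standard CMA-ES selection when the probe
    indicates low misranking, else switch to bootstrap expected weights."""
    d = rank_disagreement(f_first, f_second, mu)
    return "bootstrap" if d > threshold else "standard"
```

In a low-noise regime the two rankings agree, the disagreement metric stays near zero, and the algorithm avoids the oversmoothing cost of soft selection; in a high-misranking regime the flipped elite memberships trigger the bootstrap path.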