Escaping Whack-a-Mole: Code Documentation Optimization via Dependency-Guided Bi-level Search
Abstract
As large language models increasingly serve as autonomous coding agents, code documentation must be optimized for agent comprehension rather than human readability. We frame agent-oriented documentation generation as a black-box optimization problem over the documentation space, where quality is measured solely by downstream code correctness. A central challenge for conventional LLM refinement methods is output coupling—program entities are interdependent, and refining the documentation of one entity can invalidate its callers, resulting in a persistent whack-a-mole phenomenon during inference-time scaling. We propose DocSearch, a dependency-guided bi-level search framework that systematically exploits test-time feedback. The outer level conducts a priority search over the program-entity dependency DAG, enforcing a callee-before-caller refinement order to prevent downstream interference. The inner level performs a beam search over documentation refinements, using diversified error message sampling from self-generated unit tests to better exploit diagnostic signals and escape local optima. We provide theoretical guarantees of monotonic progress, showing that our worthy condition prevents regression while enabling efficient exploration. On DevEval+, DocSearch achieves a 90.7% solve rate with GPT-4o, outperforming the strongest baseline by 32.6%. Cross-language experiments further demonstrate that optimized documentation transfers effectively to different target programming languages.
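The callee-before-caller refinement order described above amounts to processing the program-entity dependency DAG in topological order, so that an entity's documentation is refined only after all entities it depends on have been refined. A minimal sketch of that ordering, using Python's standard-library `graphlib` and a hypothetical call graph (the entity names and edges here are illustrative, not from the paper):

```python
from graphlib import TopologicalSorter

# Hypothetical dependency DAG: each entity maps to the set of entities it calls.
# These names are illustrative assumptions, not taken from DocSearch itself.
calls = {
    "main": {"parse", "run"},   # main calls parse and run
    "run": {"parse", "save"},   # run calls parse and save
    "parse": set(),             # leaf entities call nothing
    "save": set(),
}

# TopologicalSorter emits a node only after all of its predecessors.
# Feeding it caller -> callees edges therefore yields a callee-before-caller
# order: leaves ("parse", "save") come out before "run", which precedes "main".
order = list(TopologicalSorter(calls).static_order())
print(order)
```

Refining documentation in this order guarantees that when a caller is revisited, the documentation of everything it depends on is already fixed, which is the property the outer-level search relies on to avoid the whack-a-mole regressions.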