AgentExpt: Automating AI Experiment Design with LLM-based Resource Retrieval Agent
Abstract
In modern AI research, baseline and dataset selection is a high-stakes decision in experimental design. It operationalizes a research idea into a concrete evaluation protocol and largely determines the validity and comparability of empirical conclusions. However, making appropriate choices is increasingly difficult as baselines and datasets proliferate, while suitability is inherently context-dependent and rarely captured by baseline and dataset metadata. To address these challenges, we present \textbf{AgentExpt}, a comprehensive framework for baseline and dataset recommendation. We first curate a large-scale, high-quality knowledge base that links 108{,}825 accepted papers to the baselines and datasets they use. Based on this resource, we design a \textit{collective perception-enhanced retriever} that represents each baseline or dataset by integrating first-person self-descriptions with third-person citation contexts, thereby effectively positioning them within the scholarly network. We further design a \textit{reasoning-augmented reranker} that encodes baseline-dataset interaction chains as a reasoning prior for fine-tuning an LLM, producing refined rankings with interpretable justifications. Experiments show that our framework outperforms the strongest baseline, with average gains of +5.85\% in Recall@20 and +7.90\% in HitRate@10, and ablation studies confirm the effectiveness of our designed components. Overall, AgentExpt advances the efficient and reliable automation of experimental design. Our code is available at \url{https://anonymous.4open.science/r/Agentexpt-DD3E}.
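The collective-perception idea above can be illustrated with a minimal sketch: each resource is represented by blending an embedding of its own (first-person) self-description with embeddings of (third-person) citation contexts, and queries are scored against the blended vector. Everything here is an assumption for illustration only: the toy bag-of-words `embed` stands in for the paper's actual encoder, and the mixing weight `alpha` is a hypothetical parameter, not a value from the work.

```python
import math
from collections import Counter


def embed(text):
    # Toy stand-in for a real text encoder: a unit-normalized bag of words.
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(v * v for v in counts.values())) or 1.0
    return {w: v / norm for w, v in counts.items()}


def blend(vectors, weights):
    # Weighted sum of sparse vectors represented as dicts.
    out = {}
    for vec, w in zip(vectors, weights):
        for k, v in vec.items():
            out[k] = out.get(k, 0.0) + w * v
    return out


def collective_representation(self_description, citation_contexts, alpha=0.5):
    # Mix the first-person self-description with the averaged third-person
    # citation contexts; alpha is an assumed mixing weight for this sketch.
    ctx = blend([embed(c) for c in citation_contexts],
                [1.0 / len(citation_contexts)] * len(citation_contexts))
    return blend([embed(self_description), ctx], [alpha, 1.0 - alpha])


def score(query, resource_vec):
    # Dot product between the query embedding and the blended representation.
    q = embed(query)
    return sum(v * resource_vec.get(w, 0.0) for w, v in q.items())
```

Under this sketch, a query such as "image classification" would score higher against a resource whose self-description and citation contexts mention image classification than against one described only in terms of, say, question answering; the reranking stage described in the abstract would then refine such a candidate list.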