Socrates: Structured Questioning Unlocks Latent Knowledge in AI Research Agents
Abstract
LLM agents on open-ended research tasks consistently fail to apply knowledge they demonstrably possess: frontier models score above 88% on MMLU machine-learning content yet earn Kaggle medals on only 16.9% of MLE-bench tasks. We argue that the bottleneck is knowledge activation, not capacity. We introduce Socrates, a multi-agent protocol that pairs a tool-using Scientist with a question-only advisor that cannot provide answers, issue directives, or use tools. The advisor's probing questions force the Scientist to surface its own latent knowledge into context. On five MLE-bench tasks, Socrates improves Kaggle test scores on 4 of 5 (mean +55.9%) and outperforms a generic-PI baseline on 4 of 5, indicating that the gain comes from the nature of the questioning rather than from extra interaction.
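The interaction pattern described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `socratic_loop`, `is_question_only`, and `FALLBACK_QUESTION`, the fixed number of rounds, and the line-ending check used to enforce the question-only constraint are all assumptions made for illustration. Both agents are plain callables standing in for LLM calls, and only the Scientist would have tool access in a real system.

```python
from typing import Callable, List

# Hypothetical interface: an agent maps the transcript so far to its next message.
Agent = Callable[[List[str]], str]

# Assumed fallback used when the advisor breaks the question-only rule.
FALLBACK_QUESTION = "What assumption are you least sure of?"


def is_question_only(message: str) -> bool:
    """Crude illustrative check: every non-empty line must end with '?'."""
    lines = [ln.strip() for ln in message.splitlines() if ln.strip()]
    return bool(lines) and all(ln.endswith("?") for ln in lines)


def socratic_loop(scientist: Agent, advisor: Agent, task: str,
                  rounds: int = 3) -> List[str]:
    """Alternate Scientist turns with advisor questions for `rounds` rounds."""
    transcript = [f"TASK: {task}"]
    for _ in range(rounds):
        transcript.append("SCIENTIST: " + scientist(transcript))
        question = advisor(transcript)
        if not is_question_only(question):
            # Assumed policy: discard answers or directives and ask a
            # generic question instead, so no advice leaks into context.
            question = FALLBACK_QUESTION
        transcript.append("ADVISOR: " + question)
    # The Scientist gets the final turn to act on the last question.
    transcript.append("SCIENTIST: " + scientist(transcript))
    return transcript
```

In this sketch the advisor contributes no information of its own to the transcript; anything useful that appears in context has to come from the Scientist's replies, which is the mechanism the abstract attributes the gains to.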