Beyond Correctness: Distance-Based Social Dynamics of Multi-Agent Debate
Abstract
Multi-agent debate (MAD) systems are often evaluated using binary correctness or peer agreement, yet such evaluations obscure how individual agents revise their answers during social interaction. We study the microscopic dynamics of answer revision in large language models (LLMs) using ConceptARC, a 2D grid-reasoning benchmark that admits quantitative distance measures between candidate solutions. By exposing a target model to controlled configurations of peer answers, we analyze how revision likelihood and direction depend on both social context and the distance between answers and the ground truth. We find that agents are more likely to revise when their answers are farther from the correct solution, and that revisions of incorrect answers exhibit a systematic contraction toward the ground truth, even when the final answer remains incorrect. Conversely, correct answers can be overturned by social pressure, particularly when incorrect peers' answers are near-correct. Together, these results show that multi-agent interaction induces structured, distance-aware movements in solution space that are invisible under binary correctness, clarifying when social reasoning leads to improvement, stability, or gradual regression in solution quality.
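The analysis above hinges on a quantitative distance between candidate grid solutions. The abstract does not specify the metric used; as a hypothetical illustration only, a normalized Hamming-style distance over 2D integer grids (counting non-overlapping cells as mismatches when shapes differ) might be sketched as follows. The function name `grid_distance` and the shape-mismatch convention are assumptions, not the paper's definition:

```python
import numpy as np


def grid_distance(a, b):
    """Normalized cell-wise disagreement between two 2D answer grids.

    Returns a value in [0, 1]: 0.0 for identical grids, 1.0 for grids
    that agree on no cell. If shapes differ, grids are compared on the
    overlapping region and every non-overlapping cell of the larger
    bounding box counts as a mismatch. This convention is illustrative,
    not necessarily the metric used in the paper.
    """
    a, b = np.asarray(a), np.asarray(b)
    rows = max(a.shape[0], b.shape[0])
    cols = max(a.shape[1], b.shape[1])
    total = rows * cols

    # Overlapping region: count cells where the two grids disagree.
    r = min(a.shape[0], b.shape[0])
    c = min(a.shape[1], b.shape[1])
    overlap_mismatch = int(np.sum(a[:r, :c] != b[:r, :c]))

    # Cells outside the overlap are treated as automatic mismatches.
    non_overlap = total - r * c
    return (overlap_mismatch + non_overlap) / total
```

Under such a metric, "contraction toward the ground truth" corresponds to a revision whose `grid_distance` to the correct grid is smaller than that of the original answer, even when both remain nonzero.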