Feedback Attribution Determines Representation Geometry: Metrics for Evaluating World-Feedback Quality in Multi-Agent RL
Abstract
Cooperative multi-agent RL systems routinely use team-averaged rewards, which mis-attribute world feedback: each agent receives the team outcome regardless of its individual contribution. We show that this mis-attribution has a measurable geometric consequence for learned representations. We propose two diagnostics for feedback-attribution quality: EffRank/n (effective rank normalized by agent count) and Dact (mean pairwise KL divergence between agents' action distributions). On Tribal Village, individually rewarded MAPPO develops role-separable representations (EffRank/n = 0.22, Dact = 0.12, probe = 0.92; 3-seed mean). Switching only the feedback mechanism to team-averaged rewards collapses all three metrics: Dact falls to exactly zero, the probe drops to near chance (0.37), and EffRank/n drops by more than half. A mixed condition yields intermediate values, so the metrics decline monotonically as attribution coarsens, confirming a dose-response relationship between attribution granularity and representation geometry. On SMACv2, where unit type is directly encoded in the observation, EffRank/n saturates regardless of reward type while Dact still responds (0.134 vs. 0.040), which shows that the two metrics provide complementary coverage. Both diagnostics can be computed on standard training minibatches with less than 5% overhead.
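As a rough illustration of the two diagnostics, the following is a minimal NumPy sketch rather than the implementation used in the paper. It assumes that effective rank is the exponential of the Shannon entropy of the normalized singular values of a centered feature matrix, and that Dact averages KL(p_i || p_j) over all ordered pairs of distinct agents. The function names, the centering step, and the way per-agent action distributions are aggregated over a minibatch are all hypothetical choices made for this example.

```python
import numpy as np


def effective_rank(features):
    """Entropy-based effective rank of a (n_samples, dim) feature matrix.

    Assumption: exp of the Shannon entropy of the normalized singular
    values, computed after centering each feature dimension.
    """
    centered = features - features.mean(axis=0, keepdims=True)
    s = np.linalg.svd(centered, compute_uv=False)
    total = s.sum()
    if total == 0.0:
        # All samples identical: no variation to measure.
        return 0.0
    p = s / total
    p = p[p > 0]  # drop exact zeros so log is defined
    return float(np.exp(-(p * np.log(p)).sum()))


def eff_rank_per_agent(features, n_agents):
    """EffRank/n: effective rank normalized by the number of agents."""
    return effective_rank(features) / n_agents


def mean_pairwise_kl(policies, eps=1e-12):
    """Dact: mean KL(p_i || p_j) over ordered pairs of distinct agents.

    policies: (n_agents, n_actions) array whose rows are action
    distributions (hypothetical input layout for this sketch).
    """
    n = policies.shape[0]
    if n < 2:
        raise ValueError("need at least two agents for a pairwise divergence")
    p = np.clip(policies, eps, 1.0)
    p = p / p.sum(axis=1, keepdims=True)  # renormalize after clipping
    logp = np.log(p)
    # kl[i, j] = sum_a p[i, a] * (log p[i, a] - log p[j, a])
    kl = (p[:, None, :] * (logp[:, None, :] - logp[None, :, :])).sum(axis=-1)
    off_diag = ~np.eye(n, dtype=bool)
    return float(kl[off_diag].mean())
```

Under these assumptions, agents with identical action distributions give a Dact of exactly zero, matching the collapse reported for team-averaged rewards, while any divergence between agents yields a strictly positive value.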