HearSayBench: Can LLMs Navigate from Abstract Human Rights to Lived Lives?
Abstract
"Hearing is never like being." (Persian proverb)
Large language models have become the default advisors for life-critical human problems. While they democratize access to personal counseling, they suffer from a silent foundation bias: the internet is a record of people with the freedom to act. Current benchmarks assume users have this same agency, ignoring the reality of those living in war zones or navigating statelessness. These long-tail experiences are missing from training data, and authentic evaluation data for measuring how models handle them is equally scarce. We introduce HearSayBench, a human-verified dataset of 400 scenarios drawn from respected archives such as those of the United Nations, covering 80 regions and three types of barrier: social, personal, and environmental. Our work uses the Capabilities Approach to test whether a model can distinguish between what a person is legally promised and what they are actually free to do in their specific environment. While these models may have "heard" about global inequality during training, we find that this knowledge is merely hearsay. Across 11 frontier and open-weight models, we identify a systemic 37% performance drop between situational comprehension and structural reasoning. When faced with the most vulnerable users, models consistently offer a "Checklist of Impossible Things": polite, fluent advice that is physically impossible or legally suicidal to follow. Ultimately, we show that the true digital divide is no longer about access to technology, but about whether an AI can recognize the reality of your life.