Where Models Concentrate and Humans Spread: A Coverage Framework for Distributional Pluralism in Open-Ended Generation
Abstract
We introduce a coverage-based framework for evaluating distributional pluralism in open-ended generation, asking whether outputs from large language models (LLMs) cover the empirical distribution of human responses across diverse contributors and communities. Building on Sorensen et al.’s notion of distributional pluralism, we instantiate it as a geometry-based coverage problem: given a sample of legitimate human responses, we estimate a human response space without relying on pre-specified groups, opinions, or value dimensions. This framework allows us to evaluate not only whether LLMs generate plausible responses, but also where their outputs concentrate and which regions of human variation remain uncovered. We construct an empirical human response boundary in a shared sentence-embedding space and evaluate model outputs with two complementary metrics: how often they remain inside the boundary (IBR) and how much of the human response distribution they cover (LLM-Cov). Across tasks, LLMs show substantially lower coverage than a human-to-human reference, with the gap concentrated in peripheral regions of the human distribution. We further show that model outputs concentrate in central, high-density regions of the human distribution, and that the under-covered regions in the narrative task are structured rather than arbitrary. In Harry Potter (HP) fanfiction, models more easily reach canon-visible and stylistically regular writing while missing more implicit, irregular, and community-specific forms of expression, illustrating how open-ended generation can produce pluralistic under-representation even when outputs remain plausible.
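To make the two metrics concrete, the following is a minimal sketch of one way an inside-boundary rate and a coverage score could be computed over embeddings. It is not the construction used in this work: the nearest-neighbour boundary, the Euclidean distance, the quantile threshold, and the names `boundary_radius`, `inside_boundary_rate`, and `coverage` are all illustrative assumptions, and the embeddings are synthetic stand-ins for sentence embeddings.

```python
import numpy as np


def pairwise_dist(a, b):
    """Euclidean distances between rows of a (n, d) and rows of b (m, d)."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def boundary_radius(human, quantile=0.95):
    """Assumed boundary: a quantile of human nearest-neighbour distances."""
    d = pairwise_dist(human, human)
    np.fill_diagonal(d, np.inf)  # ignore each point's distance to itself
    return float(np.quantile(d.min(axis=1), quantile))


def inside_boundary_rate(model, human, radius):
    """Share of model outputs within `radius` of at least one human response."""
    nearest_human = pairwise_dist(model, human).min(axis=1)
    return float((nearest_human <= radius).mean())


def coverage(model, human, radius):
    """Share of human responses with at least one model output within `radius`."""
    nearest_model = pairwise_dist(model, human).min(axis=0)
    return float((nearest_model <= radius).mean())


# Synthetic illustration (assumed data): model outputs clustered near the
# centre of a wider human distribution.
rng = np.random.default_rng(0)
human_emb = rng.normal(size=(200, 8))
model_emb = 0.3 * rng.normal(size=(200, 8))

r = boundary_radius(human_emb)
ibr = inside_boundary_rate(model_emb, human_emb, r)
cov = coverage(model_emb, human_emb, r)
print(f"radius={r:.3f}  IBR={ibr:.3f}  coverage={cov:.3f}")
```

Under these assumptions the two quantities are deliberately asymmetric: the first asks whether each model output lies near some human response, the second asks whether each human response has some model output nearby. A model whose outputs cluster in the dense centre can therefore score well on the first while leaving peripheral human responses uncovered on the second.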