Normative alignment: A new paradigm for principled autonomous agents
Abstract
How can agents make safe autonomous decisions in complex, dynamic environments? While significant progress has been made in establishing safety guardrails that enforce compliance in generative models, these negative constraints often prove brittle in open-world settings. I argue that achieving generalisable agentic safety requires Normative Alignment: a new paradigm that bridges positive alignment goals with context-dependent values to actively support human flourishing. Realising this paradigm presents a triple challenge of capability, measurement, and governance. First, it requires a shift in capability toward normative competence, beyond generic reward maximisation. Anchoring agents in “thick” value concepts (such as duty of care or human autonomy) provides the contextual reasoning needed to adjudicate complex trade-offs in non-verifiable domains. Second, it demands new metrics that move optimisation targets beyond immediate preference satisfaction toward long-term human well-being. Third, it requires democratic governance: to ensure that this framework avoids algorithmic paternalism, these capabilities and metrics must be grounded in pluralism and representative societal input.