Position: Why LLMs Should Be Reasonably Morally Inconsistent
Jakob Stenseke ⋅ Aidan Kierans ⋅ Itamar Pres ⋅ Dylan Hadfield-Menell
Abstract
It is a widely held assumption that Large Language Models (LLMs) should be morally consistent. In this position paper, we critically analyze this assumption. We disambiguate six distinct notions of moral consistency and show that, for each, there exist cases where inconsistency can be both justified and desirable. Building on this analysis, we propose that LLMs should instead be $\textit{reasonably}$ morally inconsistent: consistency should be treated as a desirable but ultimately defeasible norm, with deviations permitted when $\textit{justified}$ by recognizable moral or contextual reasons that are made $\textit{transparent}$. We argue that recent benchmarks, by treating moral consistency as an unqualified good, are misguided and potentially counterproductive. As an alternative, we point towards pluralistic and process-focused alignment, and sketch a concrete benchmark format that aims to better accommodate the legitimate role of inconsistency in moral behavior and thought.