

Poster
in
Workshop: Agentic Markets Workshop

Altared Environments: The Role of Normative Infrastructure in AI Alignment

Rakshit Trivedi · Nikhil Chandak · Carter Blair · Atrisha Sarkar · Tehilla Weltman · Dylan Hadfield-Menell · Gillian Hadfield


Abstract:

Cooperation is central to human life, distinguishing humans as ultra-cooperative among mammals. We form stable groups that enhance welfare through mutual protection, knowledge sharing, and economic exchange. As artificial intelligence gains autonomy in shared environments, ensuring AI agents can engage in cooperative behaviors is crucial. Research in AI views this as an alignment challenge and frames it in terms of embedding norms and values in AI systems. Such an approach, while promising, neglects how humans achieve stable cooperation through normative infrastructure. This infrastructure establishes shared norms enforced by agents who recognize and sanction norm violations. Using multi-agent reinforcement learning (MARL), we investigate the impact of normative infrastructure on agents' learning dynamics and their cooperative abilities in mixed-motive games. We introduce the concept of an "altar", an environmental feature that encodes actions deemed sanctionable by a group of agents. Comparing the performance of simple, independent learning agents in environments with and without the altar, we assess the potential of normative infrastructure in facilitating AI agent alignment to foster stable cooperation.
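To make the altar idea concrete, here is a minimal, purely illustrative sketch (all names and payoff values are assumptions, not taken from the paper): a two-player mixed-motive stage game with Prisoner's Dilemma payoffs, where the altar is modeled as a set of actions marked sanctionable plus a penalty applied to any agent who takes one.

```python
COOPERATE, DEFECT = 0, 1

# (row action, col action) -> (row reward, col reward); standard PD payoffs.
PAYOFFS = {
    (COOPERATE, COOPERATE): (3, 3),
    (COOPERATE, DEFECT): (0, 5),
    (DEFECT, COOPERATE): (5, 0),
    (DEFECT, DEFECT): (1, 1),
}

class AltarGame:
    """Hypothetical mixed-motive game with an optional altar.

    The altar encodes which actions the group deems sanctionable;
    taking such an action incurs a fixed sanction cost.
    """

    def __init__(self, sanctionable=None, sanction_cost=4):
        self.sanctionable = set(sanctionable or [])
        self.sanction_cost = sanction_cost

    def step(self, a_row, a_col):
        r_row, r_col = PAYOFFS[(a_row, a_col)]
        # Apply the altar's sanction to any agent taking a flagged action.
        if a_row in self.sanctionable:
            r_row -= self.sanction_cost
        if a_col in self.sanctionable:
            r_col -= self.sanction_cost
        return r_row, r_col

# Without the altar, defection dominates; with an altar that marks
# DEFECT as sanctionable, unilateral defection is no longer profitable.
plain = AltarGame()
with_altar = AltarGame(sanctionable=[DEFECT])
print(plain.step(DEFECT, COOPERATE))       # (5, 0)
print(with_altar.step(DEFECT, COOPERATE))  # (1, 0)
```

In this toy setting, independent learners facing the altar-modified payoffs would converge toward cooperation, mirroring the comparison the abstract describes between environments with and without the altar.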
