Persuade Me If You Can: Evaluating AI Agent Influence on Safety Monitors

Jennifer Za ⋅ Julija Bainiaksina ⋅ Tanush Chopra ⋅ Nikita Ostrovsky ⋅ Victoria Krakovna

Abstract