Persuade Me If You Can: Evaluating AI Agent Influence on Safety Monitors

Jennifer Za · Julija Bainiaksina · Tanush Chopra · Nikita Ostrovsky · Victoria Krakovna

Abstract
