Position: Preparing for AI Systems That Deceive Developers
Abstract
AI systems may exhibit deceptive behaviors that mislead developers about their capabilities, propensities, or actions. Such deception can take distinct forms across the development lifecycle: training subversion, evaluation gaming, and control evasion. We argue that the AI community should treat AI deception targeting developers as a distinct, high-priority risk category, because it compromises developers' ability to identify and mitigate all other risks. We propose three recommendations for developers: preserving monitorability during training, ensuring the integrity of safety evaluations against evaluation-aware systems, and establishing non-evadable control prior to deployment. Finally, we identify open problems for the research community whose resolution is critical to the safe development of frontier AI.