Type | Time | Title | Speakers
Workshop | Sat 0:40 | Sociotechnical Safety Evaluation of AI systems | Laura Weidinger
Workshop | Fri 4:30 | UK AI Safety Institute: Empirically Assessing AI's Risks & Advancing Systemic Safety |
Social | Tue 8:30 | AI Safety Social: Navigating Misuse, Ethical Challenges, and Systemic Risks |
Workshop | Sat 5:00 | UK AI Safety Institute: Overview & Agents Evals | Herbie Bradley
Workshop | Sat 2:20 | Oral: Scalable AI Safety via Doubly-Efficient Debate |
Workshop | Sat 1:10 | Update from UK Gov's AI Safety Institute - Evals & Advancing AI Governance | Cozmin Ududec
Workshop | | Unified Taxonomy in AI Safety: Watermarks, Adversarial Defenses, and Transferable Attacks | Grzegorz Gluch · Sai Ganesh Nagarajan · Berkant Turan
Workshop | | Games for AI-Control: Models of Safety Evaluations of AI Deployment Protocols | Charlie Griffin · Buck Shlegeris · Alessandro Abate
Workshop | | Uncovering a Culture of AI Grassroots Experimentation by Boston City Employees: Safety Risks and Mitigation | Jude Ha · Audrey Chang
Oral | Wed 8:00 | AI Control: Improving Safety Despite Intentional Subversion | Ryan Greenblatt · Buck Shlegeris · Kshitij Sachan · Fabien Roger
Workshop | | Weak-to-Strong Jailbreaking on Large Language Models | Xuandong Zhao · Xianjun Yang · Tianyu Pang · Chao Du · Lei Li · Yu-Xiang Wang · William Wang
Workshop | | AdaptiveBackdoor: Backdoored Language Model Agents that Detect Human Overseers | Heng Wang · Ruiqi Zhong · Jiaxin Wen · Jacob Steinhardt