Search All 2024 Events
 

48 Results

Workshop
Sat 0:40 Sociotechnical Safety Evaluation of AI systems
Laura Weidinger
Workshop
Fri 4:30 UK AI Safety Institute: Empirically Assessing AI's Risks & Advancing Systemic Safety
Tue 8:30 AI Safety Social: Navigating Misuse, Ethical Challenges, and Systemic Risks
Workshop
Sat 5:00 UK AI Safety Institute: Overview & Agents Evals
Herbie Bradley
Workshop
Sat 2:20 Oral: Scalable AI Safety via Doubly-Efficient Debate
Workshop
Sat 1:10 Update from UK Gov's AI Safety Institute - Evals & Advancing AI Governance
Cozmin Ududec
Workshop
Unified Taxonomy in AI Safety: Watermarks, Adversarial Defenses, and Transferable Attacks
Grzegorz Gluch · Sai Ganesh Nagarajan · Berkant Turan
Workshop
Games for AI-Control: Models of Safety Evaluations of AI Deployment Protocols
Charlie Griffin · Buck Shlegeris · Alessandro Abate
Workshop
Uncovering a Culture of AI Grassroots Experimentation by Boston City Employees: Safety Risks and Mitigation
Jude Ha · Audrey Chang
Oral
Wed 8:00 AI Control: Improving Safety Despite Intentional Subversion
Ryan Greenblatt · Buck Shlegeris · Kshitij Sachan · Fabien Roger
Workshop
Weak-to-Strong Jailbreaking on Large Language Models
Xuandong Zhao · Xianjun Yang · Tianyu Pang · Chao Du · Lei Li · Yu-Xiang Wang · William Wang
Workshop
AdaptiveBackdoor: Backdoored Language Model Agents that Detect Human Overseers
Heng Wang · Ruiqi Zhong · Jiaxin Wen · Jacob Steinhardt