Workshop | Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks
Maksym Andriushchenko · Francesco Croce · Nicolas Flammarion

Workshop | Automatic Pseudo-Harmful Prompt Generation for Evaluating False Refusals in Large Language Models
Bang An · Sicheng Zhu · Ruiyi Zhang · Michael-Andrei Panaitescu-Liess · Yuancheng Xu · Furong Huang

Poster | Thu 4:30 | Scalable AI Safety via Doubly-Efficient Debate
Jonah Brown-Cohen · Geoffrey Irving · Georgios Piliouras

Oral Session | Thu 7:30 | Oral 6E Robustness and Safety

Oral | Tue 7:45 | Position: AI-Powered Autonomous Weapons Risk Geopolitical Instability and Threaten AI Research
Riley Simmons-Edler · Ryan Badman · Shayne Longpre · Kanaka Rajan

Oral | Wed 8:00 | AI Control: Improving Safety Despite Intentional Subversion
Ryan Greenblatt · Buck Shlegeris · Kshitij Sachan · Fabien Roger

Poster | Thu 2:30 | Fair Data Representation for Machine Learning at the Pareto Frontier
Shizhou Xu · Thomas Strohmer

Social | Tue 8:30 | AI Safety Social: Navigating Misuse, Ethical Challenges, and Systemic Risks

Poster | Wed 4:30 | Monotone Individual Fairness
Yahav Bechavod

Poster | Thu 2:30 | Standardized Interpretable Fairness Measures for Continuous Risk Scores
Ann-Kristin Becker · Oana Dumitrasc · Klaus Broelemann

Poster | Wed 2:30 | Position: Machine Learning-powered Assessments of the EU Digital Services Act Aid Quantify Policy Impacts on Online Harms
Eleonora Bonel · Luca Nannini · Davide Bassi · Michele Maggini

Oral | Thu 8:15 | Scalable AI Safety via Doubly-Efficient Debate
Jonah Brown-Cohen · Geoffrey Irving · Georgios Piliouras